[GitHub] spark issue #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendForAll by...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18624 But I agree with the issue @MLnick mentioned: the code now looks convoluted. Can you try to simplify it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19661: [SPARK-22450][Core][Mllib]safely register class f...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/19661#discussion_r150171482 --- Diff: core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala --- @@ -108,6 +108,27 @@ class KryoSerializerSuite extends SparkFunSuite with SharedSparkContext { check(Array(Array("1", "2"), Array("1", "2", "3", "4"))) } + test("safely register class for mllib/ml") { +val conf = new SparkConf(false) +val ser = new KryoSerializer(conf) + +Seq("org.apache.spark.mllib.linalg.Vector", + "org.apache.spark.mllib.linalg.DenseVector", + "org.apache.spark.mllib.linalg.SparseVector", + "org.apache.spark.mllib.linalg.Matrix", + "org.apache.spark.mllib.linalg.DenseMatrix", + "org.apache.spark.mllib.linalg.SparseMatrix", + "org.apache.spark.ml.linalg.Vector", + "org.apache.spark.ml.linalg.DenseVector", + "org.apache.spark.ml.linalg.SparseVector", + "org.apache.spark.ml.linalg.Matrix", + "org.apache.spark.ml.linalg.DenseMatrix", + "org.apache.spark.ml.linalg.SparseMatrix", + "org.apache.spark.ml.feature.Instance", + "org.apache.spark.ml.feature.OffsetInstance" +).foreach(!Utils.classIsLoadable(_)) --- End diff -- This UT doesn't actually reflect your purpose above; it seems it will always pass. Also, `conf` and `ser` above seem never to be used here.
[GitHub] spark pull request #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendFo...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18624#discussion_r150170451 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -286,40 +288,119 @@ object MatrixFactorizationModel extends Loader[MatrixFactorizationModel] { srcFeatures: RDD[(Int, Array[Double])], dstFeatures: RDD[(Int, Array[Double])], num: Int): RDD[(Int, Array[(Int, Double)])] = { -val srcBlocks = blockify(srcFeatures) -val dstBlocks = blockify(dstFeatures) -val ratings = srcBlocks.cartesian(dstBlocks).flatMap { case (srcIter, dstIter) => - val m = srcIter.size - val n = math.min(dstIter.size, num) - val output = new Array[(Int, (Int, Double))](m * n) +val srcBlocks = blockify(rank, srcFeatures).zipWithIndex() +val dstBlocks = blockify(rank, dstFeatures) +val ratings = srcBlocks.cartesian(dstBlocks).map { + case (((srcIds, srcFactors), index), (dstIds, dstFactors)) => +val m = srcIds.length +val n = dstIds.length +val dstIdMatrix = new Array[Int](m * num) +val scoreMatrix = Array.fill[Double](m * num)(Double.NegativeInfinity) +val pq = new BoundedPriorityQueue[(Int, Double)](num)(Ordering.by(_._2)) + +val ratings = srcFactors.transpose.multiply(dstFactors) +var i = 0 +var j = 0 +while (i < m) { + var k = 0 + while (k < n) { +pq += dstIds(k) -> ratings(i, k) +k += 1 + } + k = 0 + pq.toArray.sortBy(-_._2).foreach { case (id, score) => +dstIdMatrix(j + k) = id +scoreMatrix(j + k) = score +k += 1 + } + // pq.size maybe less than num, corner case + j += num + i += 1 + pq.clear() +} +(index, (srcIds, dstIdMatrix, new DenseMatrix(m, num, scoreMatrix, true))) +} +ratings.aggregateByKey(null: Array[Int], null: Array[Int], null: DenseMatrix)( + (rateSum, rate) => mergeFunc(rateSum, rate, num), + (rateSum1, rateSum2) => mergeFunc(rateSum1, rateSum2, num) +).flatMap { case (index, (srcIds, dstIdMatrix, scoreMatrix)) => + // to avoid corner case that the number of items is less than recommendation num 
+ var col: Int = 0 + while (col < num && scoreMatrix(0, col) > Double.NegativeInfinity) { +col += 1 + } + val row = scoreMatrix.numRows + val output = new Array[(Int, Array[(Int, Double)])](row) var i = 0 - val pq = new BoundedPriorityQueue[(Int, Double)](n)(Ordering.by(_._2)) - srcIter.foreach { case (srcId, srcFactor) => -dstIter.foreach { case (dstId, dstFactor) => - // We use F2jBLAS which is faster than a call to native BLAS for vector dot product - val score = BLAS.f2jBLAS.ddot(rank, srcFactor, 1, dstFactor, 1) - pq += dstId -> score + while (i < row) { +val factors = new Array[(Int, Double)](col) +var j = 0 +while (j < col) { + factors(j) = (dstIdMatrix(i * num + j), scoreMatrix(i, j)) + j += 1 } -pq.foreach { case (dstId, score) => - output(i) = (srcId, (dstId, score)) - i += 1 +output(i) = (srcIds(i), factors) +i += 1 + } + output.toSeq} + } + + private def mergeFunc(rateSum: (Array[Int], Array[Int], DenseMatrix), +rate: (Array[Int], Array[Int], DenseMatrix), +num: Int): (Array[Int], Array[Int], DenseMatrix) = { +if (rateSum._1 == null) { + rate +} else { + val row = rateSum._3.numRows + var i = 0 + val tempIdMatrix = new Array[Int](row * num) + val tempScoreMatrix = Array.fill[Double](row * num)(Double.NegativeInfinity) + while (i < row) { +var j = 0 +var sum_index = 0 +var rate_index = 0 +val matrixIndex = i * num +while (j < num) { + if (rate._3(i, rate_index) > rateSum._3(i, sum_index)) { +tempIdMatrix(matrixIndex + j) = rate._2(matrixIndex + rate_index) +tempScoreMatrix(matrixIndex + j) = rate._3(i, rate_index) +rate_index += 1 + } else { +tempIdMatrix(matrixIndex + j) = rateSum._2(matrixIndex + sum_index) +tempScoreMatrix(matrixIndex + j) = rateSum._3(i, sum_index) +sum_index += 1 + } + j += 1 } -pq.clear() +i += 1
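The idea behind the proposed change (keep a fixed-size, descending top-`num` list per source row, then merge two such lists) may be easier to follow in a small sketch. This is Python, not the PR's Scala, and it is not the PR's implementation: `top_k`, `merge_top_k` and the `-1` padding id are illustrative assumptions; only the negative-infinity padding score mirrors the diff.

```python
import heapq

NEG_INF = float("-inf")


def top_k(scores, k):
    """Return the k highest (id, score) pairs, best first, padded with sentinels."""
    best = heapq.nlargest(k, scores, key=lambda pair: pair[1])
    # Pad short lists so every list has exactly k entries (id -1 is an assumption).
    return best + [(-1, NEG_INF)] * (k - len(best))


def merge_top_k(left, right, k):
    """Merge two descending, length-k lists into the k best entries overall."""
    merged = []
    i = j = 0
    while len(merged) < k:
        # i + j == len(merged) < k, so both indices stay in range.
        if right[j][1] > left[i][1]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    return merged
```

For example, merging `top_k([(1, 0.9), (2, 0.1), (3, 0.5)], 2)` with `top_k([(4, 0.7)], 2)` keeps ids 1 and 4; the padded sentinel entry is never selected while real scores remain.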
[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19257 Hi, all. The master branch still has this problem. Can we proceed with this?
[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19439 **[Test build #83672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83672/testReport)** for PR 19439 at commit [`04db0fd`](https://github.com/apache/spark/commit/04db0fd02ee1abacc65d20c8d12eab8b6539e09f).
[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19439 **[Test build #83671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83671/testReport)** for PR 19439 at commit [`a6c82ce`](https://github.com/apache/spark/commit/a6c82ceb1752345a2379e8e26f66bbf91b579991). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19439 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83671/ Test FAILed.
[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19439 Merged build finished. Test FAILed.
[GitHub] spark pull request #19702: [SPARK-10365][SQL] Support Parquet logical type T...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19702#discussion_r150168485 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1143,6 +1159,18 @@ class SQLConf extends Serializable with Logging { def isParquetINT64AsTimestampMillis: Boolean = getConf(PARQUET_INT64_AS_TIMESTAMP_MILLIS) + def parquetOutputTimestampType: ParquetOutputTimestampType.Value = { +val isOutputTimestampTypeSet = settings.containsKey(PARQUET_OUTPUT_TIMESTAMP_TYPE.key) +if (!isOutputTimestampTypeSet && isParquetINT64AsTimestampMillis) { + // If PARQUET_OUTPUT_TIMESTAMP_TYPE is not set and PARQUET_INT64_AS_TIMESTAMP_MILLIS is set, + // respect PARQUET_INT64_AS_TIMESTAMP_MILLIS and use TIMESTAMP_MILLIS. Otherwise, + // PARQUET_OUTPUT_TIMESTAMP_TYPE has higher priority. --- End diff -- If `isParquetINT64AsTimestampMillis` is false, we go to the else branch and pick `PARQUET_OUTPUT_TIMESTAMP_TYPE`, which defaults to INT96 (the current behavior). Let me add a test.
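The precedence described in the code comment and in the reply can be sketched as a small decision function. This is a Python illustration, not the Scala in `SQLConf`; the dictionary-based `settings` and both key names are hypothetical stand-ins for the real configuration entries, and the INT96 default is taken from the reply above.

```python
INT96 = "INT96"
TIMESTAMP_MILLIS = "TIMESTAMP_MILLIS"

# Hypothetical key names standing in for the two SQLConf entries.
OUTPUT_TYPE_KEY = "outputTimestampType"
INT64_AS_MILLIS_KEY = "int64AsTimestampMillis"


def parquet_output_timestamp_type(settings):
    """Resolve the output timestamp type from explicitly set options."""
    output_type_set = OUTPUT_TYPE_KEY in settings
    legacy_millis = settings.get(INT64_AS_MILLIS_KEY, False)
    if not output_type_set and legacy_millis:
        # Only the legacy flag was set: respect it.
        return TIMESTAMP_MILLIS
    # Otherwise the newer option wins, falling back to its default.
    return settings.get(OUTPUT_TYPE_KEY, INT96)
```

With nothing set the result is INT96, the legacy flag alone yields TIMESTAMP_MILLIS, and an explicitly set output type takes priority over the legacy flag.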
[GitHub] spark issue #19661: [SPARK-22450][Core][Mllib]safely register class for mlli...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19661 LGTM
[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19439 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83670/ Test FAILed.
[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19439 **[Test build #83670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83670/testReport)** for PR 19439 at commit [`c2a4e19`](https://github.com/apache/spark/commit/c2a4e197eec7749eb660b09a1fd6a7a27df32c39). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19439 Merged build finished. Test FAILed.
[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19439 **[Test build #83671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83671/testReport)** for PR 19439 at commit [`a6c82ce`](https://github.com/apache/spark/commit/a6c82ceb1752345a2379e8e26f66bbf91b579991).
[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19439 **[Test build #83670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83670/testReport)** for PR 19439 at commit [`c2a4e19`](https://github.com/apache/spark/commit/c2a4e197eec7749eb660b09a1fd6a7a27df32c39).
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150166261 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,192 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +.. attribute:: ImageSchema + +A singleton-like attribute of :class:`_ImageSchema` in this module. + +.. autoclass:: _ImageSchema + :members: +""" + +from pyspark import SparkContext +from pyspark.sql.types import Row, _create_row, _parse_datatype_json_string +from pyspark.sql import DataFrame, SparkSession +import numpy as np + + +class _ImageSchema(object): +""" +Internal class for `pyspark.ml.image.ImageSchema` attribute. Meant to be private and +not to be instantized. Use `pyspark.ml.image.ImageSchema` attribute to access the +APIs of this class. +""" + +def __init__(self): +self._imageSchema = None +self._ocvTypes = None +self._imageFields = None +self._undefinedImageType = None + +@property +def imageSchema(self): +""" +Returns the image schema. + +:rtype StructType: a DataFrame with a single column of images + named "image" (nullable) + +.. 
versionadded:: 2.3.0 +""" + +if self._imageSchema is None: +ctx = SparkContext._active_spark_context +jschema = ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageSchema() +self._imageSchema = _parse_datatype_json_string(jschema.json()) +return self._imageSchema + +@property +def ocvTypes(self): +""" +Returns the OpenCV type mapping supported + +:rtype dict: The OpenCV type mapping supported + +.. versionadded:: 2.3.0 +""" + +if self._ocvTypes is None: +ctx = SparkContext._active_spark_context +self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema._ocvTypes()) +return self._ocvTypes + +@property +def imageFields(self): +""" +Returns field names of image columns. + +:rtype list: a list of field names. + +.. versionadded:: 2.3.0 +""" + +if self._imageFields is None: +ctx = SparkContext._active_spark_context +self._imageFields = list(ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageFields()) +return self._imageFields + +@property +def undefinedImageType(self): +""" +Returns the name of undefined image type for the invalid image. + +.. versionadded:: 2.3.0 +""" + +if self._undefinedImageType is None: +ctx = SparkContext._active_spark_context +self._undefinedImageType = \ + ctx._jvm.org.apache.spark.ml.image.ImageSchema.undefinedImageType() +return self._undefinedImageType + +def toNDArray(self, image): +""" +Converts an image to a one-dimensional array. + +:param image: The image to be converted +:rtype array: The image as a one-dimensional array + +.. versionadded:: 2.3.0 +""" + +height = image.height +width = image.width +nChannels = image.nChannels +return np.ndarray( +shape=(height, width, nChannels), +dtype=np.uint8, +buffer=image.data, +strides=(width * nChannels, nChannels, 1)) + +def toImage(self, array, origin=""): +""" +Converts a one-dimensional array to a two-dimensional image. + +:param array array: The array to convert to image +:param str origin: Path to the image +:rtype object: Two dimensional image
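The `strides` tuple passed to `np.ndarray` in `toNDArray` gives the number of bytes to skip when advancing one step along each axis of a row-major `uint8` buffer with interleaved channels, so the returned array has shape `(height, width, nChannels)`. The same index arithmetic can be sketched in plain Python without NumPy; `pixel_at` is a hypothetical helper for illustration and is not part of `pyspark`.

```python
def pixel_at(data, width, n_channels, row, col, channel):
    """Read one byte from a flat row-major image buffer.

    The offsets match strides=(width * nChannels, nChannels, 1) for uint8 data.
    """
    offset = row * (width * n_channels) + col * n_channels + channel
    return data[offset]


# A 2x2 image with 3 channels in which each byte equals its buffer position.
data = bytes(range(2 * 2 * 3))
```

In this toy buffer, moving one column to the right advances the offset by `n_channels` bytes and moving one row down advances it by `width * n_channels` bytes.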
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19715 Can one of the admins verify this patch?
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150166092 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,192 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +.. attribute:: ImageSchema + +A singleton-like attribute of :class:`_ImageSchema` in this module. + +.. autoclass:: _ImageSchema + :members: +""" + +from pyspark import SparkContext +from pyspark.sql.types import Row, _create_row, _parse_datatype_json_string +from pyspark.sql import DataFrame, SparkSession +import numpy as np + + +class _ImageSchema(object): +""" +Internal class for `pyspark.ml.image.ImageSchema` attribute. Meant to be private and +not to be instantized. Use `pyspark.ml.image.ImageSchema` attribute to access the +APIs of this class. +""" + +def __init__(self): +self._imageSchema = None +self._ocvTypes = None +self._imageFields = None +self._undefinedImageType = None + +@property +def imageSchema(self): +""" +Returns the image schema. + +:rtype StructType: a DataFrame with a single column of images + named "image" (nullable) + +.. 
versionadded:: 2.3.0 +""" + +if self._imageSchema is None: +ctx = SparkContext._active_spark_context +jschema = ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageSchema() +self._imageSchema = _parse_datatype_json_string(jschema.json()) +return self._imageSchema + +@property +def ocvTypes(self): +""" +Returns the OpenCV type mapping supported + +:rtype dict: The OpenCV type mapping supported + +.. versionadded:: 2.3.0 +""" + +if self._ocvTypes is None: +ctx = SparkContext._active_spark_context +self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema._ocvTypes()) +return self._ocvTypes + +@property +def imageFields(self): +""" +Returns field names of image columns. + +:rtype list: a list of field names. + +.. versionadded:: 2.3.0 +""" + +if self._imageFields is None: +ctx = SparkContext._active_spark_context +self._imageFields = list(ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageFields()) +return self._imageFields + +@property +def undefinedImageType(self): +""" +Returns the name of undefined image type for the invalid image. + +.. versionadded:: 2.3.0 +""" + +if self._undefinedImageType is None: +ctx = SparkContext._active_spark_context +self._undefinedImageType = \ + ctx._jvm.org.apache.spark.ml.image.ImageSchema.undefinedImageType() +return self._undefinedImageType + +def toNDArray(self, image): +""" +Converts an image to a one-dimensional array. + +:param image: The image to be converted +:rtype array: The image as a one-dimensional array + +.. versionadded:: 2.3.0 +""" + +height = image.height +width = image.width +nChannels = image.nChannels +return np.ndarray( +shape=(height, width, nChannels), +dtype=np.uint8, +buffer=image.data, +strides=(width * nChannels, nChannels, 1)) + +def toImage(self, array, origin=""): +""" +Converts a one-dimensional array to a two-dimensional image. + +:param array array: The array to convert to image +:param str origin: Path to the image --- End diff -- yes, do I need to
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/19715 @MLnick @viirya Could you please review? Thanks!
[GitHub] spark pull request #19715: [SPARK-22397][ML]add multiple columns support to ...
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/19715 [SPARK-22397][ML]add multiple columns support to QuantileDiscretizer ## What changes were proposed in this pull request? Add multi-column support to QuantileDiscretizer. ## How was this patch tested? Added a unit test in QuantileDiscretizerSuite to test multi-column support. You can merge this pull request into a Git repository by running: $ git pull https://github.com/huaxingao/spark spark_22397 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19715.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19715 commit 07bd868956e8d63294b2acb0b5d01a7ca2b35866 Author: Huaxin Gao Date: 2017-11-10T06:57:04Z [SPARK-22397][ML]add multiple columns support to QuantileDiscretizer
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150165810 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,192 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +.. attribute:: ImageSchema + +A singleton-like attribute of :class:`_ImageSchema` in this module. + +.. autoclass:: _ImageSchema + :members: +""" + +from pyspark import SparkContext +from pyspark.sql.types import Row, _create_row, _parse_datatype_json_string +from pyspark.sql import DataFrame, SparkSession +import numpy as np + + +class _ImageSchema(object): +""" +Internal class for `pyspark.ml.image.ImageSchema` attribute. Meant to be private and +not to be instantized. Use `pyspark.ml.image.ImageSchema` attribute to access the +APIs of this class. +""" + +def __init__(self): +self._imageSchema = None +self._ocvTypes = None +self._imageFields = None +self._undefinedImageType = None + +@property +def imageSchema(self): +""" +Returns the image schema. + +:rtype StructType: a DataFrame with a single column of images + named "image" (nullable) + +.. 
versionadded:: 2.3.0 +""" + +if self._imageSchema is None: +ctx = SparkContext._active_spark_context +jschema = ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageSchema() +self._imageSchema = _parse_datatype_json_string(jschema.json()) +return self._imageSchema + +@property +def ocvTypes(self): +""" +Returns the OpenCV type mapping supported + +:rtype dict: The OpenCV type mapping supported + +.. versionadded:: 2.3.0 +""" + +if self._ocvTypes is None: +ctx = SparkContext._active_spark_context +self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema._ocvTypes()) +return self._ocvTypes + +@property +def imageFields(self): +""" +Returns field names of image columns. + +:rtype list: a list of field names. + +.. versionadded:: 2.3.0 +""" + +if self._imageFields is None: +ctx = SparkContext._active_spark_context +self._imageFields = list(ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageFields()) +return self._imageFields + +@property +def undefinedImageType(self): +""" +Returns the name of undefined image type for the invalid image. + +.. versionadded:: 2.3.0 +""" + +if self._undefinedImageType is None: +ctx = SparkContext._active_spark_context +self._undefinedImageType = \ + ctx._jvm.org.apache.spark.ml.image.ImageSchema.undefinedImageType() +return self._undefinedImageType + +def toNDArray(self, image): +""" +Converts an image to a one-dimensional array. + +:param image: The image to be converted +:rtype array: The image as a one-dimensional array + +.. versionadded:: 2.3.0 +""" + +height = image.height +width = image.width +nChannels = image.nChannels +return np.ndarray( +shape=(height, width, nChannels), +dtype=np.uint8, +buffer=image.data, +strides=(width * nChannels, nChannels, 1)) + +def toImage(self, array, origin=""): +""" +Converts a one-dimensional array to a two-dimensional image. --- End diff -- @holdenk done, good catch, changed wording to "Converts an array with metadata to a two-dimensional image." ---
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150165229 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,192 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +.. attribute:: ImageSchema + +A singleton-like attribute of :class:`_ImageSchema` in this module. + +.. autoclass:: _ImageSchema + :members: +""" + +from pyspark import SparkContext +from pyspark.sql.types import Row, _create_row, _parse_datatype_json_string +from pyspark.sql import DataFrame, SparkSession +import numpy as np + + +class _ImageSchema(object): +""" +Internal class for `pyspark.ml.image.ImageSchema` attribute. Meant to be private and +not to be instantized. Use `pyspark.ml.image.ImageSchema` attribute to access the +APIs of this class. +""" + +def __init__(self): +self._imageSchema = None +self._ocvTypes = None +self._imageFields = None +self._undefinedImageType = None + +@property +def imageSchema(self): +""" +Returns the image schema. + +:rtype StructType: a DataFrame with a single column of images + named "image" (nullable) + +.. 
versionadded:: 2.3.0 +""" + +if self._imageSchema is None: +ctx = SparkContext._active_spark_context +jschema = ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageSchema() +self._imageSchema = _parse_datatype_json_string(jschema.json()) +return self._imageSchema + +@property +def ocvTypes(self): +""" +Returns the OpenCV type mapping supported + +:rtype dict: The OpenCV type mapping supported + +.. versionadded:: 2.3.0 +""" + +if self._ocvTypes is None: +ctx = SparkContext._active_spark_context +self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema._ocvTypes()) +return self._ocvTypes + +@property +def imageFields(self): +""" +Returns field names of image columns. + +:rtype list: a list of field names. + +.. versionadded:: 2.3.0 +""" + +if self._imageFields is None: +ctx = SparkContext._active_spark_context +self._imageFields = list(ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageFields()) +return self._imageFields + +@property +def undefinedImageType(self): +""" +Returns the name of undefined image type for the invalid image. + +.. versionadded:: 2.3.0 +""" + +if self._undefinedImageType is None: +ctx = SparkContext._active_spark_context +self._undefinedImageType = \ + ctx._jvm.org.apache.spark.ml.image.ImageSchema.undefinedImageType() +return self._undefinedImageType + +def toNDArray(self, image): +""" +Converts an image to a one-dimensional array. + +:param image: The image to be converted +:rtype array: The image as a one-dimensional array + +.. versionadded:: 2.3.0 +""" + +height = image.height +width = image.width +nChannels = image.nChannels +return np.ndarray( +shape=(height, width, nChannels), +dtype=np.uint8, +buffer=image.data, +strides=(width * nChannels, nChannels, 1)) + +def toImage(self, array, origin=""): +""" +Converts a one-dimensional array to a two-dimensional image. + +:param array array: The array to convert to image +:param str origin: Path to the image +:rtype object: Two dimensional image
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150164867 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,192 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +.. attribute:: ImageSchema + +A singleton-like attribute of :class:`_ImageSchema` in this module. --- End diff -- removed the "singleton-like" wording in the doc - please let me know if any other changes are needed here
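The pattern under discussion (a private class with lazily cached properties, exposed through one module-level instance) can be sketched as follows. This is a simplified illustration, not the code in `pyspark/ml/image.py`: `_Schema`, `Schema`, `fake_loader` and the field names are hypothetical, and the loader stands in for the JVM call made through `SparkContext`.

```python
class _Schema(object):
    """Hypothetical stand-in for _ImageSchema; the loader is injected for the sketch."""

    def __init__(self, loader):
        self._loader = loader
        self._fields = None

    @property
    def fields(self):
        # Compute on first access only, then reuse the cached value.
        if self._fields is None:
            self._fields = list(self._loader())
        return self._fields


calls = []


def fake_loader():
    # Stands in for the JVM round trip; records how often it is invoked.
    calls.append(1)
    return ["origin", "height", "width"]  # placeholder field names


# One shared instance that callers use instead of instantiating the class.
Schema = _Schema(fake_loader)
```

Reading `Schema.fields` repeatedly invokes the loader only once, which is the reason for caching each value behind a `None` check.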
[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19651 **[Test build #83669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83669/testReport)** for PR 19651 at commit [`f644c6a`](https://github.com/apache/spark/commit/f644c6a88b4f24376c67028d0e927a2ee49fedbe).
[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19651 Retest this please.
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150164406 --- Diff: python/pyspark/ml/tests.py --- @@ -1818,6 +1819,24 @@ def tearDown(self): del self.data +class ImageReaderTest(SparkSessionTestCase): + +def test_read_images(self): +data_path = 'python/test_support/image/kittens' --- End diff -- done
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150164008 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,192 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +.. attribute:: ImageSchema + +A singleton-like attribute of :class:`_ImageSchema` in this module. + +.. autoclass:: _ImageSchema + :members: +""" + +from pyspark import SparkContext +from pyspark.sql.types import Row, _create_row, _parse_datatype_json_string +from pyspark.sql import DataFrame, SparkSession +import numpy as np --- End diff -- done
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150163944 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.image + +import java.nio.file.Paths +import java.util.Arrays + +import org.apache.spark.SparkFunSuite +import org.apache.spark.ml.image.ImageSchema._ +import org.apache.spark.mllib.util.MLlibTestSparkContext +import org.apache.spark.sql.Row +import org.apache.spark.sql.types._ + +class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { + // Single column of images named "image" + private lazy val imagePath = "../data/mllib/images" + + test("Smoke test: create basic ImageSchema dataframe") { +val origin = "path" +val width = 1 +val height = 1 +val nChannels = 3 +val data = Array[Byte](0, 0, 0) +val mode = ocvTypes("CV_8UC3") + +// Internal Row corresponds to image StructType +val rows = Seq(Row(Row(origin, height, width, nChannels, mode, data)), + Row(Row(null, height, width, nChannels, mode, data))) +val rdd = sc.makeRDD(rows) +val df = spark.createDataFrame(rdd, ImageSchema.imageSchema) + +assert(df.count === 2, "incorrect image count") +assert(df.schema("image").dataType == columnSchema, "data do not fit ImageSchema") + } + + test("readImages count test") { +var df = readImages(imagePath, recursive = false) +assert(df.count === 1) + +df = readImages(imagePath, recursive = true, dropImageFailures = false) +assert(df.count === 9) + +df = readImages(imagePath, recursive = true, dropImageFailures = true) +val countTotal = df.count +assert(countTotal === 7) + +df = readImages(imagePath, recursive = true, sampleRatio = 0.5, dropImageFailures = true) --- End diff -- agreed +1
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150163710 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,236 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.image + +import java.awt.Color +import java.awt.color.ColorSpace +import java.io.ByteArrayInputStream +import javax.imageio.ImageIO + +import scala.collection.JavaConverters._ + +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.input.PortableDataStream +import org.apache.spark.sql.{DataFrame, Row, SparkSession} +import org.apache.spark.sql.types._ + +@Experimental +@Since("2.3.0") +object ImageSchema { + + val undefinedImageType = "Undefined" + + val imageFields: Array[String] = Array("origin", "height", "width", "nChannels", "mode", "data") + + val ocvTypes: Map[String, Int] = Map( +undefinedImageType -> -1, +"CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24 + ) + + /** + * Used for conversion to python + */ + val _ocvTypes: java.util.Map[String, Int] = ocvTypes.asJava + + /** + * Schema for the image column: Row(String, Int, Int, Int, Int, Array[Byte]) + */ + val columnSchema = StructType( +StructField(imageFields(0), StringType, true) :: +StructField(imageFields(1), IntegerType, false) :: +StructField(imageFields(2), IntegerType, false) :: +StructField(imageFields(3), IntegerType, false) :: +// OpenCV-compatible type: CV_8UC3 in most cases +StructField(imageFields(4), IntegerType, false) :: +// Bytes in OpenCV-compatible order: row-wise BGR in most cases +StructField(imageFields(5), BinaryType, false) :: Nil) + + /** + * DataFrame with a single column of images named "image" (nullable) + */ + val imageSchema = StructType(StructField("image", columnSchema, true) :: Nil) + + /** + * :: Experimental :: + * Gets the origin of the image + * + * @return The origin of the image + */ + def getOrigin(row: Row): String = row.getString(0) + + /** + * :: Experimental :: + * Gets the height of the image + * + * @return The height of the image + */ + def getHeight(row: Row): Int = row.getInt(1) + + /** + * :: Experimental :: + * Gets the width of the image + * + * @return The width of the image 
+ */ + def getWidth(row: Row): Int = row.getInt(2) + + /** + * :: Experimental :: + * Gets the number of channels in the image + * + * @return The number of channels in the image + */ + def getNChannels(row: Row): Int = row.getInt(3) + + /** + * :: Experimental :: + * Gets the OpenCV representation as an int + * + * @return The OpenCV representation as an int + */ + def getMode(row: Row): Int = row.getInt(4) + + /** + * :: Experimental :: + * Gets the image data + * + * @return The image data + */ + def getData(row: Row): Array[Byte] = row.getAs[Array[Byte]](5) + + /** + * Default values for the invalid image + * + * @param origin Origin of the invalid image + * @return Row with the default values + */ + private def invalidImageRow(origin: String): Row = +Row(Row(origin, -1, -1, -1, ocvTypes(undefinedImageType), Array.ofDim[Byte](0))) + + /** + * Convert the compressed image (jpeg, png, etc.) into OpenCV + * representation and store it in DataFrame Row + * + * @param origin Arbitrary string that identifies the image + * @param bytes Image bytes (for example, jpeg) + * @return DataFrame Row or None (if the decompression fails) + */ + private[spark] def decode(origin: String, bytes: Array[Byte]): Option[Row] = { +
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150162532 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,236 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.image + +import java.awt.Color +import java.awt.color.ColorSpace +import java.io.ByteArrayInputStream +import javax.imageio.ImageIO + +import scala.collection.JavaConverters._ + +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.input.PortableDataStream +import org.apache.spark.sql.{DataFrame, Row, SparkSession} +import org.apache.spark.sql.types._ + +@Experimental +@Since("2.3.0") +object ImageSchema { + + val undefinedImageType = "Undefined" + + val imageFields: Array[String] = Array("origin", "height", "width", "nChannels", "mode", "data") + + val ocvTypes: Map[String, Int] = Map( +undefinedImageType -> -1, +"CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24 + ) + + /** + * Used for conversion to python + */ + val _ocvTypes: java.util.Map[String, Int] = ocvTypes.asJava + + /** + * Schema for the image column: Row(String, Int, Int, Int, Int, Array[Byte]) + */ + val columnSchema = StructType( +StructField(imageFields(0), StringType, true) :: +StructField(imageFields(1), IntegerType, false) :: +StructField(imageFields(2), IntegerType, false) :: +StructField(imageFields(3), IntegerType, false) :: +// OpenCV-compatible type: CV_8UC3 in most cases +StructField(imageFields(4), IntegerType, false) :: +// Bytes in OpenCV-compatible order: row-wise BGR in most cases +StructField(imageFields(5), BinaryType, false) :: Nil) + + /** + * DataFrame with a single column of images named "image" (nullable) + */ + val imageSchema = StructType(StructField("image", columnSchema, true) :: Nil) + + /** + * :: Experimental :: + * Gets the origin of the image + * + * @return The origin of the image + */ + def getOrigin(row: Row): String = row.getString(0) + + /** + * :: Experimental :: + * Gets the height of the image + * + * @return The height of the image + */ + def getHeight(row: Row): Int = row.getInt(1) + + /** + * :: Experimental :: + * Gets the width of the image + * + * @return The width of the image 
+ */ + def getWidth(row: Row): Int = row.getInt(2) + + /** + * :: Experimental :: + * Gets the number of channels in the image + * + * @return The number of channels in the image + */ + def getNChannels(row: Row): Int = row.getInt(3) + + /** + * :: Experimental :: + * Gets the OpenCV representation as an int + * + * @return The OpenCV representation as an int + */ + def getMode(row: Row): Int = row.getInt(4) + + /** + * :: Experimental :: + * Gets the image data + * + * @return The image data + */ + def getData(row: Row): Array[Byte] = row.getAs[Array[Byte]](5) + + /** + * Default values for the invalid image + * + * @param origin Origin of the invalid image + * @return Row with the default values + */ + private def invalidImageRow(origin: String): Row = +Row(Row(origin, -1, -1, -1, ocvTypes(undefinedImageType), Array.ofDim[Byte](0))) + + /** + * Convert the compressed image (jpeg, png, etc.) into OpenCV + * representation and store it in DataFrame Row + * + * @param origin Arbitrary string that identifies the image + * @param bytes Image bytes (for example, jpeg) + * @return DataFrame Row or None (if the decompression fails) + */ + private[spark] def decode(origin: String, bytes: Array[Byte]): Option[Row] = { +
[GitHub] spark issue #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EXECUTION...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19712 cc @liancheng @cloud-fan
[GitHub] spark issue #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EXECUTION...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19712 **[Test build #83668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83668/testReport)** for PR 19712 at commit [`e760f52`](https://github.com/apache/spark/commit/e760f52d1c207b63c7ca6ce9de4bd91363e8f28b).
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150161698 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,236 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.image + +import java.awt.Color +import java.awt.color.ColorSpace +import java.io.ByteArrayInputStream +import javax.imageio.ImageIO + +import scala.collection.JavaConverters._ + +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.input.PortableDataStream +import org.apache.spark.sql.{DataFrame, Row, SparkSession} +import org.apache.spark.sql.types._ + +@Experimental +@Since("2.3.0") +object ImageSchema { + + val undefinedImageType = "Undefined" + + val imageFields: Array[String] = Array("origin", "height", "width", "nChannels", "mode", "data") + + val ocvTypes: Map[String, Int] = Map( +undefinedImageType -> -1, +"CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24 + ) + + /** + * Used for conversion to python + */ + val _ocvTypes: java.util.Map[String, Int] = ocvTypes.asJava + + /** + * Schema for the image column: Row(String, Int, Int, Int, Int, Array[Byte]) + */ + val columnSchema = StructType( --- End diff -- good idea, done
[GitHub] spark pull request #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EX...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/19712#discussion_r150161651 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala --- @@ -521,20 +521,7 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest { conf += resultSet.getString(1) -> resultSet.getString(2) } - assert(conf.get("spark.sql.hive.version") === Some("1.2.1")) -} - } - - test("Checks Hive version via SET") { -withJdbcStatement() { statement => - val resultSet = statement.executeQuery("SET") - - val conf = mutable.Map.empty[String, String] - while (resultSet.next()) { -conf += resultSet.getString(1) -> resultSet.getString(2) - } - - assert(conf.get("spark.sql.hive.version") === Some("1.2.1")) --- End diff -- The first commit failed this UT when it checked `spark.sql.hive.metastore.version`. The `SET` command only shows variables that have been changed. If more unit tests are needed, I can add some.
[GitHub] spark issue #19713: [SPARK-22488] [SQL] Fix the view resolution issue in the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19713 Merged build finished. Test FAILed.
[GitHub] spark issue #19713: [SPARK-22488] [SQL] Fix the view resolution issue in the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19713 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83664/ Test FAILed.
[GitHub] spark issue #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EXECUTION...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19712 add to whitelist
[GitHub] spark issue #19713: [SPARK-22488] [SQL] Fix the view resolution issue in the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19713 **[Test build #83664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83664/testReport)** for PR 19713 at commit [`d87f333`](https://github.com/apache/spark/commit/d87f33327b351cea493a065d144044cf2c1a069f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EX...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19712#discussion_r150161287 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala --- @@ -521,20 +521,7 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest { conf += resultSet.getString(1) -> resultSet.getString(2) } - assert(conf.get("spark.sql.hive.version") === Some("1.2.1")) -} - } - - test("Checks Hive version via SET") { -withJdbcStatement() { statement => - val resultSet = statement.executeQuery("SET") - - val conf = mutable.Map.empty[String, String] - while (resultSet.next()) { -conf += resultSet.getString(1) -> resultSet.getString(2) - } - - assert(conf.get("spark.sql.hive.version") === Some("1.2.1")) --- End diff -- Just give it a try?
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150161295 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,236 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.image + +import java.awt.Color +import java.awt.color.ColorSpace +import java.io.ByteArrayInputStream +import javax.imageio.ImageIO + +import scala.collection.JavaConverters._ + +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.input.PortableDataStream +import org.apache.spark.sql.{DataFrame, Row, SparkSession} +import org.apache.spark.sql.types._ + +@Experimental +@Since("2.3.0") +object ImageSchema { + + val undefinedImageType = "Undefined" + + val imageFields: Array[String] = Array("origin", "height", "width", "nChannels", "mode", "data") + + val ocvTypes: Map[String, Int] = Map( +undefinedImageType -> -1, +"CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24 + ) + + /** + * Used for conversion to python + */ + val _ocvTypes: java.util.Map[String, Int] = ocvTypes.asJava --- End diff -- done, renamed as javaOcvTypes
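For readers wondering where the integer codes in `ocvTypes` come from: they appear to match OpenCV's type encoding, in which the low three bits hold the depth code and the channel count minus one is shifted above them. The following is a minimal Python sketch of that encoding, not code from the PR; the names `cv_make_type` and `ocv_types` are hypothetical and exist only for this illustration.

```python
# Sketch of OpenCV's CV_MAKETYPE encoding (illustrative, not part of the PR).
CV_CN_SHIFT = 3  # number of low bits reserved for the depth code
CV_8U = 0        # depth code for unsigned 8-bit samples


def cv_make_type(depth: int, channels: int) -> int:
    """Combine a depth code and a channel count into one OpenCV type code."""
    depth_mask = (1 << CV_CN_SHIFT) - 1
    return (depth & depth_mask) + ((channels - 1) << CV_CN_SHIFT)


# Hypothetical helper table rebuilding the 8-bit entries of the map in the diff.
ocv_types = {f"CV_8UC{n}": cv_make_type(CV_8U, n) for n in (1, 3, 4)}
```

Under this encoding, `CV_8U` and `CV_8UC1` share a code because a bare depth implies a single channel, which is consistent with both keys mapping to the same value in the diff.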
[GitHub] spark pull request #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EX...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/19712#discussion_r150161137 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala --- @@ -521,20 +521,7 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest { conf += resultSet.getString(1) -> resultSet.getString(2) } - assert(conf.get("spark.sql.hive.version") === Some("1.2.1")) -} - } - - test("Checks Hive version via SET") { -withJdbcStatement() { statement => - val resultSet = statement.executeQuery("SET") - - val conf = mutable.Map.empty[String, String] - while (resultSet.next()) { -conf += resultSet.getString(1) -> resultSet.getString(2) - } - - assert(conf.get("spark.sql.hive.version") === Some("1.2.1")) --- End diff -- This might require setting `spark.sql.hive.metastore.version` explicitly.
[GitHub] spark pull request #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EX...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19712#discussion_r150160911 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala --- @@ -521,20 +521,7 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest { conf += resultSet.getString(1) -> resultSet.getString(2) } - assert(conf.get("spark.sql.hive.version") === Some("1.2.1")) -} - } - - test("Checks Hive version via SET") { -withJdbcStatement() { statement => - val resultSet = statement.executeQuery("SET") - - val conf = mutable.Map.empty[String, String] - while (resultSet.next()) { -conf += resultSet.getString(1) -> resultSet.getString(2) - } - - assert(conf.get("spark.sql.hive.version") === Some("1.2.1")) --- End diff -- change it to `spark.sql.hive.metastore.version`?
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150160540 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/HadoopUtils.scala --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.image + +import scala.language.existentials +import scala.util.Random + +import org.apache.commons.io.FilenameUtils +import org.apache.hadoop.conf.{Configuration, Configured} +import org.apache.hadoop.fs.{Path, PathFilter} +import org.apache.hadoop.mapreduce.lib.input.FileInputFormat + +import org.apache.spark.sql.SparkSession + +private object RecursiveFlag { + /** + * Sets the spark recursive flag and then restores it. 
+ * + * @param value Value to set + * @param spark Existing spark session + * @param f The function to evaluate after setting the flag + * @return Returns the evaluation result T of the function + */ + def withRecursiveFlag[T](value: Boolean, spark: SparkSession)(f: => T): T = { +val flagName = FileInputFormat.INPUT_DIR_RECURSIVE +val hadoopConf = spark.sparkContext.hadoopConfiguration +val old = Option(hadoopConf.get(flagName)) +hadoopConf.set(flagName, value.toString) +try f finally { + old match { +case Some(v) => hadoopConf.set(flagName, v) +case None => hadoopConf.unset(flagName) + } +} + } +} + +/** + * Filter that allows loading a fraction of HDFS files. + */ +private class SamplePathFilter extends Configured with PathFilter { --- End diff -- Yes. I'm not sure whether it will be deterministic even if we set a seed, but I can try that for now. As @thunterdb suggested, we could use some sort of hash of the filename, but I'm not sure how I would make that implementation work with a specified ratio. Could you give me more info on the design? The suggestion was: "I would prefer that we do not use a seed and that the result is deterministic, based for example on some hash of the file name, to make it more robust to future code changes. That being said, there is no fundamental issues with the current implementation and other developers may have differing opinions, so the current implementation is fine as far as I am concerned."
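One possible way to combine a filename hash with a sampling ratio, offered only as a sketch and not as the design @thunterdb had in mind: hash the path to a 64-bit integer and keep the file when that integer falls below `ratio * 2^64`. The function name `keep_path` and the choice of MD5 are assumptions made for this illustration; any stable hash would serve, and the snippet is written in plain Python rather than against the Hadoop `PathFilter` interface.

```python
import hashlib


def keep_path(path: str, sample_ratio: float) -> bool:
    """Decide deterministically whether `path` belongs to the sample.

    The decision depends only on the path string and the ratio, so repeated
    runs (and different executors) always agree, with no seed involved.
    """
    if not 0.0 <= sample_ratio <= 1.0:
        raise ValueError("sample_ratio must be in [0, 1]")
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an unsigned 64-bit bucket.
    bucket = int.from_bytes(digest[:8], "big")
    # Compare in integer space so that a ratio of 1.0 keeps every path.
    threshold = int(sample_ratio * 2**64)
    return bucket < threshold
```

Two properties follow from this construction: the kept fraction is only approximately equal to the ratio (it depends on how the hashes of the actual file names fall), and the sample at a smaller ratio is always a subset of the sample at a larger one.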
[GitHub] spark issue #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EXECUTION...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/19712 cc @gatorsmile again. Would you mind adding me to the Jenkins whitelist? Thanks, and I hope this doesn't bother you.
[GitHub] spark issue #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EXECUTION...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19712 **[Test build #83667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83667/testReport)** for PR 19712 at commit [`e760f52`](https://github.com/apache/spark/commit/e760f52d1c207b63c7ca6ce9de4bd91363e8f28b).
[GitHub] spark issue #19714: [SPARK-22489][SQL] Shouldn't change broadcast join build...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19714 **[Test build #83666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83666/testReport)** for PR 19714 at commit [`68dfc42`](https://github.com/apache/spark/commit/68dfc42d80548c1eeb75275df43d4542146a60d4).
[GitHub] spark pull request #19714: [SPARK-22489][SQL] Shouldn't change broadcast joi...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/19714 [SPARK-22489][SQL] Shouldn't change broadcast join buildSide if user clearly specified ## What changes were proposed in this pull request? How to reproduce: ```scala import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec spark.createDataFrame(Seq((1, "4"), (2, "2"))).toDF("key", "value").createTempView("table1") spark.createDataFrame(Seq((1, "1"), (2, "2"))).toDF("key", "value").createTempView("table2") val bl = sql(s"SELECT /*+ MAPJOIN(t1) */ * FROM table1 t1 JOIN table2 t2 ON t1.key = t2.key").queryExecution.executedPlan println(bl.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide) ``` The result is `BuildRight`, but it should be `BuildLeft`. This PR fixes this issue. ## How was this patch tested? unit tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22489 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19714.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19714 commit 68dfc42d80548c1eeb75275df43d4542146a60d4 Author: Yuming Wang Date: 2017-11-10T05:55:51Z Shouldn't change broadcast join buildSide if user clearly specified
[GitHub] spark issue #19707: [SPARK-22472][SQL] add null check for top-level primitiv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19707 Thanks! Merged to master/2.2
[GitHub] spark pull request #19707: [SPARK-22472][SQL] add null check for top-level p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19707
[GitHub] spark issue #19707: [SPARK-22472][SQL] add null check for top-level primitiv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19707 LGTM
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150157767 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,236 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.image + +import java.awt.Color +import java.awt.color.ColorSpace +import java.io.ByteArrayInputStream +import javax.imageio.ImageIO + +import scala.collection.JavaConverters._ + +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.input.PortableDataStream +import org.apache.spark.sql.{DataFrame, Row, SparkSession} +import org.apache.spark.sql.types._ + +@Experimental +@Since("2.3.0") +object ImageSchema { + + val undefinedImageType = "Undefined" + + val imageFields: Array[String] = Array("origin", "height", "width", "nChannels", "mode", "data") + + val ocvTypes: Map[String, Int] = Map( +undefinedImageType -> -1, +"CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24 + ) + + /** + * Used for conversion to python + */ + val _ocvTypes: java.util.Map[String, Int] = ocvTypes.asJava + + /** + * Schema for the image column: Row(String, Int, Int, Int, Int, Array[Byte]) + */ + val columnSchema = StructType( +StructField(imageFields(0), StringType, true) :: +StructField(imageFields(1), IntegerType, false) :: +StructField(imageFields(2), IntegerType, false) :: +StructField(imageFields(3), IntegerType, false) :: +// OpenCV-compatible type: CV_8UC3 in most cases +StructField(imageFields(4), IntegerType, false) :: +// Bytes in OpenCV-compatible order: row-wise BGR in most cases +StructField(imageFields(5), BinaryType, false) :: Nil) + + /** + * DataFrame with a single column of images named "image" (nullable) + */ + val imageSchema = StructType(StructField("image", columnSchema, true) :: Nil) + + /** + * :: Experimental :: + * Gets the origin of the image + * + * @return The origin of the image + */ + def getOrigin(row: Row): String = row.getString(0) + + /** + * :: Experimental :: + * Gets the height of the image + * + * @return The height of the image + */ + def getHeight(row: Row): Int = row.getInt(1) + + /** + * :: Experimental :: + * Gets the width of the image + * + * @return The width of the image 
+ */ + def getWidth(row: Row): Int = row.getInt(2) + + /** + * :: Experimental :: + * Gets the number of channels in the image + * + * @return The number of channels in the image + */ + def getNChannels(row: Row): Int = row.getInt(3) + + /** + * :: Experimental :: + * Gets the OpenCV representation as an int + * + * @return The OpenCV representation as an int + */ + def getMode(row: Row): Int = row.getInt(4) + + /** + * :: Experimental :: + * Gets the image data + * + * @return The image data + */ + def getData(row: Row): Array[Byte] = row.getAs[Array[Byte]](5) + + /** + * Default values for the invalid image + * + * @param origin Origin of the invalid image + * @return Row with the default values + */ + private def invalidImageRow(origin: String): Row = +Row(Row(origin, -1, -1, -1, ocvTypes(undefinedImageType), Array.ofDim[Byte](0))) + + /** + * Convert the compressed image (jpeg, png, etc.) into OpenCV + * representation and store it in DataFrame Row + * + * @param origin Arbitrary string that identifies the image + * @param bytes Image bytes (for example, jpeg) + * @return DataFrame Row or None (if the decompression fails) + */ + private[spark] def decode(origin: String, bytes: Array[Byte]): Option[Row] = { +
[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r150157663 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,236 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.image + +import java.awt.Color +import java.awt.color.ColorSpace +import java.io.ByteArrayInputStream +import javax.imageio.ImageIO + +import scala.collection.JavaConverters._ + +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.input.PortableDataStream +import org.apache.spark.sql.{DataFrame, Row, SparkSession} +import org.apache.spark.sql.types._ + +@Experimental +@Since("2.3.0") +object ImageSchema { + + val undefinedImageType = "Undefined" + + val imageFields: Array[String] = Array("origin", "height", "width", "nChannels", "mode", "data") + + val ocvTypes: Map[String, Int] = Map( +undefinedImageType -> -1, +"CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24 + ) + + /** + * Used for conversion to python + */ + val _ocvTypes: java.util.Map[String, Int] = ocvTypes.asJava + + /** + * Schema for the image column: Row(String, Int, Int, Int, Int, Array[Byte]) + */ + val columnSchema = StructType( +StructField(imageFields(0), StringType, true) :: +StructField(imageFields(1), IntegerType, false) :: +StructField(imageFields(2), IntegerType, false) :: +StructField(imageFields(3), IntegerType, false) :: +// OpenCV-compatible type: CV_8UC3 in most cases +StructField(imageFields(4), IntegerType, false) :: +// Bytes in OpenCV-compatible order: row-wise BGR in most cases +StructField(imageFields(5), BinaryType, false) :: Nil) + + /** + * DataFrame with a single column of images named "image" (nullable) + */ + val imageSchema = StructType(StructField("image", columnSchema, true) :: Nil) + + /** + * :: Experimental :: + * Gets the origin of the image + * + * @return The origin of the image + */ + def getOrigin(row: Row): String = row.getString(0) + + /** + * :: Experimental :: + * Gets the height of the image + * + * @return The height of the image + */ + def getHeight(row: Row): Int = row.getInt(1) + + /** + * :: Experimental :: --- End diff -- done
[GitHub] spark pull request #19661: [SPARK-22450][Core][Mllib]safely register class f...
Github user ConeyLiu commented on a diff in the pull request: https://github.com/apache/spark/pull/19661#discussion_r150157116 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -178,6 +179,28 @@ class KryoSerializer(conf: SparkConf) kryo.register(Utils.classForName("scala.collection.immutable.Map$EmptyMap$")) kryo.register(classOf[ArrayBuffer[Any]]) +// We can't load those class directly in order to avoid unnecessary jar dependencies. +// We load them safely, ignore it if the class not found. +Seq("org.apache.spark.mllib.linalg.Vector", + "org.apache.spark.mllib.linalg.DenseVector", + "org.apache.spark.mllib.linalg.SparseVector", + "org.apache.spark.mllib.linalg.Matrix", + "org.apache.spark.mllib.linalg.DenseMatrix", + "org.apache.spark.mllib.linalg.SparseMatrix", + "org.apache.spark.ml.linalg.Vector", + "org.apache.spark.ml.linalg.DenseVector", + "org.apache.spark.ml.linalg.SparseVector", + "org.apache.spark.ml.linalg.Matrix", + "org.apache.spark.ml.linalg.DenseMatrix", + "org.apache.spark.ml.linalg.SparseMatrix", + "org.apache.spark.ml.feature.Instance", + "org.apache.spark.ml.feature.OffsetInstance" +).map(name => Try(Utils.classForName(name))).foreach { t => --- End diff -- updated. thanks for the advice.
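The "load safely, ignore if not found" pattern in the diff above can be sketched with the standard library alone. This is a minimal sketch, assuming plain `Class.forName` in place of Spark's `Utils.classForName`; `SafeClassLookup` and `loadable` are hypothetical names, and in the real serializer each resolved class would presumably be passed on to `kryo.register`.

```scala
import scala.util.Try

object SafeClassLookup {
  /** Resolve each name to a Class, silently dropping names that are not on the classpath. */
  def loadable(names: Seq[String]): Seq[Class[_]] =
    names.flatMap { name =>
      // Class.forName throws ClassNotFoundException for a missing class;
      // Try captures it and toOption turns the failure into None.
      Try[Class[_]](Class.forName(name)).toOption
    }
}
```

For example, `SafeClassLookup.loadable(Seq("java.lang.String", "org.example.DoesNotExist"))` keeps only the `String` class; the second name is a made-up class used to show the skip path.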
[GitHub] spark issue #19661: [SPARK-22450][Core][Mllib]safely register class for mlli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19661 **[Test build #83665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83665/testReport)** for PR 19661 at commit [`d7090bb`](https://github.com/apache/spark/commit/d7090bbf60ea98e9ade9534b78e249b0f25621e4).
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick Please find some time to review it and let me know if we can proceed with this. Thanks
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19702 Will review it tomorrow.
[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19272 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83661/ Test PASSed.
[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19272 Merged build finished. Test PASSed.
[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19272 **[Test build #83661 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83661/testReport)** for PR 19272 at commit [`45b46ed`](https://github.com/apache/spark/commit/45b46ed6768ea50ddf23063b2a925c2a4794acc7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83662/ Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #83662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83662/testReport)** for PR 13599 at commit [`8474fbc`](https://github.com/apache/spark/commit/8474fbc001a8c418b210d014b55f5ee71c683d06). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19272 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83660/ Test PASSed.
[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19272 Merged build finished. Test PASSed.
[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19272 **[Test build #83660 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83660/testReport)** for PR 19272 at commit [`8df7e37`](https://github.com/apache/spark/commit/8df7e37517a21d5fbaa2c0e7abfa248fd3ff9be3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19713: [SPARK-22488] [SQL] Fix the view resolution issue in the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19713 **[Test build #83664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83664/testReport)** for PR 19713 at commit [`d87f333`](https://github.com/apache/spark/commit/d87f33327b351cea493a065d144044cf2c1a069f).
[GitHub] spark issue #19713: [SPARK-22488] [SQL] Fix the view resolution issue in the...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19713 cc @cloud-fan @jiangxb1987
[GitHub] spark pull request #19713: [SPARK-22488] [SQL] Fix the view resolution issue...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/19713

[SPARK-22488] [SQL] Fix the view resolution issue in the SparkSession internal table() API

## What changes were proposed in this pull request?

The current internal `table()` API of `SparkSession` bypasses the Analyzer and directly calls the `sessionState.catalog.lookupRelation` API. This skips the view resolution logic in our Analyzer rule `ResolveRelations`. This internal API is widely used by various DDL commands and other internal APIs.

Users might get a strange error caused by view resolution when the default database is different.
```
Table or view not found: t1; line 1 pos 14
org.apache.spark.sql.AnalysisException: Table or view not found: t1; line 1 pos 14
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
```
This PR fixes it by forcing the API to use `ResolveRelations` to resolve the table.

## How was this patch tested?

Added a test case and modified the existing test cases

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark viewResolution

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19713.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #19713

commit d87f33327b351cea493a065d144044cf2c1a069f
Author: gatorsmile
Date: 2017-11-10T03:47:59Z

fix.
[GitHub] spark issue #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EXECUTION...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19712 **[Test build #83663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83663/testReport)** for PR 19712 at commit [`6071926`](https://github.com/apache/spark/commit/607192603b88f6ed4543587489188f20b9b236e0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EXECUTION...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19712 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83663/ Test FAILed.
[GitHub] spark issue #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EXECUTION...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19712 Merged build finished. Test FAILed.
[GitHub] spark pull request #19708: [SPARK-22479][SQL] Exclude credentials from Savei...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19708#discussion_r150150720 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SaveIntoDataSourceCommand.scala --- @@ -46,4 +46,6 @@ case class SaveIntoDataSourceCommand( Seq.empty[Row] } + + override def simpleString: String = s"SaveIntoDataSourceCommand ${dataSource}, ${mode}" --- End diff -- https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L2631-L2638 Reuse `spark.redaction.regex`?
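The suggestion to reuse a redaction regex can be illustrated with a small stand-alone sketch: values whose keys match a pattern are masked before the options are rendered into a plan string. The object name, the replacement text and the pattern used in the usage note are illustrative assumptions; this is not the `Utils` code linked in the comment.

```scala
import scala.util.matching.Regex

object RedactionSketch {
  // Placeholder text shown instead of a sensitive value (an assumption for this sketch).
  val Replacement = "*********(redacted)"

  /** Mask the value of every option whose key matches `pattern`; other entries pass through. */
  def redact(pattern: Regex, options: Map[String, String]): Map[String, String] =
    options.map { case (key, value) =>
      if (pattern.findFirstIn(key).isDefined) key -> Replacement
      else key -> value
    }
}
```

Called as `RedactionSketch.redact("(?i)secret|password".r, options)`, this would leave a key such as `url` untouched and mask `Password`, so a `simpleString` built from the redacted map would not leak credentials.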
[GitHub] spark issue #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EXECUTION...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19712 **[Test build #83663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83663/testReport)** for PR 19712 at commit [`6071926`](https://github.com/apache/spark/commit/607192603b88f6ed4543587489188f20b9b236e0).
[GitHub] spark issue #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EXECUTION...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19712 ok to test
[GitHub] spark pull request #19705: [SPARK-22308][test-maven] Support alternative uni...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19705
[GitHub] spark issue #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EXECUTION...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19712 Can one of the admins verify this patch?
[GitHub] spark pull request #19712: [SPARK-22487][SQL][Hive]Remove the unused HIVE_EX...
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/19712

[SPARK-22487][SQL][Hive]Remove the unused HIVE_EXECUTION_VERSION property

## What changes were proposed in this pull request?

Spark no longer has a Hive client for execution, and no usages of HIVE_EXECUTION_VERSION are found anywhere in the Spark project. HIVE_EXECUTION_VERSION is set by `spark.sql.hive.version`, which is still set internally in some places or by users; developers and users may confuse it with HIVE_METASTORE_VERSION (`spark.sql.hive.metastore.version`). It would be better to remove it.

## How was this patch tested?

Modified some existing unit tests.

cc @cloud-fan @gatorsmile

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yaooqinn/spark SPARK-22487

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19712.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #19712

commit 607192603b88f6ed4543587489188f20b9b236e0
Author: Kent Yao
Date: 2017-11-10T03:06:32Z

rm unused hive_execution_version
[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19705 Thanks! Merged to master.
[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19705 To check the syntax, you can run the following command > dev/lint-scala
[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19681 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83658/ Test PASSed.
[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19681 Merged build finished. Test PASSed.
[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15770 LGTM. ping @yanboliang
[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19681 **[Test build #83658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83658/testReport)** for PR 19681 at commit [`1a31665`](https://github.com/apache/spark/commit/1a31665ab6d3352dee3e15c87a697a7e655eb34c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19459 Looks pretty solid. I will take another look today (KST) and merge this one in a few days if there are no more comments or if other committers are too busy to take a look and merge.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/19459 @ueshin @HyukjinKwon does this look ready to merge? cc @cloud-fan
[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19698 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83659/ Test FAILed.
[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19698 Merged build finished. Test FAILed.
[GitHub] spark issue #19698: [SPARK-20648][core] Port JobsTab and StageTab to the new...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19698 **[Test build #83659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83659/testReport)** for PR 19698 at commit [`1d7242b`](https://github.com/apache/spark/commit/1d7242b340b9525feab941c7d61a6dccb8ccc14c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19705 Merged build finished. Test PASSed.
[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83657/ Test PASSed.
[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19705 **[Test build #83657 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83657/testReport)** for PR 19705 at commit [`12a1d37`](https://github.com/apache/spark/commit/12a1d37ec721a556592cae3c5aff129b6a0663d0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19479#discussion_r150134850 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -1034,11 +1034,18 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat schema.fields.map(f => (f.name, f.dataType)).toMap stats.colStats.foreach { case (colName, colStat) => colStat.toMap(colName, colNameTypeMap(colName)).foreach { case (k, v) => -statsProperties += (columnStatKeyPropName(colName, k) -> v) +val statKey = columnStatKeyPropName(colName, k) +val threshold = conf.get(SCHEMA_STRING_LENGTH_THRESHOLD) +if (v.length > threshold) { + throw new AnalysisException(s"Cannot persist '$statKey' into hive metastore as " + --- End diff --
Hive's exception is not friendly to Spark users, who may not be able to tell what went wrong with their operation:
```
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. Put request failed : INSERT INTO TABLE_PARAMS (PARAM_VALUE,TBL_ID,PARAM_KEY) VALUES (?,?,?)
org.datanucleus.exceptions.NucleusDataStoreException: Put request failed : INSERT INTO TABLE_PARAMS (PARAM_VALUE,TBL_ID,PARAM_KEY) VALUES (?,?,?)
...
Caused by: java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'TFo0QmxvY2smeREAANBdAAALz3IBM0AUAAEAQgPoP/ALAAQUACNAJBAAEy4I&' to length 4000.
...
```
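The idea behind the check in this diff is to fail early with a message that names the offending property, instead of letting the metastore surface a truncation error. A dependency-free sketch follows; it returns an `Either` rather than throwing `AnalysisException` so it can run without Spark, and `StatLengthCheck`, `validate` and the message wording are hypothetical.

```scala
object StatLengthCheck {
  /**
   * Return Right(value) when the value fits within `threshold` characters,
   * or Left(message) naming the key that is too long to persist.
   */
  def validate(statKey: String, value: String, threshold: Int): Either[String, String] =
    if (value.length > threshold)
      Left(s"Cannot persist '$statKey': value length ${value.length} exceeds threshold $threshold")
    else
      Right(value)
}
```

With the 4000-character limit from the stack trace above, a 4001-character histogram string would produce a `Left` that points at the statistic key, which is easier to act on than the raw `SQLDataException`.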
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19689 and also I believe anyone can leave the sign-off too if it looks good :).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #83662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83662/testReport)** for PR 13599 at commit [`8474fbc`](https://github.com/apache/spark/commit/8474fbc001a8c418b210d014b55f5ee71c683d06).
[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19272 **[Test build #83661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83661/testReport)** for PR 19272 at commit [`45b46ed`](https://github.com/apache/spark/commit/45b46ed6768ea50ddf23063b2a925c2a4794acc7).
[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @weichenXu123 Any other comments? Thanks!
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19689 cc @cloud-fan for review too.
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19689 @juliuszsompolski No problem. Non-committers can still review. :)
[GitHub] spark pull request #19702: [SPARK-10365][SQL] Support Parquet logical type T...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19702#discussion_r150131141

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1143,6 +1159,18 @@ class SQLConf extends Serializable with Logging {
   def isParquetINT64AsTimestampMillis: Boolean = getConf(PARQUET_INT64_AS_TIMESTAMP_MILLIS)
 
+  def parquetOutputTimestampType: ParquetOutputTimestampType.Value = {
+    val isOutputTimestampTypeSet = settings.containsKey(PARQUET_OUTPUT_TIMESTAMP_TYPE.key)
+    if (!isOutputTimestampTypeSet && isParquetINT64AsTimestampMillis) {
+      // If PARQUET_OUTPUT_TIMESTAMP_TYPE is not set and PARQUET_INT64_AS_TIMESTAMP_MILLIS is set,
+      // respect PARQUET_INT64_AS_TIMESTAMP_MILLIS and use TIMESTAMP_MILLIS. Otherwise,
+      // PARQUET_OUTPUT_TIMESTAMP_TYPE has higher priority.
--- End diff --

BTW, do we have a simple test for this priority? It seems `isParquetINT64AsTimestampMillis` defaults to `false`.
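The priority rule described in the diff comment could be exercised by a test along these lines. This is a standalone sketch, not the test that was added to Spark: `OutputTimestampType` and `TimestampTypeResolution.resolve` are hypothetical stand-ins for `ParquetOutputTimestampType` and the `SQLConf` accessor, and the `INT96` default and the extra enumeration values are assumptions made for illustration.

```scala
// Hypothetical stand-in for ParquetOutputTimestampType; only TIMESTAMP_MILLIS
// appears in the diff above, the other values are assumed.
object OutputTimestampType extends Enumeration {
  val INT96, TIMESTAMP_MICROS, TIMESTAMP_MILLIS = Value
}

object TimestampTypeResolution {
  import OutputTimestampType._

  /**
   * explicitType is None when the newer setting was never set.
   * An explicit value always wins; the legacy flag only applies when
   * nothing was set explicitly; otherwise the (assumed) default is used.
   */
  def resolve(
      explicitType: Option[OutputTimestampType.Value],
      int64AsTimestampMillis: Boolean,
      default: OutputTimestampType.Value = INT96): OutputTimestampType.Value =
    explicitType match {
      case Some(t) => t
      case None if int64AsTimestampMillis => TIMESTAMP_MILLIS
      case None => default
    }
}
```

A priority test would then cover three cases: the legacy flag alone selects `TIMESTAMP_MILLIS`, an explicit setting overrides the legacy flag, and with neither set the default applies.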
[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19272 **[Test build #83660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83660/testReport)** for PR 19272 at commit [`8df7e37`](https://github.com/apache/spark/commit/8df7e37517a21d5fbaa2c0e7abfa248fd3ff9be3).