[GitHub] spark pull request #20256: [SPARK-23063][K8S] K8s changes for publishing scr...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20256#discussion_r161366655 --- Diff: dev/create-release/releaseutils.py --- @@ -185,6 +185,7 @@ def get_commits(tag): "graphx": "GraphX", "input/output": CORE_COMPONENT, "java api": "Java API", +"kubernetes": "Kubernetes", --- End diff -- yes, I think this looks right --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20256: [SPARK-23063][K8S] K8s changes for publishing scripts (a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20256 **[Test build #86086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86086/testReport)** for PR 20256 at commit [`b0a2ead`](https://github.com/apache/spark/commit/b0a2ead5935408370a5303fc8f7315357314aeca).
[GitHub] spark pull request #20256: [SPARK-23063][K8S] K8s changes for publishing scr...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/20256#discussion_r161366632 --- Diff: dev/create-release/releaseutils.py --- @@ -185,6 +185,7 @@ def get_commits(tag): "graphx": "GraphX", "input/output": CORE_COMPONENT, "java api": "Java API", +"kubernetes": "Kubernetes", --- End diff -- Ah, okay, I misread that previously - updated the mapping, and it looks like the script turns things to lower case anyway, so `k8s` and `kubernetes` ought to cover everything. Thanks!
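The lower-casing point can be illustrated with a small sketch. This is not the actual `releaseutils.py` code: `KNOWN_COMPONENTS` and `translate_component` are hypothetical names, the value of `CORE_COMPONENT` is assumed, and the mapping only holds the entries visible in the diff plus the `k8s` alias discussed in this thread.

```python
# Hypothetical sketch: several lower-case keys may map to one display name.
CORE_COMPONENT = "Core"  # assumed value; the real constant lives in releaseutils.py

KNOWN_COMPONENTS = {
    "graphx": "GraphX",
    "input/output": CORE_COMPONENT,
    "java api": "Java API",
    "k8s": "Kubernetes",
    "kubernetes": "Kubernetes",
}


def translate_component(tag):
    """Look up a PR-title tag such as "K8S" case-insensitively."""
    key = tag.strip().lower()
    # Fall back to the raw tag when it is not a known component.
    return KNOWN_COMPONENTS.get(key, tag)


for title_tag in ["K8S", "k8s", "Kubernetes", "GraphX", "Foo"]:
    print(title_tag, "->", translate_component(title_tag))
```

Because the key is lower-cased before the lookup, `[K8S]`, `[k8s]` and `[Kubernetes]` in a PR title would all resolve to the same component without needing a list or tuple as the key.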
[GitHub] spark pull request #20211: [SPARK-23011][PYTHON][SQL] Prepend missing groupi...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20211#discussion_r161366590 --- Diff: python/pyspark/sql/group.py --- @@ -233,6 +233,27 @@ def apply(self, udf): | 2| 1.1094003924504583| +---+---+ +Notes on grouping column: --- End diff -- Sounds to me like we could either stick with func(key, pdf) or do whatever pandas does. (Yes, for gapply, the returned data frame is expected to have the key columns prepended; there was a JIRA, SPARK-16258, proposing to eliminate that extra work.)
[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19001 **[Test build #86085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86085/testReport)** for PR 19001 at commit [`d37eb8b`](https://github.com/apache/spark/commit/d37eb8b3359981756c923948fe12833a56b61865).
[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19001 Jenkins retest this please
[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r161366253
--- Diff: python/pyspark/ml/image.py ---
@@ -71,9 +88,33 @@ def ocvTypes(self):
         """
         if self._ocvTypes is None:
-            ctx = SparkContext._active_spark_context
-            self._ocvTypes = dict(ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes())
-        return self._ocvTypes
+            ctx = SparkContext.getOrCreate()
+            ocvTypeList = ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()
+            self._ocvTypes = [self._OcvType(name=x.name(),
+                                            mode=x.mode(),
+                                            nChannels=x.nChannels(),
+                                            dataType=x.dataType(),
+                                            nptype=self._ocvToNumpyMap[x.dataType()])
+                              for x in ocvTypeList]
+        return self._ocvTypes[:]
+
+    def ocvTypeByName(self, name):
--- End diff --
Let's write a doc and doctest too.
[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r161366127
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
@@ -37,20 +37,54 @@ import org.apache.spark.sql.types._
 @Since("2.3.0")
 object ImageSchema {
-  val undefinedImageType = "Undefined"
+  /**
+   * OpenCv type representation
+   * @param mode ordinal for the type
+   * @param dataType open cv data type
+   * @param nChannels number of color channels
+   */
+  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
+    def name: String = if (mode == -1) { "Undefined" } else { s"CV_$dataType" + s"C$nChannels" }
+    override def toString: String = s"OpenCvType(mode = $mode, name = $name)"
+  }
+
+  def ocvTypeByName(name: String): OpenCvType = {
+    ocvTypes.find(x => x.name == name).getOrElse(
+      throw new IllegalArgumentException("Unknown open cv type " + name))
+  }
+
+  def ocvTypeByMode(mode: Int): OpenCvType = {
+    ocvTypes.find(x => x.mode == mode).getOrElse(
+      throw new IllegalArgumentException("Unknown open cv mode " + mode))
+  }
+
+  val undefinedImageType = OpenCvType(-1, "N/A", -1)
   /**
-   * (Scala-specific) OpenCV type mapping supported
+   * A Mapping of Type to Numbers in OpenCV
+   *
+   *          C1  C2  C3  C4
+   * CV_8U     0   8  16  24
+   * CV_8S     1   9  17  25
+   * CV_16U    2  10  18  26
+   * CV_16S    3  11  19  27
+   * CV_32S    4  12  20  28
+   * CV_32F    5  13  21  29
+   * CV_64F    6  14  22  30
   */
-  val ocvTypes: Map[String, Int] = Map(
-    undefinedImageType -> -1,
-    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
-  )
+  val ocvTypes = {
+    val types =
+      for (nc <- Array(1, 2, 3, 4);
+           dt <- Array("8U", "8S", "16U", "16S", "32S", "32F", "64F"))
+        yield (dt, nc)
+    val ordinals = for (i <- 0 to 3; j <- 0 to 6) yield ( i * 8 + j)
+    undefinedImageType +: (ordinals zip types).map(x => OpenCvType(x._1, x._2._1, x._2._2))
+  }
   /**
-   * (Java-specific) OpenCV type mapping supported
+   * (Java Specific) list of OpenCv types
   */
-  val javaOcvTypes: java.util.Map[String, Int] = ocvTypes.asJava
--- End diff --
Let's set the explicit type here.
[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r161366196
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
@@ -37,20 +37,54 @@ import org.apache.spark.sql.types._
 @Since("2.3.0")
 object ImageSchema {
-  val undefinedImageType = "Undefined"
+  /**
+   * OpenCv type representation
+   * @param mode ordinal for the type
+   * @param dataType open cv data type
+   * @param nChannels number of color channels
+   */
+  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
+    def name: String = if (mode == -1) { "Undefined" } else { s"CV_$dataType" + s"C$nChannels" }
+    override def toString: String = s"OpenCvType(mode = $mode, name = $name)"
+  }
+
+  def ocvTypeByName(name: String): OpenCvType = {
+    ocvTypes.find(x => x.name == name).getOrElse(
+      throw new IllegalArgumentException("Unknown open cv type " + name))
+  }
+
+  def ocvTypeByMode(mode: Int): OpenCvType = {
+    ocvTypes.find(x => x.mode == mode).getOrElse(
+      throw new IllegalArgumentException("Unknown open cv mode " + mode))
+  }
+
+  val undefinedImageType = OpenCvType(-1, "N/A", -1)
   /**
-   * (Scala-specific) OpenCV type mapping supported
+   * A Mapping of Type to Numbers in OpenCV
+   *
+   *          C1  C2  C3  C4
+   * CV_8U     0   8  16  24
+   * CV_8S     1   9  17  25
+   * CV_16U    2  10  18  26
+   * CV_16S    3  11  19  27
+   * CV_32S    4  12  20  28
+   * CV_32F    5  13  21  29
+   * CV_64F    6  14  22  30
   */
-  val ocvTypes: Map[String, Int] = Map(
-    undefinedImageType -> -1,
-    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
-  )
+  val ocvTypes = {
+    val types =
+      for (nc <- Array(1, 2, 3, 4);
+           dt <- Array("8U", "8S", "16U", "16S", "32S", "32F", "64F"))
+        yield (dt, nc)
+    val ordinals = for (i <- 0 to 3; j <- 0 to 6) yield ( i * 8 + j)
+    undefinedImageType +: (ordinals zip types).map(x => OpenCvType(x._1, x._2._1, x._2._2))
+  }
   /**
-   * (Java-specific) OpenCV type mapping supported
+   * (Java Specific) list of OpenCv types
--- End diff --
Let's keep as is `(Java-specific)`.
[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r161366123
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ---
@@ -37,20 +37,54 @@ import org.apache.spark.sql.types._
 @Since("2.3.0")
 object ImageSchema {
-  val undefinedImageType = "Undefined"
+  /**
+   * OpenCv type representation
+   * @param mode ordinal for the type
+   * @param dataType open cv data type
+   * @param nChannels number of color channels
+   */
+  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
+    def name: String = if (mode == -1) { "Undefined" } else { s"CV_$dataType" + s"C$nChannels" }
+    override def toString: String = s"OpenCvType(mode = $mode, name = $name)"
+  }
+
+  def ocvTypeByName(name: String): OpenCvType = {
+    ocvTypes.find(x => x.name == name).getOrElse(
+      throw new IllegalArgumentException("Unknown open cv type " + name))
+  }
+
+  def ocvTypeByMode(mode: Int): OpenCvType = {
+    ocvTypes.find(x => x.mode == mode).getOrElse(
+      throw new IllegalArgumentException("Unknown open cv mode " + mode))
+  }
+
+  val undefinedImageType = OpenCvType(-1, "N/A", -1)
   /**
-   * (Scala-specific) OpenCV type mapping supported
+   * A Mapping of Type to Numbers in OpenCV
+   *
+   *          C1  C2  C3  C4
+   * CV_8U     0   8  16  24
+   * CV_8S     1   9  17  25
+   * CV_16U    2  10  18  26
+   * CV_16S    3  11  19  27
+   * CV_32S    4  12  20  28
+   * CV_32F    5  13  21  29
+   * CV_64F    6  14  22  30
   */
-  val ocvTypes: Map[String, Int] = Map(
-    undefinedImageType -> -1,
-    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
-  )
+  val ocvTypes = {
--- End diff --
Could we set the explicit type?
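The ordinal table in the quoted diff follows a simple rule: the mode is `(nChannels - 1) * 8` plus the index of the data type in the row order. The sketch below recomputes the table in plain Python as a cross-check; `DEPTHS`, `ocv_mode` and `ocv_name` are hypothetical helper names for illustration, not part of the PySpark or Scala API under review.

```python
# Sketch of the ordinal layout from the table:
# mode = (nChannels - 1) * 8 + index of the data type in DEPTHS.
DEPTHS = ["8U", "8S", "16U", "16S", "32S", "32F", "64F"]


def ocv_mode(data_type, n_channels):
    """Return the OpenCV ordinal for a data type and channel count."""
    if data_type not in DEPTHS or n_channels not in (1, 2, 3, 4):
        raise ValueError("Unknown OpenCV type: %s with %s channels"
                         % (data_type, n_channels))
    return (n_channels - 1) * 8 + DEPTHS.index(data_type)


def ocv_name(data_type, n_channels):
    """Build the conventional name, e.g. ("8U", 3) gives "CV_8UC3"."""
    return "CV_%sC%d" % (data_type, n_channels)


# Rebuild the full name-to-ordinal table (7 data types x 4 channel counts).
table = {ocv_name(dt, nc): ocv_mode(dt, nc)
         for nc in (1, 2, 3, 4) for dt in DEPTHS}
print(table["CV_8UC1"], table["CV_8UC3"], table["CV_8UC4"])  # 0 16 24
```

The three printed values match the entries of the old hard-coded `Map` that the diff removes, which is a quick sanity check on the generated ordinals.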
[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r161366218 --- Diff: python/pyspark/ml/image.py --- @@ -71,9 +88,33 @@ def ocvTypes(self): """ --- End diff -- It seems we should fix the doc for `:return:`, since it is going to be a list now.
[GitHub] spark pull request #20256: [SPARK-23063][K8S] K8s changes for publishing scr...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20256#discussion_r161366220 --- Diff: dev/create-release/releaseutils.py --- @@ -185,6 +185,7 @@ def get_commits(tag): "graphx": "GraphX", "input/output": CORE_COMPONENT, "java api": "Java API", +"kubernetes": "Kubernetes", --- End diff -- So it looks like this mapping is used for both the commit title and the JIRA component field, which isn't quite perfect (for example, there is no R entry here). In any case, it looks like multiple left-hand values can map to the same right-hand value.
[GitHub] spark issue #20256: [SPARK-23063][K8S] K8s changes for publishing scripts (a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20256 Merged build finished. Test PASSed.
[GitHub] spark issue #20256: [SPARK-23063][K8S] K8s changes for publishing scripts (a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20256 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86077/ Test PASSed.
[GitHub] spark issue #20256: [SPARK-23063][K8S] K8s changes for publishing scripts (a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20256 **[Test build #86077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86077/testReport)** for PR 20256 at commit [`df6f49d`](https://github.com/apache/spark/commit/df6f49d8d04da9ff8113929802a3c674c572e9f5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20237: [SPARK-22980][PYTHON][SQL] Clarify the length of ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20237
[GitHub] spark issue #20237: [SPARK-22980][PYTHON][SQL] Clarify the length of each se...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20237 Merged to master and branch-2.3.
[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20153 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86080/ Test PASSed.
[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20153 Merged build finished. Test PASSed.
[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20153 **[Test build #86080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86080/testReport)** for PR 20153 at commit [`d666110`](https://github.com/apache/spark/commit/d6661104f314c88ff84057fd4830e7a5fbe964d9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20247: [SPARK-23021][SQL] AnalysisBarrier should override inner...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20247 Merged build finished. Test PASSed.
[GitHub] spark issue #20247: [SPARK-23021][SQL] AnalysisBarrier should override inner...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20247 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86078/ Test PASSed.
[GitHub] spark issue #20247: [SPARK-23021][SQL] AnalysisBarrier should override inner...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20247 **[Test build #86078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86078/testReport)** for PR 20247 at commit [`7692099`](https://github.com/apache/spark/commit/7692099c42907682a5ca10fa6a800fcb1a6e745d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorruptFiles` ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20240 Merged build finished. Test PASSed.
[GitHub] spark issue #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorruptFiles` ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20240 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86079/ Test PASSed.
[GitHub] spark issue #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorruptFiles` ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20240 **[Test build #86079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86079/testReport)** for PR 20240 at commit [`33ae3ca`](https://github.com/apache/spark/commit/33ae3ca34aa237c630927c96d9421ea53ed6a775). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20151: [SPARK-22959][PYTHON] Configuration to select the module...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20151 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86075/ Test PASSed.
[GitHub] spark issue #20151: [SPARK-22959][PYTHON] Configuration to select the module...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20151 Merged build finished. Test PASSed.
[GitHub] spark issue #20151: [SPARK-22959][PYTHON] Configuration to select the module...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20151 **[Test build #86075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86075/testReport)** for PR 20151 at commit [`fc65803`](https://github.com/apache/spark/commit/fc658034639c1aa56ff5b9a44624cad05377fe51). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20256: [SPARK-23063][K8S] K8s changes for publishing scr...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/20256#discussion_r161365711 --- Diff: dev/create-release/releaseutils.py --- @@ -185,6 +185,7 @@ def get_commits(tag): "graphx": "GraphX", "input/output": CORE_COMPONENT, "java api": "Java API", +"kubernetes": "Kubernetes", --- End diff -- Can we supply a list/tuple there? I've updated it to `K8S`, but sometimes folks have written `k8s` or `kubernetes` in the PR titles by the looks of it.
[GitHub] spark issue #20256: [SPARK-23063][K8S] K8s changes for publishing scripts (a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20256 **[Test build #86084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86084/testReport)** for PR 20256 at commit [`73fb21e`](https://github.com/apache/spark/commit/73fb21e4e6fc12bd9d77b98ade8b2ed011b8d68f).
[GitHub] spark pull request #20232: [SPARK-23042][ML] Use OneHotEncoderModel to encod...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20232#discussion_r161365548 --- Diff: R/pkg/tests/fulltests/test_mllib_classification.R --- @@ -382,10 +382,10 @@ test_that("spark.mlp", { trainidxs <- base::sample(nrow(data), nrow(data) * 0.7) traindf <- as.DataFrame(data[trainidxs, ]) testdf <- as.DataFrame(rbind(data[-trainidxs, ], c(0, "the other"))) - model <- spark.mlp(traindf, clicked ~ ., layers = c(1, 3)) + model <- spark.mlp(traindf, clicked ~ ., layers = c(1, 2)) --- End diff -- ok; I think perhaps we need to release-note this (like [this](http://spark.apache.org/docs/latest/sparkr.html#upgrading-to-sparkr-220))
[GitHub] spark pull request #20163: [SPARK-22966][PYTHON][SQL] Python UDFs with retur...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r161365496
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala ---
@@ -144,6 +145,7 @@ object EvaluatePython {
   }
   case StringType => (obj: Any) => nullSafeConvert(obj) {
+    case _: Calendar => null
     case _ => UTF8String.fromString(obj.toString)
--- End diff --
@cloud-fan, how about something like this then?
```scala
case StringType => (obj: Any) => nullSafeConvert(obj) {
  // Shortcut for string conversion
  case c: String => UTF8String.fromString(c)
  // Here, we return null for 'array', 'tuple', 'dict', 'list', 'datetime.datetime',
  // 'datetime.date' and 'datetime.time' because those string conversions are
  // not quite consistent with SQL string representation of data.
  case _: java.util.Calendar | _: net.razorvine.pickle.objects.Time |
       _: java.util.List[_] | _: java.util.Map[_, _] =>
    null
  case c if c.getClass.isArray => null
  // Here, we keep the string conversion fall back for compatibility.
  // TODO: We should revisit this and rewrite the type conversion logic in Spark 3.x.
  case other => UTF8String.fromString(other.toString)
}
```
My few tests:
`datetime.time`:
```
from pyspark.sql.functions import udf
from datetime import time
f = udf(lambda x: time(0, 0), "string")
spark.range(1).select(f("id")).show()
```
```
++
|(id)|
++
|Time: 0 hours, 0 ...|
++
```
`array`:
```
from pyspark.sql.functions import udf
import array
f = udf(lambda x: array.array("c", "aaa"), "string")
spark.range(1).select(f("id")).show()
```
```
++
|(id)|
++
| [C@11618d9e|
++
```
`tuple`:
```
from pyspark.sql.functions import udf
f = udf(lambda x: (x,), "string")
spark.range(1).select(f("id")).show()
```
```
++
|(id)|
++
|[Ljava.lang.Objec...|
++
```
`list`:
```
from pyspark.sql.functions import udf
from datetime import datetime
f = udf(lambda x: [datetime(1990, 1, 1)], "string")
spark.range(1).select(f("id")).show()
```
```
++
|(id)|
++
|[java.util.Gregor...|
++
```
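For readers more familiar with the Python side, the proposed rule can be paraphrased in plain Python. This is only an illustrative analogue, not Spark code: `to_sql_string` is a hypothetical name, and the real conversion happens on the JVM after unpickling, as in the Scala snippet above.

```python
import array
import datetime

# Python types whose default string form does not match the SQL representation.
_NON_STRING_TYPES = (
    list, tuple, dict, array.array,
    datetime.datetime, datetime.date, datetime.time,
)


def to_sql_string(obj):
    """Mimic the proposed StringType conversion for a UDF return value."""
    if obj is None:
        return None  # null-safe: null in, null out
    if isinstance(obj, str):
        return obj  # shortcut for values that are already strings
    if isinstance(obj, _NON_STRING_TYPES):
        return None  # inconsistent string forms become null
    return str(obj)  # fallback kept for compatibility


for value in ["abc", 1, [1, 2], (1,), {"a": 1}, datetime.time(0, 0)]:
    print(repr(value), "->", repr(to_sql_string(value)))
```

Under this rule, the `datetime.time`, `array`, `tuple` and `list` cases from the tests above would all yield null instead of a JVM-flavoured string.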
[GitHub] spark pull request #20254: [SPARK-23062][SQL] Improve EXCEPT documentation
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20254#discussion_r161365422 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1364,7 +1364,9 @@ def subtract(self, other): """ Return a new :class:`DataFrame` containing rows in this frame but not in another frame. -This is equivalent to `EXCEPT` in SQL. +This is equivalent to `EXCEPT DISTINCT` in SQL. + +(Note: Before Spark 2.0, the behavior was equivalent to `EXCEPT ALL` in SQL.) --- End diff -- nit: `2.0` to `2.0.0`
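The difference between the two SQL semantics named in the docstring can be shown on plain Python lists. This is a sketch of the standard SQL definitions only, not of Spark's implementation; `except_distinct` and `except_all` are hypothetical helper names.

```python
from collections import Counter


def except_distinct(left, right):
    """Distinct rows of `left` that do not appear in `right`."""
    right_set = set(right)
    seen = set()
    out = []
    for row in left:
        if row not in right_set and row not in seen:
            seen.add(row)
            out.append(row)
    return out


def except_all(left, right):
    """Rows of `left`, minus one occurrence per matching row in `right`."""
    remaining = Counter(right)
    out = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1  # cancel one duplicate against `right`
        else:
            out.append(row)
    return out


left = ["a", "a", "a", "b", "c"]
right = ["a", "c"]
print(except_distinct(left, right))  # ['b']
print(except_all(left, right))       # ['a', 'a', 'b']
```

`EXCEPT DISTINCT` drops every `"a"` because the value appears on the right at all, while `EXCEPT ALL` only cancels one `"a"` per occurrence on the right, which is why the docstring needs to state which one applies.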
[GitHub] spark pull request #20254: [SPARK-23062][SQL] Improve EXCEPT documentation
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20254#discussion_r161365371 --- Diff: R/pkg/R/DataFrame.R --- @@ -2873,6 +2873,7 @@ setMethod("intersect", #' @rdname except #' @export #' @note except since 1.4.0 +#' @note behaviour changed from \code{EXCEPT ALL} to \code{EXCEPT DISTINCT} in 2.0. --- End diff -- I don't mind either way, but note that R doc order and whitespace are significant: if you use `#' Note:`, you must put it after L2856, and if you add an extra `#'` (i.e. an empty line), it becomes the `Details` section, which might be the right place; see http://spark.apache.org/docs/latest/api/R/awaitTermination.html
[GitHub] spark pull request #20254: [SPARK-23062][SQL] Improve EXCEPT documentation
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20254#discussion_r161365416 --- Diff: R/pkg/R/DataFrame.R --- @@ -2873,6 +2873,7 @@ setMethod("intersect", #' @rdname except #' @export #' @note except since 1.4.0 +#' @note behaviour changed from \code{EXCEPT ALL} to \code{EXCEPT DISTINCT} in 2.0. --- End diff --
i.e.
```
#' but not in another SparkDataFrame. This is equivalent to \code{EXCEPT DISTINCT} in SQL.
#'
#' Note: Before Spark 2.0.0, the behavior was equivalent to `EXCEPT ALL` in SQL.
#'
#' @param x a SparkDataFrame.
```
[GitHub] spark pull request #20256: [SPARK-23063][K8S] K8s changes for publishing scr...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20256#discussion_r161365293 --- Diff: dev/create-release/releaseutils.py --- @@ -185,6 +185,7 @@ def get_commits(tag): "graphx": "GraphX", "input/output": CORE_COMPONENT, "java api": "Java API", +"kubernetes": "Kubernetes", --- End diff -- this is for the PR title [foo] - I think [k8s] is more widely used, maybe both
[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19001 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86074/ Test FAILed.
[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19001 Merged build finished. Test FAILed.
[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19001 **[Test build #86074 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86074/testReport)** for PR 19001 at commit [`3c367a0`](https://github.com/apache/spark/commit/3c367a08fa5290081e82d45ea7bf564277f196b0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` throw new IOException("Cannot find class " + inputFormatClassName, e);` * ` throw new IOException("Unable to find the InputFormat class " + inputFormatClassName, e);`
[GitHub] spark issue #20080: [SPARK-22870][CORE] Dynamic allocation should allow 0 id...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20080 **[Test build #86083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86083/testReport)** for PR 20080 at commit [`b03a496`](https://github.com/apache/spark/commit/b03a4968976781dff03961abc5caedae10ef10aa).
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20253 Kicked off the 5 runs. I noticed one of the earlier runs actually failed in PySpark. I don't know of a plausible mechanism by which this PR could cause that.
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20253 **[Test build #4043 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4043/testReport)** for PR 20253 at commit [`4bb9c3f`](https://github.com/apache/spark/commit/4bb9c3f06f4da1c14ab24ad6a642bf831c90503f).
[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20153 Merged build finished. Test PASSed.
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20253 **[Test build #4044 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4044/testReport)** for PR 20253 at commit [`4bb9c3f`](https://github.com/apache/spark/commit/4bb9c3f06f4da1c14ab24ad6a642bf831c90503f).
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20253 **[Test build #4045 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4045/testReport)** for PR 20253 at commit [`4bb9c3f`](https://github.com/apache/spark/commit/4bb9c3f06f4da1c14ab24ad6a642bf831c90503f).
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20253 **[Test build #4042 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4042/testReport)** for PR 20253 at commit [`4bb9c3f`](https://github.com/apache/spark/commit/4bb9c3f06f4da1c14ab24ad6a642bf831c90503f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20153 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86072/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20253 In commit 4bb9c3f, there are no occurrences remaining of the string "processAllAvailable" in KafkaContinuousSourceSuite.scala, KafkaContinuousSinkSuite.scala, KafkaContinuousTest.scala, or StreamTest.scala. There are four occurrences in KafkaSourceSuite.scala, two within the MicroBatch suite and two matched to not happen in ContinuousExecution. (One test with a foreach sink was moved to the MicroBatch suite, because it was executing in microbatch mode anyway since we haven't updated foreach for continuous processing.) I believe this is exhaustive.
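The audit described in the comment above (counting remaining uses of "processAllAvailable" per test file) could be repeated with a small script. This is only a sketch: the helper names `count_in_text`, `count_in_files` and `files_still_using` are hypothetical, and the paths to pass in are whatever the local checkout uses.

```python
from pathlib import Path

# The method name being audited in the comment above.
NEEDLE = "processAllAvailable"


def count_in_text(text, needle=NEEDLE):
    """Count non-overlapping occurrences of needle in a string."""
    return text.count(needle)


def count_in_files(paths, needle=NEEDLE):
    """Map each path (as a string) to the number of occurrences in that file."""
    return {
        str(p): count_in_text(Path(p).read_text(encoding="utf-8"), needle)
        for p in paths
    }


def files_still_using(paths, needle=NEEDLE):
    """Return the sorted list of files that still contain needle."""
    return sorted(p for p, n in count_in_files(paths, needle).items() if n > 0)
```

Calling `files_still_using([...])` with the four suite files named above would be expected to return an empty list at commit 4bb9c3f, if the comment's audit is accurate.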
[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20153 **[Test build #86072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86072/testReport)** for PR 20153 at commit [`4a6a725`](https://github.com/apache/spark/commit/4a6a725acffdc24f7c00302c1a0081c93f6acdd8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20253 **[Test build #86082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86082/testReport)** for PR 20253 at commit [`4bb9c3f`](https://github.com/apache/spark/commit/4bb9c3f06f4da1c14ab24ad6a642bf831c90503f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20253 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86071/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20253 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20253 **[Test build #86071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86071/testReport)** for PR 20253 at commit [`71bfbcf`](https://github.com/apache/spark/commit/71bfbcfbca3b8bce064c790a92dbab59a9414934). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20253 The builds at commit 3fe76e3 were on an incomplete version of the PR, so their failures are expected. Of the 6 builds at target commit 0efc8c5, 5 passed and 1 failed. The failure was due to a stream.processAllAvailable() call in the Kafka suite; we already knew this method is inherently flake-prone for continuous processing, and had attempted to remove usages of it. I'm going to do another pass to get the rest, and then kick off 5 more attempts.
[GitHub] spark pull request #20248: [SPARK-23058][SQL] Show non printable field delim...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20248#discussion_r161364269
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -1023,7 +1023,12 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman
       val serdeProps = metadata.storage.properties.map {
         case (key, value) =>
-          s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'"
+          val escapedValue = if (value.length == 1 && (value.head < 32 || value.head > 126)) {
--- End diff --
I need to copy an external table to another environment, but I lost the create table statement. So I want to recover it with `show create table ...`, but that currently can't show a non-printable field delimiter.
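To make the condition in the quoted diff concrete, here is a minimal Python sketch of the same check: a one-character value whose code point is below 32 or above 126 is treated as non-printable. The quoted diff is cut off before the replacement branch, so rendering the character as a `\uXXXX` escape is an assumption of this sketch, and `escape_delim` is a hypothetical name, not a Spark function.

```python
def escape_delim(value: str) -> str:
    """Render a single non-printable character as a \\uXXXX escape.

    Mirrors the condition from the quoted diff:
    value.length == 1 && (value.head < 32 || value.head > 126).
    The \\uXXXX output format is an assumption of this sketch.
    """
    if len(value) == 1 and (ord(value) < 32 or ord(value) > 126):
        return "\\u%04X" % ord(value)
    # Printable or multi-character values are returned unchanged here;
    # the real code additionally escapes single quotes.
    return value
```

With this sketch, a Ctrl-A delimiter (`"\x01"`) would be shown as the six characters `\u0001`, while a plain comma is left alone.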
[GitHub] spark issue #20246: [SPARK-23054][SQL] Fix incorrect results of casting User...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20246 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20246: [SPARK-23054][SQL] Fix incorrect results of casting User...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20246 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86070/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20246: [SPARK-23054][SQL] Fix incorrect results of casting User...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20246 **[Test build #86070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86070/testReport)** for PR 20246 at commit [`137d85f`](https://github.com/apache/spark/commit/137d85f23fa8d0e45144db89666f4c9083d14100). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20248: [SPARK-23058][SQL] Show non printable field delim as uni...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20248 **[Test build #86081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86081/testReport)** for PR 20248 at commit [`edf5fa6`](https://github.com/apache/spark/commit/edf5fa6e8ee29bf237a6d61dee1146f297bd570f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20254: [SPARK-23062][SQL] Improve EXCEPT documentation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20254 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86068/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20254: [SPARK-23062][SQL] Improve EXCEPT documentation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20254 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20254: [SPARK-23062][SQL] Improve EXCEPT documentation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20254 **[Test build #86068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86068/testReport)** for PR 20254 at commit [`5562a16`](https://github.com/apache/spark/commit/5562a1665bebf413d5c4126642a77e2d9d0c4a46). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20163: [SPARK-22966][PYTHON][SQL] Python UDFs with retur...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r161363813
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala ---
@@ -144,6 +145,7 @@ object EvaluatePython {
     }
     case StringType => (obj: Any) => nullSafeConvert(obj) {
+      case _: Calendar => null
       case _ => UTF8String.fromString(obj.toString)
--- End diff --
I think there is no perfect solution .. I think https://github.com/apache/spark/pull/20163#discussion_r161363004 sounds good enough as a fix for this issue for now ..
[GitHub] spark pull request #20163: [SPARK-22966][PYTHON][SQL] Python UDFs with retur...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r161363630
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala ---
@@ -144,6 +145,7 @@ object EvaluatePython {
     }
     case StringType => (obj: Any) => nullSafeConvert(obj) {
+      case _: Calendar => null
       case _ => UTF8String.fromString(obj.toString)
--- End diff --
BTW, there seems to be another hole when we actually do the internal conversion with unexpected types:
```python
>>> from pyspark.sql.functions import udf
>>> f = udf(lambda x: x, "date")
>>> spark.range(1).select(f("id")).show()
```
```
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "./python/pyspark/worker.py", line 229, in main
    process()
  File "./python/pyspark/worker.py", line 224, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "./python/pyspark/worker.py", line 149, in
    func = lambda _, it: map(mapper, it)
  File "", line 1, in
  File "./python/pyspark/worker.py", line 72, in
    return lambda *a: toInternal(f(*a))
  File "/.../pyspark/sql/types.py", line 175, in toInternal
    return d.toordinal() - self.EPOCH_ORDINAL
AttributeError: 'int' object has no attribute 'toordinal'
```
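The failure in the traceback above can be reproduced without a Spark session. The sketch below imitates only the one line shown in the traceback (`d.toordinal() - self.EPOCH_ORDINAL`); `to_internal_date` and `checked_to_internal_date` are hypothetical stand-ins, not PySpark functions, and the explicit type check is just one possible way to fail earlier with a clearer message.

```python
import datetime

# Ordinal of the Unix epoch, matching the EPOCH_ORDINAL used in the traceback line.
EPOCH_ORDINAL = datetime.date(1970, 1, 1).toordinal()


def to_internal_date(d):
    """Imitation of the conversion line in the traceback: days since the epoch."""
    return d.toordinal() - EPOCH_ORDINAL


def checked_to_internal_date(d):
    """Same conversion, but reject non-date values with an explicit error."""
    if not isinstance(d, datetime.date):
        raise TypeError(
            "expected datetime.date for a 'date' return type, got %s"
            % type(d).__name__
        )
    return to_internal_date(d)
```

Passing an `int` to `to_internal_date` raises the same `AttributeError` as in the traceback, because the UDF declared `"date"` but returned the `id` column's integer value.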
[GitHub] spark pull request #20163: [SPARK-22966][PYTHON][SQL] Python UDFs with retur...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r161363023
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala ---
@@ -144,6 +145,7 @@ object EvaluatePython {
     }
     case StringType => (obj: Any) => nullSafeConvert(obj) {
+      case _: Calendar => null
       case _ => UTF8String.fromString(obj.toString)
--- End diff --
Oh, I didn't see the comment above when I wrote my comment.
[GitHub] spark pull request #20163: [SPARK-22966][PYTHON][SQL] Python UDFs with retur...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r161363004
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala ---
@@ -144,6 +145,7 @@ object EvaluatePython {
     }
     case StringType => (obj: Any) => nullSafeConvert(obj) {
+      case _: Calendar => null
       case _ => UTF8String.fromString(obj.toString)
--- End diff --
So, for now .. I think it's fine as a small fix as is ... We are going to document that the return type and return value should match anyway .. So, expected return values will be:
```python
# Mapping Python types to Spark SQL DataType
_type_mappings = {
    type(None): NullType,
    bool: BooleanType,
    int: LongType,
    float: DoubleType,
    str: StringType,
    bytearray: BinaryType,
    decimal.Decimal: DecimalType,
    datetime.date: DateType,
    datetime.datetime: TimestampType,
    datetime.time: TimestampType,
}
```
It seems we can also check if the string conversion looks reasonable and then blacklist `net.razorvine.pickle.objects.Time` if not ... How does this sound to you @cloud-fan and @rednaxelafx?
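One way to picture "the return type and return value should match" is to invert the mapping quoted above and check a UDF's return value against the declared type before conversion. This is a sketch under assumptions: the type names are plain strings rather than Spark `DataType` classes, `matches_declared_type` is a hypothetical helper, and treating `None` as always acceptable is my assumption, not something stated in the thread.

```python
import datetime
import decimal

# Inverse of the mapping quoted above, keyed by a plain type name (assumed
# spelling) instead of the Spark DataType class.
EXPECTED_PYTHON_TYPES = {
    "null": (type(None),),
    "boolean": (bool,),
    "long": (int,),
    "double": (float,),
    "string": (str,),
    "binary": (bytearray,),
    "decimal": (decimal.Decimal,),
    "date": (datetime.date,),
    "timestamp": (datetime.datetime, datetime.time),
}


def matches_declared_type(type_name, value):
    """Return True if value has exactly one of the Python types expected
    for type_name. Exact type comparison is used on purpose, so that a
    bool does not pass as "long" and a datetime does not pass as "date"."""
    if value is None:
        return True  # assumption: a null result is acceptable for any type
    return type(value) in EXPECTED_PYTHON_TYPES[type_name]
```

Under this check, the `udf(lambda x: x, "date")` case from the earlier comment would be flagged, since an `int` is not a `datetime.date`.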
[GitHub] spark pull request #20163: [SPARK-22966][PYTHON][SQL] Python UDFs with retur...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r161362994
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala ---
@@ -144,6 +145,7 @@ object EvaluatePython {
     }
     case StringType => (obj: Any) => nullSafeConvert(obj) {
+      case _: Calendar => null
       case _ => UTF8String.fromString(obj.toString)
--- End diff --
> check if the string conversion looks reasonably consistent by obj.toString. If not, we add it in the blacklist.

Hmm, this seems weird, as the type mismatch would now be defined by the Pyrolite object's `toString` behavior...
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20253 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86064/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20253 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20253 **[Test build #4037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4037/testReport)** for PR 20253 at commit [`3fe76e3`](https://github.com/apache/spark/commit/3fe76e30a01698ce8732044a0c663baa277605cb). * This patch **fails from timeout after a configured wait of \`250m\`**. * This patch **does not merge cleanly**. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20253 **[Test build #86064 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86064/testReport)** for PR 20253 at commit [`5f4f7cf`](https://github.com/apache/spark/commit/5f4f7cf6662a389abe42bfcc433d2035c5d1c35e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20214: [SPARK-23023][SQL] Cast field data to strings in showStr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20214 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20214: [SPARK-23023][SQL] Cast field data to strings in showStr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20214 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86069/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20153 **[Test build #86080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86080/testReport)** for PR 20153 at commit [`d666110`](https://github.com/apache/spark/commit/d6661104f314c88ff84057fd4830e7a5fbe964d9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20214: [SPARK-23023][SQL] Cast field data to strings in showStr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20214 **[Test build #86069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86069/testReport)** for PR 20214 at commit [`022ed32`](https://github.com/apache/spark/commit/022ed327bc7e2fd3a5cbd498d21183f0eabf2a26). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20163: [SPARK-22966][PYTHON][SQL] Python UDFs with retur...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r161362902
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala ---
@@ -144,6 +145,7 @@ object EvaluatePython {
     }
     case StringType => (obj: Any) => nullSafeConvert(obj) {
+      case _: Calendar => null
       case _ => UTF8String.fromString(obj.toString)
--- End diff --
To be thorough, I think we should check all the types listed at https://github.com/irmen/Pyrolite,
```
PYTHON                                      JAVA
------                                      ----
None                                        null
bool                                        boolean
int                                         int
long                                        long or BigInteger (depending on size)
string                                      String
unicode                                     String
complex                                     net.razorvine.pickle.objects.ComplexNumber
datetime.date                               java.util.Calendar
datetime.datetime                           java.util.Calendar
datetime.time                               net.razorvine.pickle.objects.Time
datetime.timedelta                          net.razorvine.pickle.objects.TimeDelta
float                                       double (float isn't used)
array.array                                 array of appropriate primitive type (char, int, short, long, float, double)
list                                        java.util.List
tuple                                       Object[]
set                                         java.util.Set
dict                                        java.util.Map
bytes                                       byte[]
bytearray                                   byte[]
decimal                                     BigDecimal
custom class                                Map (dict with class attributes including its name in "__class__")
Pyro4.core.URI                              net.razorvine.pyro.PyroURI
Pyro4.core.Proxy                            net.razorvine.pyro.PyroProxy
Pyro4.errors.*                              net.razorvine.pyro.PyroException
Pyro4.utils.flame.FlameBuiltin              net.razorvine.pyro.FlameBuiltin
Pyro4.utils.flame.FlameModule               net.razorvine.pyro.FlameModule
Pyro4.utils.flame.RemoteInteractiveConsole  net.razorvine.pyro.FlameRemoteConsole
```
and then check if the string conversion looks reasonably consistent by `obj.toString`. If not, we add it in the blacklist. Another possibility is to whitelist `String`, but then I guess this is rather a radical change.
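The two options weighed in the comment above (blacklist some incoming types, or whitelist only `String`) differ in what happens to every type nobody thought about. The following Python sketch is an analogy only: `datetime.date` and `datetime.time` stand in for `java.util.Calendar` and `net.razorvine.pickle.objects.Time`, and both function names are hypothetical.

```python
import datetime

# Python stand-ins for the Java classes discussed in the thread (assumption):
# datetime.date ~ java.util.Calendar, datetime.time ~ pickle.objects.Time.
BLACKLIST = (datetime.date, datetime.time)


def convert_with_blacklist(obj):
    """Blacklist approach: stringify everything except the listed types."""
    if obj is None or isinstance(obj, BLACKLIST):
        return None
    return str(obj)


def convert_with_whitelist(obj):
    """Whitelist approach: accept only real strings, null out the rest."""
    return obj if isinstance(obj, str) else None
```

The blacklist keeps existing behavior for unlisted types (an `int` still becomes `"1"`), which is why the whitelist is described as the more radical change: it would turn such values into null.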
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20253 **[Test build #4038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4038/testReport)** for PR 20253 at commit [`0efc8c5`](https://github.com/apache/spark/commit/0efc8c5b7e98f3e79361f355a08fc8404d2d7d9b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20251: [Spark-23051][core] Fix for broken job description in Sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20251 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86066/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20251: [Spark-23051][core] Fix for broken job description in Sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20251 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20251: [Spark-23051][core] Fix for broken job description in Sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20251 **[Test build #86066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86066/testReport)** for PR 20251 at commit [`d9cdb07`](https://github.com/apache/spark/commit/d9cdb07263f7a584cf217d30c55313283459ac92). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20253 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86063/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20253 Build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20253 **[Test build #86063 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86063/testReport)** for PR 20253 at commit [`3fe76e3`](https://github.com/apache/spark/commit/3fe76e30a01698ce8732044a0c663baa277605cb). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20254: [SPARK-23062][SQL] Improve EXCEPT documentation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20254 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86067/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20254: [SPARK-23062][SQL] Improve EXCEPT documentation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20254 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20254: [SPARK-23062][SQL] Improve EXCEPT documentation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20254 **[Test build #86067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86067/testReport)** for PR 20254 at commit [`9fe5707`](https://github.com/apache/spark/commit/9fe57074b496ad95411c4ce5a43b0c43dd6246af). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorruptFiles` ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20240 **[Test build #86079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86079/testReport)** for PR 20240 at commit [`33ae3ca`](https://github.com/apache/spark/commit/33ae3ca34aa237c630927c96d9421ea53ed6a775). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20247: [SPARK-23021][SQL] AnalysisBarrier should override inner...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20247 **[Test build #86078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86078/testReport)** for PR 20247 at commit [`7692099`](https://github.com/apache/spark/commit/7692099c42907682a5ca10fa6a800fcb1a6e745d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorruptFiles` ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20240 Retest this please.
[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20153#discussion_r161362565
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaBatchDataSourceV2.java ---
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql.sources.v2;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector;
+import org.apache.spark.sql.sources.v2.DataSourceV2;
+import org.apache.spark.sql.sources.v2.DataSourceV2Options;
+import org.apache.spark.sql.sources.v2.ReadSupport;
+import org.apache.spark.sql.sources.v2.reader.*;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.StructType;
+import org.apache.spark.sql.vectorized.ColumnVector;
+import org.apache.spark.sql.vectorized.ColumnarBatch;
+
+
+public class JavaBatchDataSourceV2 implements DataSourceV2, ReadSupport {
+
+  class Reader implements DataSourceV2Reader, SupportsScanColumnarBatch {
--- End diff --
This is the convention. If we implement many mix-in interfaces, it's better to write
```
MyReader extends DataSourceV2Reader, XXX, YYY, ZZZ ...
```
[GitHub] spark issue #20247: [SPARK-23021][SQL] AnalysisBarrier should override inner...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20247 retest this please.
[GitHub] spark issue #20247: [SPARK-23021][SQL] AnalysisBarrier should override inner...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20247 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20247: [SPARK-23021][SQL] AnalysisBarrier should override inner...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20247 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86056/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20253 **[Test build #4041 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4041/testReport)** for PR 20253 at commit [`0efc8c5`](https://github.com/apache/spark/commit/0efc8c5b7e98f3e79361f355a08fc8404d2d7d9b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20247: [SPARK-23021][SQL] AnalysisBarrier should override inner...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20247 **[Test build #86056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86056/testReport)** for PR 20247 at commit [`7692099`](https://github.com/apache/spark/commit/7692099c42907682a5ca10fa6a800fcb1a6e745d). * This patch **fails from timeout after a configured wait of \`300m\`**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20253 **[Test build #4040 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4040/testReport)** for PR 20253 at commit [`0efc8c5`](https://github.com/apache/spark/commit/0efc8c5b7e98f3e79361f355a08fc8404d2d7d9b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20253 Build finished. Test FAILed.
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20253 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86060/ Test FAILed.
[GitHub] spark issue #20253: [SPARK-22908][SS] Roll forward continuous processing Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20253 **[Test build #86060 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86060/testReport)** for PR 20253 at commit [`f575483`](https://github.com/apache/spark/commit/f5754837efbdca10398b769be07eaf53ae36f0f3). * This patch **fails from timeout after a configured wait of `250m`**. * This patch **does not merge cleanly**. * This patch adds no public classes.