[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19779 **[Test build #83984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83984/testReport)** for PR 19779 at commit [`034b246`](https://github.com/apache/spark/commit/034b2466d073c008b71eae072ee98353df56cbf2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...
GitHub user vinodkc opened a pull request: https://github.com/apache/spark/pull/19779

[SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support writing to Hive table which uses Avro schema url 'avro.schema.url'

## What changes were proposed in this pull request?

Support writing to a Hive table which uses the Avro schema url 'avro.schema.url'.

For example:

    create external table avro_in (a string) stored as avro location '/avro-in/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');
    create external table avro_out (a string) stored as avro location '/avro-out/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');
    insert overwrite table avro_out select * from avro_in; // fails with java.lang.NullPointerException

    WARN AvroSerDe: Encountered exception determining schema. Returning signal schema to indicate problem
    java.lang.NullPointerException
        at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:182)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:174)

## Changes proposed in this fix

Currently a 'null' value is passed to the serializer, which causes an NPE during the insert operation. Instead, pass the Hadoop configuration object.

## How was this patch tested?

Added a new test case in VersionsSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vinodkc/spark br_Fix_SPARK-17920

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19779.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19779

commit 034b2466d073c008b71eae072ee98353df56cbf2
Author: vinodkc
Date: 2017-11-18T07:52:59Z

    pass hadoopConfiguration to Serializer
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19779 Merged build finished. Test FAILed.
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19779 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83984/ Test FAILed.
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19779

**[Test build #83984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83984/testReport)** for PR 19779 at commit [`034b246`](https://github.com/apache/spark/commit/034b2466d073c008b71eae072ee98353df56cbf2).

 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19779 **[Test build #83985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83985/testReport)** for PR 19779 at commit [`a59bd09`](https://github.com/apache/spark/commit/a59bd093878cf7060781ad0628176ff9b3df63a1).
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19779

**[Test build #83985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83985/testReport)** for PR 19779 at commit [`a59bd09`](https://github.com/apache/spark/commit/a59bd093878cf7060781ad0628176ff9b3df63a1).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19779 Merged build finished. Test PASSed.
[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19779 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83985/ Test PASSed.
[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/19774#discussion_r151838625

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --

@wzhfy I'd rather define a `val` with the comment being the name of the val. That would make it "compile-safe".
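For readers following the diff above, here is a minimal Python sketch of the row layout it produces: one header row for the histogram followed by one row per bin. The `Bin` and `Histogram` classes and the `histogram_rows` helper are hypothetical stand-ins written for this illustration; they are not Spark classes, and the real change is the Scala code quoted in the diff.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bin:
    """Hypothetical stand-in for one histogram bin (lo, hi, ndv as in the diff)."""
    lo: float
    hi: float
    ndv: int


@dataclass
class Histogram:
    """Hypothetical stand-in for a column histogram."""
    height: float
    bins: List[Bin]


def histogram_rows(hist: Optional[Histogram]) -> List[Tuple[str, str]]:
    """Return (name, value) rows: a header row, then one row per bin.

    An absent histogram yields no rows, mirroring the flatMap over an Option.
    """
    if hist is None:
        return []
    header = ("histogram", f"height: {hist.height}, num_of_bins: {len(hist.bins)}")
    bin_rows = [
        ("", f"lower_bound: {b.lo}, upper_bound: {b.hi}, distinct_count: {b.ndv}")
        for b in hist.bins
    ]
    return [header] + bin_rows


if __name__ == "__main__":
    for row in histogram_rows(Histogram(2.0, [Bin(1.0, 2.0, 2), Bin(2.0, 5.0, 3)])):
        print(row)
```

Bin rows carry an empty name column, so they read as continuation lines under the `histogram` header.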
[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/19627 What happens when you run `check-license` locally? I agree it doesn't look like any of these changes would impact the license headers.
[GitHub] spark issue #18457: [SPARK-21241][MLlib]- Add setIntercept to StreamingLinea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18457 **[Test build #83986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83986/testReport)** for PR 18457 at commit [`544b4d0`](https://github.com/apache/spark/commit/544b4d0e0691ea2912cf214a8c296171d8fc2d2b).
[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18277 What do you think, @HyukjinKwon? I think this is probably a reasonable fix, but we might break the code of some people who have been depending on the bug.
[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/19643#discussion_r151839304

    --- Diff: python/pyspark/context.py ---
    @@ -860,6 +860,23 @@ def addPyFile(self, path):
                 import importlib
                 importlib.invalidate_caches()

    +    def addJar(self, path, addToCurrentClassLoader=False):
    +        """
    +        Adds a JAR dependency for Spark tasks to be executed in the future.
    +        The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +        filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    +        If `addToCurrentClassLoader` is true, add the jar to the current threads' class loader
    +        in the backing JVM. In general adding to the current threads' class loader will impact all
    +        other application threads unless they have explicitly changed their class loader.
    --- End diff --

So we currently use `.. note:: DeveloperApi` to indicate it's a developer API (see ml/pipeline and friends for an example).
[GitHub] spark issue #18457: [SPARK-21241][MLlib]- Add setIntercept to StreamingLinea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18457 Merged build finished. Test FAILed.
[GitHub] spark issue #18457: [SPARK-21241][MLlib]- Add setIntercept to StreamingLinea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18457

**[Test build #83986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83986/testReport)** for PR 18457 at commit [`544b4d0`](https://github.com/apache/spark/commit/544b4d0e0691ea2912cf214a8c296171d8fc2d2b).

 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #18457: [SPARK-21241][MLlib]- Add setIntercept to StreamingLinea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18457 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83986/ Test FAILed.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18906 So I think, with the performance improvements coming to Python UDFs, annotating results as nullable or not could make sense (although I imagine we'd need to do something different for the vectorized UDFs if that isn't already being done). Let's loop in @BryanCutler, but I think the performance improvements could be reasonable to think about in Spark 2.3+.
[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18339 Jenkins OK to test.
[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/19407#discussion_r151838606

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala ---
    @@ -267,11 +267,12 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
               useTempCheckpointLocation = true,
               trigger = trigger)
           } else {
    -        val (useTempCheckpointLocation, recoverFromCheckpointLocation) =
    +        val recoverFromCheckpointLocation = true
    +        val useTempCheckpointLocation =
               if (source == "console") {
    -            (true, true)
    +            true
               } else {
    -            (false, true)
    +            false
    --- End diff --

Do we really need it anymore since the `if` expression is just `source == "console"`?
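The simplification suggested in the comment above can be sketched in Python as follows. The function name `checkpoint_flags` is made up for this illustration; the snippet only shows that an `if`/`else` returning `true`/`false` collapses to the comparison itself.

```python
def checkpoint_flags(source: str):
    """Return (use_temp_checkpoint_location, recover_from_checkpoint_location).

    Mirrors the logic in the quoted diff: recovery is always enabled, and a
    temporary checkpoint location is used only for the "console" source.
    """
    recover_from_checkpoint_location = True
    # The if/else in the diff is equivalent to the boolean expression itself.
    use_temp_checkpoint_location = source == "console"
    return use_temp_checkpoint_location, recover_from_checkpoint_location


if __name__ == "__main__":
    print(checkpoint_flags("console"))
    print(checkpoint_flags("parquet"))
```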
[GitHub] spark issue #18457: [SPARK-21241][MLlib]- Add setIntercept to StreamingLinea...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18457 err Jenkins test this please.
[GitHub] spark issue #15670: [SPARK-18161] [Python] Allow pickle to serialize >4 GB o...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15670 Would you be ok with someone taking over this PR if you're busy?
[GitHub] spark issue #18457: [SPARK-21241][MLlib]- Add setIntercept to StreamingLinea...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18457 Jenkins, test this please.
[GitHub] spark pull request #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to av...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/19498#discussion_r151839513

    --- Diff: python/pyspark/streaming/util.py ---
    @@ -64,7 +64,11 @@ def call(self, milliseconds, jrdds):
                 t = datetime.fromtimestamp(milliseconds / 1000.0)
                 r = self.func(t, *rdds)
                 if r:
    -                return r._jrdd
    +                # Here, we work around to ensure `_jrdd` is `JavaRDD` by wrapping it by `map`.
    +                # org.apache.spark.streaming.api.python.PythonTransformFunction requires to return
    +                # `JavaRDD`; however, this could be `JavaPairRDD` by some APIs, for example, `zip`.
    +                # See SPARK-17756.
    +                return r.map(lambda x: x)._jrdd
    --- End diff --

Personally, I think applying the `map` only when the result is not a `JavaRDD` is a good incremental improvement (since otherwise the code path fails, right?).
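The conditional wrapping proposed in the comment above might look roughly like the sketch below. It does not use PySpark: `FakeRDD` and its `jvm_class` attribute are hypothetical stand-ins invented for this example, and how the real code would detect a `JavaPairRDD` is left open here.

```python
class FakeRDD:
    """Hypothetical stand-in for a PySpark RDD.

    `jvm_class` imitates the name of the backing JVM object's class.
    """

    def __init__(self, jvm_class: str):
        self.jvm_class = jvm_class

    def map(self, func):
        # In this sketch, map() always yields an RDD backed by a plain JavaRDD.
        return FakeRDD("JavaRDD")


def ensure_java_rdd(rdd: FakeRDD) -> FakeRDD:
    """Apply an identity map only when the result is not already a JavaRDD."""
    if rdd.jvm_class == "JavaRDD":
        return rdd  # already the required type, so skip the extra map
    return rdd.map(lambda x: x)


if __name__ == "__main__":
    plain = FakeRDD("JavaRDD")
    zipped = FakeRDD("JavaPairRDD")
    print(ensure_java_rdd(plain) is plain)
    print(ensure_java_rdd(zipped).jvm_class)
```

The point of the design is that the common case pays no extra `map`, while the case that previously failed gets converted.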
[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18982 Can we update this to master?
[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19780 **[Test build #83987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83987/testReport)** for PR 19780 at commit [`e08259a`](https://github.com/apache/spark/commit/e08259a41d0f39c751858daba713b30e52a6c3a4).
[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19780 **[Test build #83988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83988/testReport)** for PR 19780 at commit [`dc49b6e`](https://github.com/apache/spark/commit/dc49b6e1c884bce164e08bb3f63cbdec86541c75).
[GitHub] spark pull request #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb com...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/19780

[SPARK-22551][SQL][WIP] Prevent possible 64kb compile error for common expression types

## What changes were proposed in this pull request?

For common expression types, such as BinaryExpression and TernaryExpression, the combination of generated codes of children can possibly be large. We should put the codes into functions to prevent possible 64kb compile error.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-22551

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19780.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19780

commit e08259a41d0f39c751858daba713b30e52a6c3a4
Author: Liang-Chi Hsieh
Date: 2017-11-18T15:11:05Z

    Put large generated codes of children expressions into functions.
[GitHub] spark issue #17972: [SPARK-20723][ML]Add intermediate storage level to tree ...
Github user phatak-dev commented on the issue: https://github.com/apache/spark/pull/17972 @WeichenXu123 I resolved the merge conflicts. Can you initiate a Jenkins build?
[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19780 Merged build finished. Test PASSed.
[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19780 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83987/ Test PASSed.
[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19082 ping
[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19781 **[Test build #83990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83990/testReport)** for PR 19781 at commit [`9797041`](https://github.com/apache/spark/commit/9797041aa9138386f26d1f6c259da302f918ab5d).
[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19767#discussion_r151852231

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
    @@ -64,52 +64,22 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi
         val trueEval = trueValue.genCode(ctx)
         val falseEval = falseValue.genCode(ctx)

    -    // place generated code of condition, true value and false value in separate methods if
    -    // their code combined is large
    -    val combinedLength = condEval.code.length + trueEval.code.length + falseEval.code.length
    --- End diff --

Actually, I think this removed part is orthogonal to what this PR does. Even if the condition, true, and false expressions are each under the threshold individually, their combination can still exceed it. This PR deals with oversized generated code in deeply nested expressions, not with an oversized combination of code from the children.
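The "combined length" check that the comment above refers to can be illustrated with a small Python sketch. It operates on plain strings standing in for generated Java snippets. The function name, the `threshold` default and the `_eval` naming scheme are assumptions made for this example and are not taken from Spark's code generator.

```python
from typing import List, Tuple


def combine_generated_code(
    snippets: List[Tuple[str, str]], threshold: int = 1024
) -> Tuple[str, List[str]]:
    """Inline child snippets if they are small, otherwise split them out.

    `snippets` is a list of (name, code) pairs. Returns the body to place in
    the parent method plus any helper functions that were split out. The
    threshold is a character count chosen for illustration only.
    """
    combined_length = sum(len(code) for _, code in snippets)
    if combined_length <= threshold:
        # Small enough: keep everything inline in the parent method.
        return "\n".join(code for _, code in snippets), []

    functions = []
    calls = []
    for name, code in snippets:
        fn_name = f"{name}_eval"  # hypothetical naming scheme
        functions.append(f"private void {fn_name}(InternalRow i) {{\n{code}\n}}")
        calls.append(f"{fn_name}(i);")
    return "\n".join(calls), functions


if __name__ == "__main__":
    body, helpers = combine_generated_code(
        [("cond", "x" * 8), ("trueValue", "y" * 8)], threshold=10
    )
    print(body)
    print(len(helpers))
```

Each child here is under the threshold on its own, yet the pair exceeds it together, which is the case the comment says the removed check covered.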
[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19780 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83988/ Test PASSed.
[GitHub] spark issue #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem w...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19728 thanks, merging to master/2.2!
[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19781 @cloud-fan WDYT?
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17436 Merged build finished. Test PASSed.
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17436

**[Test build #83989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83989/testReport)** for PR 17436 at commit [`b2c5b2e`](https://github.com/apache/spark/commit/b2c5b2ef0a36a2cc4085856970ddad490e526924).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17436 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83989/ Test PASSed.
[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19780

**[Test build #83988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83988/testReport)** for PR 19780 at commit [`dc49b6e`](https://github.com/apache/spark/commit/dc49b6e1c884bce164e08bb3f63cbdec86541c75).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19780 Merged build finished. Test PASSed.
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17436 **[Test build #83989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83989/testReport)** for PR 17436 at commit [`b2c5b2e`](https://github.com/apache/spark/commit/b2c5b2ef0a36a2cc4085856970ddad490e526924).
[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's ...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/19781

[SPARK-22445][SQL][FOLLOW-UP] Respect children's needCopyResult in Sort, HashAggregate, and BroadcastHashJoin

## What changes were proposed in this pull request?

I found #19656 causes some bugs, for example, it changed the result set of `q6` in tpcds:

- w/o pr19658
```
+-----+---+
|state|cnt|
+-----+---+
|   MA| 10|
|   AK| 10|
|   AZ| 11|
|   ME| 13|
|   VT| 14|
|   NV| 15|
|   NH| 16|
|   UT| 17|
|   NJ| 21|
|   MD| 22|
|   WY| 25|
|   NM| 26|
|   OR| 31|
|   WA| 36|
|   ND| 38|
|   ID| 39|
|   SC| 45|
|   WV| 50|
|   FL| 51|
|   OK| 53|
|   MT| 53|
|   CO| 57|
|   AR| 58|
|   NY| 58|
|   PA| 62|
|   AL| 63|
|   LA| 63|
|   SD| 70|
|   WI| 80|
| null| 81|
|   MI| 82|
|   NC| 82|
|   MS| 83|
|   CA| 84|
|   MN| 85|
|   MO| 88|
|   IL| 95|
|   IA|102|
|   TN|102|
|   IN|103|
|   KY|104|
|   NE|113|
|   OH|114|
|   VA|130|
|   KS|139|
|   GA|168|
|   TX|216|
+-----+---+
```

- w/ pr19658
```
+-----+---+
|state|cnt|
+-----+---+
|   RI| 14|
|   AK| 16|
|   FL| 20|
|   NJ| 21|
|   NM| 21|
|   NV| 22|
|   MA| 22|
|   MD| 22|
|   UT| 22|
|   AZ| 25|
|   SC| 28|
|   AL| 36|
|   MT| 36|
|   WA| 39|
|   ND| 41|
|   MI| 44|
|   AR| 45|
|   OR| 47|
|   OK| 52|
|   PA| 53|
|   LA| 55|
|   CO| 55|
|   NY| 64|
|   WV| 66|
|   SD| 72|
|   MS| 73|
|   NC| 79|
|   IN| 82|
| null| 85|
|   ID| 88|
|   MN| 91|
|   WI| 95|
|   IL| 96|
|   MO| 97|
|   CA|109|
|   CA|109|
|   TN|114|
|   NE|115|
|   KY|128|
|   OH|131|
|   IA|156|
|   TX|160|
|   VA|182|
|   KS|211|
|   GA|230|
+-----+---+
```

This pr is to keep the original logic of `CodegenContext.copyResult` in some plans.

## How was this patch tested?

Existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-22445-bugfix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19781.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19781

commit 9797041aa9138386f26d1f6c259da302f918ab5d
Author: Takeshi Yamamuro
Date: 2017-11-19T00:12:46Z

    bugfix
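As background for the pull request above, the following Python sketch shows the general hazard that a "copy result" flag guards against: if a producer reuses one mutable row buffer and a consumer buffers the rows without copying, every buffered entry ends up pointing at the last row. This is an illustration of the general pattern only, under the assumption that upstream operators may reuse row objects. The function names are invented for the example and none of this is Spark's actual code.

```python
def produce_rows(values):
    """Yield the same mutable buffer for every row, as a row-reusing producer might."""
    buf = [None]
    for v in values:
        buf[0] = v  # overwrite the shared buffer in place
        yield buf


def buffer_rows(rows, copy_result: bool):
    """Collect rows into a list, copying each one only when copy_result is set."""
    return [list(r) if copy_result else r for r in rows]


if __name__ == "__main__":
    # Without copying, every buffered entry aliases the same buffer.
    print(buffer_rows(produce_rows([1, 2, 3]), copy_result=False))
    # With copying, each row keeps its own value.
    print(buffer_rows(produce_rows([1, 2, 3]), copy_result=True))
```

Aliasing of this kind typically shows up as repeated rows in the output, which is why operators that buffer their input need to know whether a copy is required.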
[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19627 **[Test build #83992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83992/testReport)** for PR 19627 at commit [`ae082f5`](https://github.com/apache/spark/commit/ae082f564ff2c23c976201ccf91a7dcd6726e4c9).
[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19728
[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19781 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83990/ Test PASSed.
[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19781

**[Test build #83990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83990/testReport)** for PR 19781 at commit [`9797041`](https://github.com/apache/spark/commit/9797041aa9138386f26d1f6c259da302f918ab5d).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19781 Merged build finished. Test PASSed.
[GitHub] spark issue #19767: [SPARK-22543][SQL] fix java 64kb compile error for deepl...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19767 Should this go to 2.2?
[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19627 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83991/ Test FAILed.
[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19627

**[Test build #83991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83991/testReport)** for PR 19627 at commit [`758bc24`](https://github.com/apache/spark/commit/758bc24e8328e8b496b28c7cd8b8183458f18953).

 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19627 **[Test build #83991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83991/testReport)** for PR 19627 at commit [`758bc24`](https://github.com/apache/spark/commit/758bc24e8328e8b496b28c7cd8b8183458f18953).
[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19627 @holdenk Found the reason: there is an empty file in the directory. :)
[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19627 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19782: [SPARK-22554][PYTHON] Add a config to control if ...
GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/19782

    [SPARK-22554][PYTHON] Add a config to control if PySpark should use daemon or not

    ## What changes were proposed in this pull request?

    This PR proposes adding a flag to control whether PySpark should use the daemon or not.

    SparkR already has a flag for useDaemon:
    https://github.com/apache/spark/blob/478fbc866fbfdb4439788583281863ecea14e8af/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L362

    It would be great to have this flag too. It makes it easier to test Windows-specific issues.

    ## How was this patch tested?

    Manually tested.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark use-daemon-flag

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19782.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19782

commit f41698e330c517830a90309a022b072ea6406dcb
Author: hyukjinkwon
Date: 2017-11-19T05:10:19Z

    Add a config to control if PySpark should use daemon or not
[GitHub] spark issue #19782: [SPARK-22554][PYTHON] Add a config to control if PySpark...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19782

This is also partly to allow running Python coverage without extra code changes. I know a hacky way to run this (see https://github.com/apache/spark/pull/19630#issuecomment-345490662 and https://github.com/apache/spark/pull/19630#issuecomment-345171997).

Now we can do, for example, the following:

```
pip install coverage
# Build Spark (http://spark.apache.org/docs/latest/building-spark.html)
rm python/lib/pyspark.zip
rm -fr .coverage
rm -fr coverage_html
echo "spark.python.use.daemon false" >> conf/spark-defaults.conf
echo "
#!/usr/bin/env bash
coverage run -p \$@
" > coverage_python
chmod 755 coverage_python
# Run actual Python tests
PATH=`pwd`:$PATH PYSPARK_PYTHON=coverage_python SPARK_TESTING=1 bin/pyspark pyspark.sql.tests VectorizedUDFTests
rm conf/spark-defaults.conf
coverage combine
coverage html -d coverage_html -i
open coverage_html
# Open up index.html in your browser.
```
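To illustrate the decision such a flag would control, here is a minimal Python sketch. It is not the Scala change made in this PR. It assumes the flag is named `spark.python.use.daemon` (as in the example above), that it defaults to enabled when unset, and that the daemon is never used on Windows. `should_use_daemon` and the plain-dict `conf` are hypothetical names used only for illustration.

```python
def should_use_daemon(os_name, conf):
    """Hypothetical helper: decide whether workers are forked via the daemon.

    os_name: operating system name, e.g. "Linux" or "Windows 10".
    conf: plain dict standing in for the Spark configuration (an assumption).
    """
    # Assumption: the flag defaults to enabled when it is not set.
    raw = str(conf.get("spark.python.use.daemon", "true")).strip().lower()
    flag_enabled = raw == "true"
    # The daemon relies on UNIX signals, so it is skipped on Windows
    # regardless of the flag.
    return flag_enabled and not os_name.startswith("Windows")
```

With this shape, setting the flag to `false` forces the non-daemon path on any platform, which is what makes the Windows code path testable from a UNIX machine.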
[GitHub] spark issue #19782: [SPARK-22554][PYTHON] Add a config to control if PySpark...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19782 cc @ueshin, could you take a look please when you have some time? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83995/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #83995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83995/testReport)** for PR 19498 at commit [`c45b701`](https://github.com/apache/spark/commit/c45b7016d8c446d023a3ca415c15d26298e61c5a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18339 **[Test build #83998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83998/testReport)** for PR 18339 at commit [`3ece21f`](https://github.com/apache/spark/commit/3ece21f5fd99e12a34616ffe90e34025ea3e3ee7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18277 It seems okay at first glance, without a close look. I will take a closer look soon if I get the chance.
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83996/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19715 **[Test build #83994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83994/testReport)** for PR 19715 at commit [`5038e21`](https://github.com/apache/spark/commit/5038e21e9f3d0c80f71308f2fc9167e4a7749e82). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19715 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #83996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83996/testReport)** for PR 19498 at commit [`dc8446a`](https://github.com/apache/spark/commit/dc8446ad99b2ad315ee93f854d98e3c25aa42ccf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19715 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83994/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19627 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83992/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19627 **[Test build #83992 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83992/testReport)** for PR 19627 at commit [`ae082f5`](https://github.com/apache/spark/commit/ae082f564ff2c23c976201ccf91a7dcd6726e4c9). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class CrossValidator(Estimator, ValidatorParams, HasParallelism, HasCollectSubModels,` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19627 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19782: [SPARK-22554][PYTHON] Add a config to control if PySpark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19782 **[Test build #83993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83993/testReport)** for PR 19782 at commit [`f41698e`](https://github.com/apache/spark/commit/f41698e330c517830a90309a022b072ea6406dcb). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #83995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83995/testReport)** for PR 19498 at commit [`c45b701`](https://github.com/apache/spark/commit/c45b7016d8c446d023a3ca415c15d26298e61c5a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #83997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83997/testReport)** for PR 19498 at commit [`174ec21`](https://github.com/apache/spark/commit/174ec2139a7e0af049e2954494525fd3fff145e2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19683#discussion_r151855849

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala ---
    @@ -59,15 +61,23 @@ case class GenerateExec(
         generator: Generator,
         join: Boolean,
         outer: Boolean,
    +    omitGeneratorChild: Boolean,
         generatorOutput: Seq[Attribute],
         child: SparkPlan)
       extends UnaryExecNode with CodegenSupport {

    +  private def projectedChildOutput = generator match {
    +    case g: UnaryExpression if omitGeneratorChild =>
    +      (child.output diff Seq(g.child))
    +    case _ =>
    +      child.output
    +  }
    +
       override def output: Seq[Attribute] = {
         if (join) {
    -      child.output ++ generatorOutput
    -    } else {
    -      generatorOutput
    +      projectedChildOutput ++ generatorOutput
    +      } else {
--- End diff --

nit: do we need to update the indentation?
[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19370 Will take a final look tomorrow. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r151854676

--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
    @@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (
     @Since("2.0.0")
     object DenseMatrix {

    +  @Since("2.3.0")
    +  private[ml] def unapply(dm: DenseMatrix): Option[(Int, Int, Array[Double], Boolean)] =
--- End diff --

@yanboliang any suggestion?
[GitHub] spark issue #19630: [SPARK-22409] Introduce function type argument in pandas...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19630 Actually, R has a flag for `useDaemon`: https://github.com/apache/spark/blob/478fbc866fbfdb4439788583281863ecea14e8af/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L362 It would be great to have this flag too. It would also make it easier to test Windows-specific issues.
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19715 **[Test build #83994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83994/testReport)** for PR 19715 at commit [`5038e21`](https://github.com/apache/spark/commit/5038e21e9f3d0c80f71308f2fc9167e4a7749e82). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19774#discussion_r151855965

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    +      }).getOrElse(Seq(Row("histogram", "NULL")))
--- End diff --

Some comments or cleanup here would be nicer though
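The row layout produced by the added lines can be sketched in Python as follows. This only illustrates the output shape and is not Spark code: `Bin`, `Histogram`, and `histogram_rows` are hypothetical stand-ins, with field names mirroring those in the diff (`height`, `lo`, `hi`, `ndv`). A missing histogram (or missing column statistics) is modelled as `None`.

```python
from typing import List, NamedTuple, Optional, Tuple


class Bin(NamedTuple):
    lo: float   # lower bound of the bin
    hi: float   # upper bound of the bin
    ndv: int    # number of distinct values in the bin


class Histogram(NamedTuple):
    height: float
    bins: List[Bin]


def histogram_rows(hist: Optional[Histogram]) -> List[Tuple[str, str]]:
    """Return (name, value) rows: one header row, then one row per bin."""
    if hist is None:
        # No histogram available: a single row with the NULL placeholder.
        return [("histogram", "NULL")]
    header = ("histogram",
              f"height: {hist.height}, num_of_bins: {len(hist.bins)}")
    # Bin rows carry an empty name so they read as continuation lines.
    body = [("", f"lower_bound: {b.lo}, upper_bound: {b.hi}, "
                 f"distinct_count: {b.ndv}")
            for b in hist.bins]
    return [header] + body
```

Splitting the header and the per-bin rows into named steps like this is one possible form of the cleanup asked for in the review comment.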
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83997/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #83997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83997/testReport)** for PR 19498 at commit [`174ec21`](https://github.com/apache/spark/commit/174ec2139a7e0af049e2954494525fd3fff145e2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19752 Sure, it may overlap with #18641. I will review this after #18641 to avoid a conflict.
[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18339 I am okay with going ahead @holdenk if you think it's okay anyway. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18339 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19630: [SPARK-22409] Introduce function type argument in pandas...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19630

OK, mine was, with this diff:

```diff
--- a/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala
@@ -38,7 +38,7 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
   // (pyspark/daemon.py) and tell it to fork new workers for our tasks. This daemon currently
   // only works on UNIX-based systems now because it uses signals for child management, so we can
   // also fall back to launching workers (pyspark/worker.py) directly.
-  val useDaemon = !System.getProperty("os.name").startsWith("Windows")
+  val useDaemon = false

   var daemon: Process = null
   val daemonHost = InetAddress.getByAddress(Array(127, 0, 0, 1))
```

```bash
pip install coverage
# Build Spark (http://spark.apache.org/docs/latest/building-spark.html)
rm python/lib/pyspark.zip
rm -fr .coverage
rm -fr coverage_html
echo "
#!/usr/bin/env bash
coverage run -p \$@
" > coverage_python
chmod 755 coverage_python
PATH=`pwd`:$PATH PYSPARK_PYTHON=coverage_python SPARK_TESTING=1 bin/pyspark pyspark.sql.tests VectorizedUDFTests
coverage combine
coverage html -d coverage_html -i
open coverage_html
# Open up index.html in your browser.
```
[GitHub] spark pull request #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to av...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19498#discussion_r151855253

--- Diff: python/pyspark/streaming/util.py ---
    @@ -64,7 +64,11 @@ def call(self, milliseconds, jrdds):
                 t = datetime.fromtimestamp(milliseconds / 1000.0)
                 r = self.func(t, *rdds)
                 if r:
    -                return r._jrdd
    +                # Here, we work around to ensure `_jrdd` is `JavaRDD` by wrapping it by `map`.
    +                # org.apache.spark.streaming.api.python.PythonTransformFunction requires to return
    +                # `JavaRDD`; however, this could be `JavaPairRDD` by some APIs, for example, `zip`.
    +                # See SPARK-17756.
    +                return r.map(lambda x: x)._jrdd
--- End diff --

Thanks for the review, @holdenk. Let me push the change.
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #83996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83996/testReport)** for PR 19498 at commit [`dc8446a`](https://github.com/apache/spark/commit/dc8446ad99b2ad315ee93f854d98e3c25aa42ccf). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18339 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18339 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83998/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18339 **[Test build #83998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83998/testReport)** for PR 18339 at commit [`3ece21f`](https://github.com/apache/spark/commit/3ece21f5fd99e12a34616ffe90e34025ea3e3ee7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19767#discussion_r151842921

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -105,6 +105,41 @@ abstract class Expression extends TreeNode[Expression] {
           val isNull = ctx.freshName("isNull")
           val value = ctx.freshName("value")
           val ve = doGenCode(ctx, ExprCode("", isNull, value))
    +
    +      // TODO: support whole stage codegen too
    +      if (ve.code.trim.length > 1024 && ctx.INPUT_ROW != null && ctx.currentVars == null) {
    +        val setIsNull = if (ve.isNull != "false" && ve.isNull != "true") {
    +          val globalIsNull = ctx.freshName("globalIsNull")
    +          ctx.addMutableState("boolean", globalIsNull, s"$globalIsNull = false;")
    +          val localIsNull = ve.isNull
    +          ve.isNull = globalIsNull
    +          s"$globalIsNull = $localIsNull;"
    +        } else {
    +          ""
    +        }
    +
    +        val setValue = {
    +          val globalValue = ctx.freshName("globalValue")
    +          ctx.addMutableState(
    +            ctx.javaType(dataType), globalValue, s"$globalValue = ${ctx.defaultValue(dataType)};")
    +          val localValue = ve.value
    +          ve.value = globalValue
    +          s"$globalValue = $localValue;"
    +        }
    +
    +        val funcName = ctx.freshName(nodeName)
    +        val funcFullName = ctx.addNewFunction(funcName,
    +          s"""
    +             |private void $funcName(InternalRow ${ctx.INPUT_ROW}) {
    +             |  ${ve.code.trim}
    +             |  $setValue
--- End diff --

creating objects will be a big overhead. I think having a global boolean variable is better.
[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19780 **[Test build #83987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83987/testReport)** for PR 19780 at commit [`e08259a`](https://github.com/apache/spark/commit/e08259a41d0f39c751858daba713b30e52a6c3a4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org