[GitHub] spark pull request #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19116#discussion_r136899937
    --- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
    @@ -19,7 +19,9 @@ package org.apache.spark.repl
     import java.io.BufferedReader
    +// scalastyle:off println
     import scala.Predef.{println => _, _}
    +// scalastyle:on println
    --- End diff --
I said it's weird because this is obviously not a place to print anything. Not much harm, actually.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19127: [SPARK-21916][SQL] Set isolationOn=true when create hive...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19127 **[Test build #81396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81396/testReport)** for PR 19127 at commit [`2d13ab8`](https://github.com/apache/spark/commit/2d13ab8a18955e281033c17a446022aba57865f8).
[GitHub] spark pull request #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19116#discussion_r136898192
    --- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
    @@ -19,7 +19,9 @@ package org.apache.spark.repl
     import java.io.BufferedReader
    +// scalastyle:off println
     import scala.Predef.{println => _, _}
    +// scalastyle:on println
    --- End diff --
This actually looks valid though. If I manually add `import scala.Predef.{println => _, _}` somewhere other than here, for example in `SQLConf` on the current master:
```
[error] .../spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:26:21: Are you sure you want to println? If yes, wrap the code block with
[error] // scalastyle:off println
[error] println(...)
[error] // scalastyle:on println
[error] .../spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:26:0: scala.Predef. is in wrong order relative to scala.collection.immutable.
```
It looks like scalastyle recognises this as an error. It seems 1.0.0 fixes an issue with this kind of style checking and detection. We might have to fix the `println` token checker rule, but I guess that is orthogonal anyway.
[GitHub] spark pull request #19127: [SPARK-21916][SQL] Set isolationOn=true when crea...
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/19127
[SPARK-21916][SQL] Set isolationOn=true when create hive client for metadata.
## What changes were proposed in this pull request?
In the current code, we set `isolationOn=!isCliSession()` when creating the Hive client for metadata. However, the conf of `CliSessionState` points to a local dummy metastore (https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L416). Using `CliSessionState`, we fail to get metadata from the remote Hive metastore. We can always set `isolationOn=true` when creating the Hive client for metadata.
## How was this patch tested?
Existing.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/jinxing64/spark SPARK-21916
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/19127.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #19127
commit 2d13ab8a18955e281033c17a446022aba57865f8
Author: jinxing
Date: 2017-09-05T05:28:06Z
    Set isolationOn=true when create hive client for metadata.
[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19111 **[Test build #81395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81395/testReport)** for PR 19111 at commit [`5d156be`](https://github.com/apache/spark/commit/5d156be92fd3cfe8af30094fd759909ce5455d8f).
[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19111 jenkins, retest this please
[GitHub] spark pull request #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19116#discussion_r136894278
    --- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
    @@ -19,7 +19,9 @@ package org.apache.spark.repl
     import java.io.BufferedReader
    +// scalastyle:off println
     import scala.Predef.{println => _, _}
    +// scalastyle:on println
    --- End diff --
Nit: This looks a bit weird.
[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17014 Merged build finished. Test PASSed.
[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17014 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81393/ Test PASSed.
[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17014 **[Test build #81393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81393/testReport)** for PR 17014 at commit [`f8fa957`](https://github.com/apache/spark/commit/f8fa9573a1b40ff236e9c52cf429e2742c8f2bd0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19112: [SPARK-21901][SS] Define toString for StateOperat...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19112#discussion_r136892336
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
    @@ -200,7 +202,7 @@ class SourceProgress protected[sql](
      */
     @InterfaceStability.Evolving
     class SinkProgress protected[sql](
    -val description: String) extends Serializable {
    --- End diff --
I'm not a committer, but I would like to leave this suggestion:
- Code style changes are orthogonal to the motive of the PR and should be done separately. Generally, every PR should address one problem and not carry unrelated changes. Following this practice helps a lot when reverting, or when bisecting commits to pinpoint a regression.
- It would be beneficial to find out why checkstyle does not catch such instances and fix that (along with making all such instances consistent with the rules). Otherwise this would be a one-off fix, and we would keep piling up similar inconsistencies in future development without anyone realising it.
[GitHub] spark issue #19126: Model 1 and Model 2 ParamMaps Missing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19126 Can one of the admins verify this patch?
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19124 Merged build finished. Test PASSed.
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81394/ Test PASSed.
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19124 **[Test build #81394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81394/testReport)** for PR 19124 at commit [`a738943`](https://github.com/apache/spark/commit/a73894374d284484d9b28123db02dfe6f264567a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19126: Model 1 and Model 2 ParamMaps Missing
GitHub user marktab opened a pull request: https://github.com/apache/spark/pull/19126
Model 1 and Model 2 ParamMaps Missing
The original Scala code says println("Model 2 was fit using parameters: " + model2.parent.extractParamMap). The parent is lr. In Python there is no method for accessing the parent as is done in Scala. This code has been tested in Python and returns values consistent with Scala.
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/marktab/spark master
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/19126.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #19126
commit 76e5da7b14d71338cf82352e9cf5628640e732a2
Author: MarkTab marktab.net
Date: 2017-09-05T03:26:07Z
    Model 1 and Model 2 ParamMaps Missing
[GitHub] spark issue #19121: [SPARK-21906][YARN][Spark Core]Don't runAsSparkUser to s...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19121 UGI is only used for security; normally it lets a Spark application communicate with Hadoop as the correct user. `doAs` already wraps the whole `CoarseGrainedExecutorBackend` process, so all the task threads forked in this process will honor this UGI. There is no need to wrap again on each task.
[GitHub] spark issue #19121: [SPARK-21906][YARN][Spark Core]Don't runAsSparkUser to s...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/19121 @jerryshao 1. I didn't hit any problems; this code runs fine even if it is unnecessary. 2. In standalone mode, working with a secured HDFS might not be supported yet. Besides, this UGI `doAs` wraps the executors' initialization but not the running of tasks; if we truly want to `doAs` a `SPARK_USER`, this UGI may be used in both phases.
[GitHub] spark issue #19125: [SPARK-21913][SQL][TEST] `withDatabase` should drop data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19125 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81392/ Test PASSed.
[GitHub] spark issue #19125: [SPARK-21913][SQL][TEST] `withDatabase` should drop data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19125 Merged build finished. Test PASSed.
[GitHub] spark issue #19125: [SPARK-21913][SQL][TEST] `withDatabase` should drop data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19125 **[Test build #81392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81392/testReport)** for PR 19125 at commit [`241d565`](https://github.com/apache/spark/commit/241d56563ed278828567eb8f78029a8e70e96c5d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19124 **[Test build #81394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81394/testReport)** for PR 19124 at commit [`a738943`](https://github.com/apache/spark/commit/a73894374d284484d9b28123db02dfe6f264567a).
[GitHub] spark issue #19121: [SPARK-21906][YARN][Spark Core]Don't runAsSparkUser to s...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19121 Can you please elaborate on the problem you met? Did you see any unexpected behavior? The changes here get rid of the env variable "SPARK_USER". This might be OK for a YARN application, but what if a user runs in standalone mode and explicitly sets "SPARK_USER"? Your changes seem to break the semantics.
[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17014 **[Test build #81393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81393/testReport)** for PR 17014 at commit [`f8fa957`](https://github.com/apache/spark/commit/f8fa9573a1b40ff236e9c52cf429e2742c8f2bd0).
[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/17014 Jenkins, retest this please
[GitHub] spark issue #13794: [SPARK-15574][ML][PySpark] Python meta-algorithms in Sca...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13794 +1 @jkbradley. For now it is better to keep the current implementation of the 4 meta-algorithms in PySpark. @yinxusen Would you mind closing this PR? I still appreciate your contribution to this!
[GitHub] spark pull request #19124: [SPARK-21912][SQL] Creating ORC datasource table ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r136878102
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -169,6 +171,16 @@ class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable
     }
     }
     }
    +
    + private def checkFieldName(name: String): Unit = {
    +// ,;{}()\n\t= and space are special characters in ORC schema
    --- End diff --
Thank you for the review, @tejasapatil! That's a good idea. Right, it's not an exhaustive list. I'll update the PR.
[GitHub] spark pull request #19124: [SPARK-21912][SQL] Creating ORC datasource table ...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r136877087
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -169,6 +171,16 @@ class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable
     }
     }
     }
    +
    + private def checkFieldName(name: String): Unit = {
    +// ,;{}()\n\t= and space are special characters in ORC schema
    --- End diff --
Is this an exhaustive list? For example, it looks like `?` is not allowed either. Given that the underlying lib (ORC) can evolve to support or stop supporting certain characters, it's safer to rely on some method than to come up with a blacklist. Can you simply call `TypeInfoUtils.getTypeInfoFromTypeString` or a related method that would do this check?
```
Caused by: java.lang.IllegalArgumentException: Error: : expected at the position 8 of 'struct' but '?' is found.
  at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:360)
  at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
  at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:483)
  at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
  at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfoFromTypeString(TypeInfoUtils.java:770)
  at org.apache.spark.sql.hive.orc.OrcSerializer.<init>(OrcFileFormat.scala:194)
  at org.apache.spark.sql.hive.orc.OrcOutputWriter.<init>(OrcFileFormat.scala:231)
  at org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:91)
  ... ...
```
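The suggestion above, delegating validation to the library's own parser instead of keeping a blacklist, might look roughly like the sketch below. It is only an illustration under stated assumptions: `ParserBackedCheck`, `checkFieldName`, and `toyParser` are hypothetical names, the `struct<name:int>` probe string is an assumed shape, and the parser is passed in as a plain function so the example runs without Hive on the classpath. In the PR, the real parser would be something like `TypeInfoUtils.getTypeInfoFromTypeString`.

```scala
import scala.util.{Failure, Success, Try}

object ParserBackedCheck {
  /**
   * Validates a field name by embedding it in a one-field struct type string
   * and handing that string to `parseTypeString`, which is expected to throw
   * on input it cannot parse. The parser is a parameter (an assumption of this
   * sketch), so the library stays the single source of truth.
   */
  def checkFieldName(name: String, parseTypeString: String => Unit): Either[String, Unit] =
    Try(parseTypeString(s"struct<$name:int>")) match {
      case Success(_) => Right(())
      case Failure(e) => Left(s"""Column name "$name" is rejected: ${e.getMessage}""")
    }

  // Stand-in parser for demonstration only (hypothetical): it accepts letters,
  // digits and underscores before the first ':' and throws on anything else.
  val toyParser: String => Unit = { typeString =>
    val start = "struct<".length
    val end = typeString.indexOf(':', start)
    val bad = typeString.indexWhere(c => !(c.isLetterOrDigit || c == '_'), start)
    if (end < 0 || bad != end) {
      throw new IllegalArgumentException(s"unexpected character at position $bad")
    }
  }
}
```

With this shape, a new character restriction in the underlying library would surface through the parser's own exception rather than requiring an update to a hand-maintained list.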
[GitHub] spark issue #19125: [SPARK-21913][SQL][TEST] `withDatabase` should drop data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19125 **[Test build #81392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81392/testReport)** for PR 19125 at commit [`241d565`](https://github.com/apache/spark/commit/241d56563ed278828567eb8f78029a8e70e96c5d).
[GitHub] spark pull request #19125: [SPARK-21913][SQL][TEST] `withDatabase` should dr...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/19125
[SPARK-21913][SQL][TEST] `withDatabase` should drop database with CASCADE
## What changes were proposed in this pull request?
Currently, `withDatabase` fails if the database is not empty. It would be better to drop the database cleanly with CASCADE.
## How was this patch tested?
This is a change to a test utility. Pass the existing Jenkins tests.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/dongjoon-hyun/spark SPARK-21913
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/19125.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #19125
commit 241d56563ed278828567eb8f78029a8e70e96c5d
Author: Dongjoon Hyun
Date: 2017-09-04T23:23:37Z
    [SPARK-21913][SQL][TEST] `withDatabase` should drop database with CASCADE
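A minimal sketch of the behaviour this PR describes, assuming a loan-pattern test helper that always cleans up with `DROP DATABASE ... CASCADE`. The object name `WithDatabaseSketch` and the `runSql` parameter are hypothetical; the SQL runner is passed in as a function so the example runs without a Spark session, and this is not claimed to be the actual Spark test utility.

```scala
object WithDatabaseSketch {
  /**
   * Runs `body`, then drops every named database with CASCADE so that
   * non-empty databases (ones still holding tables) are removed as well.
   * The cleanup sits in `finally`, so it also runs when `body` throws.
   */
  def withDatabase(runSql: String => Unit)(dbNames: String*)(body: => Unit): Unit = {
    try body
    finally dbNames.foreach { name =>
      runSql(s"DROP DATABASE IF EXISTS $name CASCADE")
    }
  }
}
```

In a real suite `runSql` would typically forward to the session's SQL entry point; here any `String => Unit` works, which makes the cleanup order easy to assert on.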
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81390/ Test PASSed.
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Merged build finished. Test PASSed.
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #81390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81390/testReport)** for PR 18692 at commit [`cfeae46`](https://github.com/apache/spark/commit/cfeae46766a6ccb1b1a0113fe41cdb52b16897f3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19124 Merged build finished. Test PASSed.
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81391/ Test PASSed.
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19124 **[Test build #81391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81391/testReport)** for PR 19124 at commit [`808dfe0`](https://github.com/apache/spark/commit/808dfe0fcd9de2f43b33f0d1d084172b5624f2a8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19123: [SPARK-21418][SQL] NoSuchElementException: None.g...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19123
[GitHub] spark issue #19119: [SPARK-21845] [SQL] Make codegen fallback of expressions...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19119 Hi, @gatorsmile. Could you trigger the Maven build, too?
[GitHub] spark issue #19123: [SPARK-21418][SQL] NoSuchElementException: None.get in D...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/19123 LGTM, merging to master/2.2. Thanks for picking this up!
[GitHub] spark issue #19123: [SPARK-21418][SQL] NoSuchElementException: None.get in D...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19123 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81388/ Test PASSed.
[GitHub] spark issue #19123: [SPARK-21418][SQL] NoSuchElementException: None.get in D...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19123 Merged build finished. Test PASSed.
[GitHub] spark issue #19124: [SPARK-21912][SQL] Creating ORC datasource table should ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19124 **[Test build #81391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81391/testReport)** for PR 19124 at commit [`808dfe0`](https://github.com/apache/spark/commit/808dfe0fcd9de2f43b33f0d1d084172b5624f2a8).
[GitHub] spark issue #19123: [SPARK-21418][SQL] NoSuchElementException: None.get in D...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19123 **[Test build #81388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81388/testReport)** for PR 19123 at commit [`735ca94`](https://github.com/apache/spark/commit/735ca949e042493632d297db23286a8f8f83a6ed). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19124: [SPARK-21912][SQL] Creating ORC datasource table ...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/19124
[SPARK-21912][SQL] Creating ORC datasource table should check invalid column names
## What changes were proposed in this pull request?
Currently, users hit job abortions while creating ORC data source tables with invalid column names. We had better prevent this by raising **AnalysisException** with a hint to use aliases instead, as is done for Parquet data source tables.
**BEFORE**
```scala
scala> sql("CREATE TABLE orc1 USING ORC AS SELECT 1 `a b`")
17/09/04 13:28:21 ERROR Utils: Aborting task
java.lang.IllegalArgumentException: Error: : expected at the position 8 of 'struct' but ' ' is found.
17/09/04 13:28:21 ERROR FileFormatWriter: Job job_20170904132821_0001 aborted.
17/09/04 13:28:21 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.spark.SparkException: Task failed while writing rows.
```
**AFTER**
```scala
scala> sql("CREATE TABLE orc1 USING ORC AS SELECT 1 `a b`")
17/09/04 13:27:40 ERROR CreateDataSourceTableAsSelectCommand: Failed to write to table orc1
org.apache.spark.sql.AnalysisException: Attribute name "a b" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
```
## How was this patch tested?
Pass the Jenkins with a new test case.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/dongjoon-hyun/spark SPARK-21912
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/19124.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #19124
commit 808dfe0fcd9de2f43b33f0d1d084172b5624f2a8
Author: Dongjoon Hyun
Date: 2017-09-04T20:46:15Z
    [SPARK-21912][SQL] Creating ORC datasource table should check invalid column names
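The up-front check described in this PR could be sketched as below. This is a self-contained illustration, not the PR's code: `OrcFieldNames` and `invalidCharsIn` are hypothetical names, the character set is copied from the AFTER error message, and a plain `IllegalArgumentException` stands in for Spark's `AnalysisException` so the example has no Spark dependency.

```scala
object OrcFieldNames {
  // Characters listed in the error message above: space , ; { } ( ) \n \t =
  private val invalidChars: Set[Char] = " ,;{}()\n\t=".toSet

  /** Returns the offending characters of `name`, in order of first appearance. */
  def invalidCharsIn(name: String): Seq[Char] =
    name.filter(invalidChars.contains).distinct.toSeq

  /**
   * Fails fast, before any job is launched, when `name` contains a character
   * from the set above. The exception type is a stand-in (assumption).
   */
  def checkFieldName(name: String): Unit = {
    if (invalidCharsIn(name).nonEmpty) {
      throw new IllegalArgumentException(
        "Attribute name \"" + name + "\" contains invalid character(s). " +
          "Please use alias to rename it.")
    }
  }
}
```

Calling `OrcFieldNames.checkFieldName("a b")` throws immediately, which mirrors the intent of turning a task-time abort into an analysis-time error.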
[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r136868330
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ---
    @@ -152,3 +152,71 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
     if (j.joinType == newJoinType) f else Filter(condition, j.copy(joinType = newJoinType))
     }
     }
    +
    +/**
    + * A rule that uses propagated constraints to infer join conditions. The optimization is applicable
    + * only to CROSS joins.
    --- End diff --
Can you also mention the reason why we are restricting this to cross joins only?
```
For other join types, adding inferred join conditions would potentially shuffle children, as the child node's partitioning won't satisfy the JOIN node's requirements, which it otherwise could have.
```
[GitHub] spark pull request #16774: [SPARK-19357][ML] Adding parallel model evaluatio...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/16774#discussion_r136868226
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
    @@ -100,31 +113,53 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0") override val uid: String)
     val eval = $(evaluator)
     val epm = $(estimatorParamMaps)
     val numModels = epm.length
    -val metrics = new Array[Double](epm.length)
    +
    +// Create execution context based on $(parallelism)
    +val executionContext = getExecutionContext
    --- End diff --
In the corresponding PR for the PySpark implementation, the number of threads is limited by the number of models to be trained (https://github.com/WeichenXu123/spark/blob/be2f3d0ec50db4730c9e3f9a813a4eb96889f5b6/python/pyspark/ml/tuning.py#L261). We might do that here, for instance, by overriding the `getParallelism` method. What do you think about this?
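The idea of capping the thread count at the number of models can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `BoundedParallelism`, `effectiveParallelism`, and `evaluateAll` are hypothetical names, and a generic `fit` function stands in for training and evaluating one parameter map.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object BoundedParallelism {
  /** Never use more threads than there are models, and never fewer than one. */
  def effectiveParallelism(requested: Int, numModels: Int): Int = {
    require(requested >= 1, "parallelism must be at least 1")
    math.max(1, math.min(requested, numModels))
  }

  /** Evaluates every parameter set on a pool sized by `effectiveParallelism`. */
  def evaluateAll[A](paramSets: Seq[A], requested: Int)(fit: A => Double): Seq[Double] = {
    val pool = Executors.newFixedThreadPool(effectiveParallelism(requested, paramSets.length))
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // One future per parameter set; results keep the input order.
      val futures = paramSets.map(p => Future(fit(p)))
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally {
      pool.shutdown()
    }
  }
}
```

Sizing the pool this way avoids creating idle threads when a user requests, say, 8-way parallelism for only 3 parameter maps.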
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692 **[Test build #81390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81390/testReport)** for PR 18692 at commit [`cfeae46`](https://github.com/apache/spark/commit/cfeae46766a6ccb1b1a0113fe41cdb52b16897f3).
[GitHub] spark issue #19115: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/19115 And now I see that the title was changed to something more useful. Pardon any offense; the end result of the title changes looks good. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19115: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/19115 I realize this PR is now closed, but to follow-up on Saisai's request concerning PR titles, I'll also note that the title of this PR isn't very useful even after the JIRA id and component tag are added. Titles like "fixed foo" or "updated bar" don't really tell reviewers or those looking at the commit logs in the future what the PR is about. The JIRA should tell us _why_ a change or addition is needed, the description in the PR should tell us _what_ was changed or added, and the PR title should give us enough of an idea of what is going on that we don't necessarily have to open the PR or look at the code changes just to see whether it is something that we are even at all interested in. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19111 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81389/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19111 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19111 **[Test build #81389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81389/testReport)** for PR 19111 at commit [`5d156be`](https://github.com/apache/spark/commit/5d156be92fd3cfe8af30094fd759909ce5455d8f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/18975 @gatorsmile : Yes. Hive is not 100% atomic, as things can go wrong between removing the old data and renaming the staging location. But it's superior in these regards: - Hive would output "no data" OR "complete data". Here we can have "no data" OR "incomplete data" OR "complete data". The "incomplete data" part worries me. A staging dir helps achieve the "you either see nothing OR everything" behaviour. - The window of "you see nothing" is much bigger here compared to Hive, as the output location is cleaned up before execution. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
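The staging-directory pattern described above can be sketched on a local filesystem as follows. This is a minimal illustration, not the Hive or Spark implementation; `write_via_staging` is a hypothetical helper, and it assumes the staging directory and the output location live on the same filesystem so that the final rename is a single metadata operation.

```python
import os
import shutil
import tempfile


def write_via_staging(output_dir, files):
    """Write `files` (name -> text) to a staging dir, then swap it into place."""
    parent = os.path.dirname(os.path.abspath(output_dir))
    # Stage next to the target so the final rename stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        for name, data in files.items():
            with open(os.path.join(staging, name), "w") as f:
                f.write(data)
        # The old data is removed only after the new data is fully written.
        # Readers may still observe "no data" between the next two steps,
        # but never a partially written output.
        if os.path.exists(output_dir):
            shutil.rmtree(output_dir)
        os.rename(staging, output_dir)
    except BaseException:
        # On failure, discard the staging dir and leave the error visible.
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

Because the old output is deleted just before the rename rather than before the job starts, the "you see nothing" window shrinks to the gap between those two calls.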
[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19111 **[Test build #81389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81389/testReport)** for PR 19111 at commit [`5d156be`](https://github.com/apache/spark/commit/5d156be92fd3cfe8af30094fd759909ce5455d8f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19111 jenkins, retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19111 + @shivaram could you do a quick review? given this change I'd love to get some feedback --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19111 Yes, the issue is with random sampling, and this PR should fix all of these. I'm not sure why I haven't seen them much before - they have been around for years - appreciate bringing these up, we should track them with JIRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19111 jenkins, retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19123: [SPARK-21418][SQL] NoSuchElementException: None.get in D...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19123 **[Test build #81388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81388/testReport)** for PR 19123 at commit [`735ca94`](https://github.com/apache/spark/commit/735ca949e042493632d297db23286a8f8f83a6ed). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19123: [SPARK-21418][SQL] NoSuchElementException: None.g...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/19123 [SPARK-21418][SQL] NoSuchElementException: None.get in DataSourceScanExec with sun.io.serialization.extendedDebugInfo=true ## What changes were proposed in this pull request? If no SparkConf is available to Utils.redact, simply don't redact. ## How was this patch tested? Existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-21418 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19123.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19123 commit 735ca949e042493632d297db23286a8f8f83a6ed Author: Sean Owen Date: 2017-09-04T17:32:00Z Don't fail with NPE in corner case where Utils.redact happens outside active session --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19122 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19122 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81387/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19122 **[Test build #81387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81387/testReport)** for PR 19122 at commit [`be2f3d0`](https://github.com/apache/spark/commit/be2f3d0ec50db4730c9e3f9a813a4eb96889f5b6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19108 cc @yanboliang Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19122 **[Test build #81387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81387/testReport)** for PR 19122 at commit [`be2f3d0`](https://github.com/apache/spark/commit/be2f3d0ec50db4730c9e3f9a813a4eb96889f5b6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19122 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81386/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19122 **[Test build #81386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81386/testReport)** for PR 19122 at commit [`57cf534`](https://github.com/apache/spark/commit/57cf53473e5bfb75095b0e519457dbdc973f3300). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class HasParallelism(Params):` * `class CrossValidator(Estimator, ValidatorParams, HasParallelism, MLReadable, MLWritable):` * `class TrainValidationSplit(Estimator, ValidatorParams, HasParallelism, MLReadable, MLWritable):` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19122 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19122#discussion_r136850665 --- Diff: python/pyspark/ml/tuning.py --- @@ -255,18 +257,23 @@ def _fit(self, dataset): randCol = self.uid + "_rand" df = dataset.select("*", rand(seed).alias(randCol)) metrics = [0.0] * numModels + +pool = ThreadPool(processes=min(self.getParallelism(), numModels)) + for i in range(nFolds): validateLB = i * h validateUB = (i + 1) * h condition = (df[randCol] >= validateLB) & (df[randCol] < validateUB) -validation = df.filter(condition) +validation = df.filter(condition).cache() --- End diff -- This may need some discussion. Currently the PySpark implementation caches neither the `train dataset` nor the `validation dataset`, whereas the Scala implementation caches both. I would prefer to cache the `validation dataset` but not the `train dataset`: the `validation dataset` is only `1/numFolds` of the input dataset, so it is worth caching; otherwise the input dataset will be scanned again. The `train dataset`, on the other hand, is `(numFolds - 1)/numFolds` of the input dataset. We can generate it by scanning the input dataset directly, which won't slow things down too much. @BryanCutler @MLnick What do you think about it? Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
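To make the size argument concrete, the fold bounds from the diff (`validateLB = i * h`, `validateUB = (i + 1) * h`) can be reproduced without Spark. This is a sketch only: `k_fold_splits` is a hypothetical helper, Python's `random` module stands in for the `rand(seed)` column, and `h = 1.0 / n_folds` is assumed to match the `h` used in `_fit`.

```python
import random


def k_fold_splits(rows, n_folds, seed=0):
    """Yield (train, validation) pairs using one random tag per row."""
    rng = random.Random(seed)
    # One uniform value in [0, 1) per row, like the extra rand column.
    tagged = [(row, rng.random()) for row in rows]
    h = 1.0 / n_folds
    for i in range(n_folds):
        lb, ub = i * h, (i + 1) * h
        # Validation holds the rows whose tag falls in [lb, ub):
        # roughly 1/n_folds of the input.
        validation = [row for row, r in tagged if lb <= r < ub]
        # Train holds everything else: roughly (n_folds - 1)/n_folds.
        train = [row for row, r in tagged if not (lb <= r < ub)]
        yield train, validation


rows = list(range(100))
splits = list(k_fold_splits(rows, n_folds=4))
```

With four folds, each validation set holds about a quarter of the rows and each training set about three quarters, which is the asymmetry behind the suggestion to cache only the validation side.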
[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19122 **[Test build #81386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81386/testReport)** for PR 19122 at commit [`57cf534`](https://github.com/apache/spark/commit/57cf53473e5bfb75095b0e519457dbdc973f3300). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19122 [SPARK-21911][ML][PySpark] Parallel Model Evaluation for ML Tuning in PySpark ## What changes were proposed in this pull request? Add parallelism support for ML tuning in pyspark. ## How was this patch tested? Test updated. You can merge this pull request into a Git repository by running: $ git pull https://github.com/WeichenXu123/spark par-ml-tuning-py Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19122.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19122 commit 57cf53473e5bfb75095b0e519457dbdc973f3300 Author: WeichenXu Date: 2017-09-04T16:03:55Z init pr --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16611 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16611 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81385/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16611 **[Test build #81385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81385/testReport)** for PR 16611 at commit [`4c1a012`](https://github.com/apache/spark/commit/4c1a012e5cad648e81797ec494f44392189560ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19117: [SPARK-21904] [SQL] Rename tempTables to tempViews in Se...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19117 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19117: [SPARK-21904] [SQL] Rename tempTables to tempViews in Se...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19117 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81384/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19117: [SPARK-21904] [SQL] Rename tempTables to tempViews in Se...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19117 **[Test build #81384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81384/testReport)** for PR 19117 at commit [`02815e7`](https://github.com/apache/spark/commit/02815e7faae23a32b04c7af08c826f4428c60f5c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19119: [SPARK-21845] [SQL] Make codegen fallback of expressions...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19119 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81383/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19119: [SPARK-21845] [SQL] Make codegen fallback of expressions...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19119 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19119: [SPARK-21845] [SQL] Make codegen fallback of expressions...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19119 **[Test build #81383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81383/testReport)** for PR 19119 at commit [`b96da49`](https://github.com/apache/spark/commit/b96da49aa0893f8bf34da2a2c111499fdbad7b5a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18875 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81382/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18875 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875 **[Test build #81382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81382/testReport)** for PR 18875 at commit [`3ebbe67`](https://github.com/apache/spark/commit/3ebbe67e059dfb6a004ff50f3c661f6319d616b8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16611 **[Test build #81385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81385/testReport)** for PR 16611 at commit [`4c1a012`](https://github.com/apache/spark/commit/4c1a012e5cad648e81797ec494f44392189560ce). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19117: [SPARK-21904] [SQL] Rename tempTables to tempViews in Se...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19117 **[Test build #81384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81384/testReport)** for PR 19117 at commit [`02815e7`](https://github.com/apache/spark/commit/02815e7faae23a32b04c7af08c826f4428c60f5c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19117: [SPARK-21904] [SQL] Rename tempTables to tempViews in Se...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19117 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19113 With 2.7G of data, I ran a simple Java program with 2.5.4 and 2.2.1 using `CsvParser`, plus simple e2e read tests. The elapsed time diff was roughly -1.7% ~ +1.2%. I think there is virtually no diff (or 0.5 improvement). I think we generally trust the other communities and libraries we decided to add, such as ORC, Parquet, Jackson, etc., and de-duplicate such efforts with the community support. I think we discussed this before. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19110 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81381/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19110 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19110 **[Test build #81381 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81381/testReport)** for PR 19110 at commit [`fc6fd5e`](https://github.com/apache/spark/commit/fc6fd5e98edcaccc4e42abf8ba94250ea1dbdfba). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875 **[Test build #81382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81382/testReport)** for PR 18875 at commit [`3ebbe67`](https://github.com/apache/spark/commit/3ebbe67e059dfb6a004ff50f3c661f6319d616b8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19110 **[Test build #81381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81381/testReport)** for PR 19110 at commit [`fc6fd5e`](https://github.com/apache/spark/commit/fc6fd5e98edcaccc4e42abf8ba94250ea1dbdfba). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19119: [SPARK-21845] [SQL] Make codegen fallback of expressions...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19119 **[Test build #81383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81383/testReport)** for PR 19119 at commit [`b96da49`](https://github.com/apache/spark/commit/b96da49aa0893f8bf34da2a2c111499fdbad7b5a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18875 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18875 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81380/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875 **[Test build #81380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81380/testReport)** for PR 18875 at commit [`d466524`](https://github.com/apache/spark/commit/d466524e918361891ef406e4fe9d9b3b638054c3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19116 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81378/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19116 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18576 @gatorsmile I thought a bit more about this issue and would like to propose another approach: how about just moving `output` into `QueryPlanConstraints` and having `output` always consider the NULL constraints in its own logical plan? This does not solve all the existing nullability issues, but the fix is simple and not intrusive (I think it is a good first step). https://github.com/apache/spark/compare/master...maropu:SPARK-21351-4#diff-b40fcb6ac9b2e94b410f39a94a97e822R36 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19116 **[Test build #81378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81378/testReport)** for PR 19116 at commit [`2447fd0`](https://github.com/apache/spark/commit/2447fd0e152ace4dd074a92bd0d3cdc638b09b1a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org