[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18810 **[Test build #80325 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80325/testReport)** for PR 18810 at commit [`7e84753`](https://github.com/apache/spark/commit/7e84753ca9befc8f3cea872250b2145e132ac837).
[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17357#discussion_r131582810

--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala ---
@@ -139,7 +139,9 @@ private[rest] class StandaloneSubmitRequestServlet(
     val driverExtraLibraryPath = sparkProperties.get("spark.driver.extraLibraryPath")
     val superviseDriver = sparkProperties.get("spark.driver.supervise")
     val appArgs = request.appArgs
-    val environmentVariables = request.environmentVariables
+    // Filter SPARK_LOCAL environment variables from being set on the remote system.
+    val environmentVariables =
+      request.environmentVariables.filterNot(_._1.startsWith("SPARK_LOCAL"))
--- End diff --

I guess the driver might not use `SPARK_LOCAL_DIRS`. But yes, we may only need to filter out `SPARK_LOCAL_IP` and `SPARK_LOCAL_HOSTNAME`.
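A minimal standalone sketch of the narrower filter suggested above (the sample map and the `unsafeKeys` name are invented for illustration; this is not the PR's code):

```scala
// Drop only the host-identity variables instead of everything prefixed with SPARK_LOCAL.
val unsafeKeys = Set("SPARK_LOCAL_IP", "SPARK_LOCAL_HOSTNAME")

val environmentVariables: Map[String, String] = Map(
  "SPARK_LOCAL_IP"   -> "10.0.0.1",   // host-specific: should not leak to the remote system
  "SPARK_LOCAL_DIRS" -> "/tmp/spark", // per the comment above, possibly fine to pass through
  "SPARK_ENV_LOADED" -> "1")

val filtered = environmentVariables.filterNot { case (key, _) => unsafeKeys.contains(key) }
// filtered == Map("SPARK_LOCAL_DIRS" -> "/tmp/spark", "SPARK_ENV_LOADED" -> "1")
```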
[GitHub] spark pull request #18866: [SPARK-21649][SQL] Support writing data into hive...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18866#discussion_r131582440

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ---
@@ -262,7 +262,12 @@ case class HashPartitioning(expressions: Seq[Expression], numPartitions: Int)
    * Returns an expression that will produce a valid partition ID (i.e. non-negative and is less
    * than numPartitions) based on hashing expressions.
    */
-  def partitionIdExpression: Expression = Pmod(new Murmur3Hash(expressions), Literal(numPartitions))
+  def partitionIdExpression(useHiveHash: Boolean = false): Expression =
+    if (useHiveHash) {
+      Pmod(new HiveHash(expressions), Literal(numPartitions))
--- End diff --

I saw that `HiveHash simulates Hive's hashing function from Hive v1.2.1...`. Is there any compatibility issue for Hive before 1.2.1?
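For reference, a hedged sketch of the full two-branch method the diff implies (the `else` branch is cut off in the excerpt above; this sketch assumes it keeps the old Murmur3 behavior):

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, HiveHash, Literal, Murmur3Hash, Pmod}

// Pmod keeps the result non-negative and below numPartitions for either hash family.
def partitionIdExpression(
    expressions: Seq[Expression],
    numPartitions: Int,
    useHiveHash: Boolean = false): Expression =
  if (useHiveHash) {
    Pmod(new HiveHash(expressions), Literal(numPartitions))    // Hive-compatible bucketing
  } else {
    Pmod(new Murmur3Hash(expressions), Literal(numPartitions)) // Spark's default hash
  }
```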
[GitHub] spark issue #18801: SPARK-10878 Fix race condition when multiple clients res...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18801 **[Test build #80324 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80324/testReport)** for PR 18801 at commit [`1ace5cc`](https://github.com/apache/spark/commit/1ace5cc8232536bcc336042aec686fed1204f799).
[GitHub] spark issue #18801: SPARK-10878 Fix race condition when multiple clients res...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18801 ok to test
[GitHub] spark issue #18846: [SPARK-21642][CORE] Use FQDN for DRIVER_HOST_ADDRESS ins...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/18846 Should we also apply this change to `RpcEnv`? @zsxwing
[GitHub] spark issue #12147: [SPARK-14361][SQL]Window function exclude clause
Github user xwu0226 commented on the issue: https://github.com/apache/spark/pull/12147 @HyukjinKwon My rebased branch has broken most of the window exclude test cases. Trying to fix.
[GitHub] spark issue #18846: [SPARK-21642][CORE] Use FQDN for DRIVER_HOST_ADDRESS ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18846 **[Test build #80323 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80323/testReport)** for PR 18846 at commit [`afc07ee`](https://github.com/apache/spark/commit/afc07ee14974a38c3b6912dfd2943084d25eeccf).
[GitHub] spark issue #18846: [SPARK-21642][CORE] Use FQDN for DRIVER_HOST_ADDRESS ins...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18846 ok to test
[GitHub] spark issue #18865: [SPARK-21610][SQL] Corrupt records are not handled prope...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18865 cc @gatorsmile @cloud-fan Can you help trigger Jenkins for this? Thanks.
[GitHub] spark issue #18865: [SPARK-21610][SQL] Corrupt records are not handled prope...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18865 Can one of the admins verify this patch?
[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18866 I added the unit test referring to https://github.com/apache/hive/blob/branch-1/ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractBucketJoinProc.java#L393. Hive sorts bucket files by file name when doing a sort-merge bucket (SMB) join.
[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18866 **[Test build #80322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80322/testReport)** for PR 18866 at commit [`51d2c11`](https://github.com/apache/spark/commit/51d2c110d01b8a4ef1d53d144c443e0e9b43817b).
[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...
GitHub user jmchung reopened a pull request: https://github.com/apache/spark/pull/18865

[SPARK-21610][SQL] Corrupt records are not handled properly when creating a dataframe from a file

## What changes were proposed in this pull request?

```
echo '{"field": 1}
{"field": 2}
{"field": "3"}' > /tmp/sample.json
```

```scala
import org.apache.spark.sql.types._

val schema = new StructType()
  .add("field", ByteType)
  .add("_corrupt_record", StringType)
val file = "/tmp/sample.json"
val dfFromFile = spark.read.schema(schema).json(file)

scala> dfFromFile.show(false)
+-----+---------------+
|field|_corrupt_record|
+-----+---------------+
|1    |null           |
|2    |null           |
|null |{"field": "3"} |
+-----+---------------+

scala> dfFromFile.filter($"_corrupt_record".isNotNull).count()
res1: Long = 0

scala> dfFromFile.filter($"_corrupt_record".isNull).count()
res2: Long = 3
```

When the `requiredSchema` only contains `_corrupt_record`, the derived `actualSchema` is empty and `_corrupt_record` is null for all rows. When users require only `_corrupt_record`, we assume that corrupt records must be determined against all JSON fields.

## How was this patch tested?

Added test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jmchung/spark SPARK-21610

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18865.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18865

commit 09aa76cc228162edba7ece45063592cd17ae4a27
Author: Jen-Ming Chung
Date: 2017-08-07T03:52:45Z
[SPARK-21610][SQL] Corrupt records are not handled properly when creating a dataframe from a file

commit f73c3874a9e6a35344a3dc8f6ec8cfb17a1be2f8
Author: Jen-Ming Chung
Date: 2017-08-07T04:39:36Z
add explanation to schema change and minor refactor in test case

commit 7a595984f16f6c998883f271bf63e2e84af5f046
Author: Jen-Ming Chung
Date: 2017-08-07T04:59:07Z
move test case from DataFrameReaderWriterSuite to JsonSuite

commit 97290f0f891f4261bf173c5ff596d0bb33168d57
Author: Jen-Ming Chung
Date: 2017-08-07T05:41:15Z
filter not _corrupt_record in dataSchema

commit f5eec40d51bec8ed0f79f52c5a408ba98f26ca1a
Author: Jen-Ming Chung
Date: 2017-08-07T06:17:48Z
code refactor
[GitHub] spark pull request #18866: [SPARK-21649][SQL] Support writing data into hive...
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18866

[SPARK-21649][SQL] Support writing data into hive bucket table.

## What changes were proposed in this pull request?

Support writing to Hive bucket tables. Spark internally uses Murmur3Hash for partitioning; we can use Hive hash for compatibility when writing to a bucket table.

## How was this patch tested?

Unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinxing64/spark SPARK-21649

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18866.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18866

commit 51d2c110d01b8a4ef1d53d144c443e0e9b43817b
Author: jinxing
Date: 2017-08-07T04:12:56Z
Support writing data into hive bucket table.
[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...
Github user jmchung closed the pull request at: https://github.com/apache/spark/pull/18865
[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...
GitHub user jmchung opened a pull request: https://github.com/apache/spark/pull/18865

[SPARK-21610][SQL] Corrupt records are not handled properly when creating a dataframe from a file

## What changes were proposed in this pull request?

```
echo '{"field": 1}
{"field": 2}
{"field": "3"}' > /tmp/sample.json
```

```scala
import org.apache.spark.sql.types._

val schema = new StructType()
  .add("field", ByteType)
  .add("_corrupt_record", StringType)
val file = "/tmp/sample.json"
val dfFromFile = spark.read.schema(schema).json(file)

scala> dfFromFile.show(false)
+-----+---------------+
|field|_corrupt_record|
+-----+---------------+
|1    |null           |
|2    |null           |
|null |{"field": "3"} |
+-----+---------------+

scala> dfFromFile.filter($"_corrupt_record".isNotNull).count()
res1: Long = 0

scala> dfFromFile.filter($"_corrupt_record".isNull).count()
res2: Long = 3
```

When the `requiredSchema` only contains `_corrupt_record`, the derived `actualSchema` is empty and `_corrupt_record` is null for all rows. When users require only `_corrupt_record`, we assume that corrupt records must be determined against all JSON fields.

## How was this patch tested?

Added test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jmchung/spark SPARK-21610

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18865.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18865

commit 09aa76cc228162edba7ece45063592cd17ae4a27
Author: Jen-Ming Chung
Date: 2017-08-07T03:52:45Z
[SPARK-21610][SQL] Corrupt records are not handled properly when creating a dataframe from a file

commit f73c3874a9e6a35344a3dc8f6ec8cfb17a1be2f8
Author: Jen-Ming Chung
Date: 2017-08-07T04:39:36Z
add explanation to schema change and minor refactor in test case

commit 7a595984f16f6c998883f271bf63e2e84af5f046
Author: Jen-Ming Chung
Date: 2017-08-07T04:59:07Z
move test case from DataFrameReaderWriterSuite to JsonSuite

commit 97290f0f891f4261bf173c5ff596d0bb33168d57
Author: Jen-Ming Chung
Date: 2017-08-07T05:41:15Z
filter not _corrupt_record in dataSchema

commit f5eec40d51bec8ed0f79f52c5a408ba98f26ca1a
Author: Jen-Ming Chung
Date: 2017-08-07T06:17:48Z
code refactor
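One of the commits above is titled `filter not _corrupt_record in dataSchema`; a hedged sketch of that idea (variable names are assumptions drawn from the description, not the PR's actual code) is to drop the corrupt-record column before deriving the schema used for parsing, so the parser still sees the real data fields even when a query selects only `_corrupt_record`:

```scala
import org.apache.spark.sql.types.{ByteType, StringType, StructType}

val columnNameOfCorruptRecord = "_corrupt_record"

val dataSchema = new StructType()
  .add("field", ByteType)
  .add(columnNameOfCorruptRecord, StringType)

// Parse against all real data fields; _corrupt_record is populated separately,
// so it must not shrink the schema handed to the JSON parser.
val actualSchema = StructType(dataSchema.filterNot(_.name == columnNameOfCorruptRecord))
```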
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r131578995

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1121,6 +1125,30 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   }

   /**
+   * Create a function name LTRIM for TRIM(Leading), RTRIM for TRIM(Trailing), TRIM for TRIM(BOTH),
+   * otherwise, return the original function identifier.
+   */
+  private def replaceTrimFunction(funcID: FunctionIdentifier, ctx: FunctionCallContext)
+    : FunctionIdentifier = {
--- End diff --

ok.
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r131579031

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1121,6 +1125,30 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   }

   /**
+   * Create a function name LTRIM for TRIM(Leading), RTRIM for TRIM(Trailing), TRIM for TRIM(BOTH),
+   * otherwise, return the original function identifier.
+   */
+  private def replaceTrimFunction(funcID: FunctionIdentifier, ctx: FunctionCallContext)
+    : FunctionIdentifier = {
+    val opt = ctx.trimOption
+    if (opt != null) {
+      if (ctx.qualifiedName.getText.toLowerCase != "trim") {
+        throw new ParseException(s"The specified function ${ctx.qualifiedName.getText} " +
+          s"doesn't support with option ${opt.getText}.", ctx)
+      }
+      opt.getType match {
+        case SqlBaseParser.BOTH => funcID
+        case SqlBaseParser.LEADING => funcID.copy(funcName = "ltrim")
+        case SqlBaseParser.TRAILING => funcID.copy(funcName = "rtrim")
+        case _ => throw new ParseException(s"Function trim doesn't support with" +
--- End diff --

ok
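For context, a hedged sketch of the SQL forms this rewrite targets, assuming a SparkSession named `spark` and the standard TRIM grammar the PR adds; the resolved function names follow the match arms in the diff above:

```scala
// Each TRIM option is rewritten to an existing string function before resolution.
spark.sql("SELECT TRIM(BOTH 'x' FROM 'xxabxx')").show()     // stays trim    -> "ab"
spark.sql("SELECT TRIM(LEADING 'x' FROM 'xxabxx')").show()  // becomes ltrim -> "abxx"
spark.sql("SELECT TRIM(TRAILING 'x' FROM 'xxabxx')").show() // becomes rtrim -> "xxab"
```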
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r131578980

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2304,7 +2304,15 @@ object functions {
    * @group string_funcs
    * @since 1.5.0
    */
-  def ltrim(e: Column): Column = withExpr {StringTrimLeft(e.expr) }
+  def ltrim(e: Column): Column = withExpr {StringTrimLeft(e.expr)}
--- End diff --

sure, I will change.
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131578382

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1393,6 +1393,16 @@ def replace(self, to_replace, value=None, subset=None):
         |null| null| null|
         +----+------+-----+
+        >>> df4.na.replace('Alice', None).show()
--- End diff --

OK. I'm fine with this.
[GitHub] spark issue #18861: [SPARK-19426][SQL] Custom coalescer for Dataset
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18861 **[Test build #80321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80321/testReport)** for PR 18861 at commit [`c0306d3`](https://github.com/apache/spark/commit/c0306d346e336a3bae6335e27f676c3254d915cb).
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r131576044

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -356,6 +356,16 @@ class CodegenContext {
   private val placeHolderToComments = new mutable.HashMap[String, String]

   /**
+   * Returns whether any codegen function is too long.
+   */
+  def existTooLongFunction(): Boolean = {
--- End diff --

Add the checking logic here, instead of returning `Boolean`?
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r131575786

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,12 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with CodegenSupport {

   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    val existLongFunction = ctx.existTooLongFunction
+    if (existLongFunction) {
+      logWarning(s"Function is too long, Whole-stage codegen disabled for this plan:\n " +
+        s"$treeString")
--- End diff --

This could be very big. Please follow what was done in https://github.com/apache/spark/pull/18658
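A minimal sketch of the check under discussion, assuming a registry of generated function bodies and a line-count threshold; the names `addedFunctions` and `maxLinesPerFunction` are illustrative, not CodegenContext's actual fields:

```scala
object CodegenLengthCheck {
  // Assumed threshold; in practice this would come from a SQL config.
  val maxLinesPerFunction = 1500

  // True when any generated function body is long enough that the JIT is likely
  // to give up compiling it, which is the slowdown SPARK-21603 describes.
  def existTooLongFunction(addedFunctions: Map[String, String]): Boolean =
    addedFunctions.values.exists(body => body.count(_ == '\n') + 1 > maxLinesPerFunction)
}
```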
[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764 @SparkQA Take a test, please.
[GitHub] spark issue #18861: [SPARK-19426][SQL] Custom coalescer for Dataset
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18861 **[Test build #80320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80320/testReport)** for PR 18861 at commit [`413b0eb`](https://github.com/apache/spark/commit/413b0eb55659d31cd21fbc1c858d3da1603d2248).
[GitHub] spark issue #18864: [SPARK-21648] [SQL] Fix confusing assert failure in JDBC...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18864 **[Test build #80319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80319/testReport)** for PR 18864 at commit [`e4aac50`](https://github.com/apache/spark/commit/e4aac502d58972063a1ab25f17a1c217abe97b97).
[GitHub] spark issue #18864: [SPARK-21648] [SQL] Fix confusing assert failure in JDBC...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18864 cc @zsxwing @cloud-fan
[GitHub] spark pull request #18864: [SPARK-21648] [SQL] Fix confusing assert failure ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18864#discussion_r131574704

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
@@ -29,17 +29,22 @@ class JdbcRelationProvider extends CreatableRelationProvider
   override def createRelation(
       sqlContext: SQLContext,
       parameters: Map[String, String]): BaseRelation = {
+    import JDBCOptions._
+
     val jdbcOptions = new JDBCOptions(parameters)
     val partitionColumn = jdbcOptions.partitionColumn
     val lowerBound = jdbcOptions.lowerBound
     val upperBound = jdbcOptions.upperBound
     val numPartitions = jdbcOptions.numPartitions
     val partitionInfo = if (partitionColumn.isEmpty) {
-      assert(lowerBound.isEmpty && upperBound.isEmpty)
--- End diff --

cc @dongjoon-hyun
[GitHub] spark pull request #18864: [SPARK-21648] [SQL] Fix confusing assert failure ...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/18864

[SPARK-21648] [SQL] Fix confusing assert failure in JDBC source when parallel fetching parameters are not properly provided.

### What changes were proposed in this pull request?

```SQL
CREATE TABLE mytesttable1
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:mysql://${jdbcHostname}:${jdbcPort}/${jdbcDatabase}?user=${jdbcUsername}&password=${jdbcPassword}',
  dbtable 'mytesttable1',
  paritionColumn 'state_id',
  lowerBound '0',
  upperBound '52',
  numPartitions '53',
  fetchSize '1'
)
```

The option name `paritionColumn` above is misspelled. That means users did not actually provide a value for `partitionColumn`. In such a case, users hit a confusing error:

```
AssertionError: assertion failed
java.lang.AssertionError: assertion failed
  at scala.Predef$.assert(Predef.scala:156)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:39)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:312)
```

### How was this patch tested?

Added a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark jdbcPartCol

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18864.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18864

commit e4aac502d58972063a1ab25f17a1c217abe97b97
Author: gatorsmile
Date: 2017-08-05T05:38:15Z
improve message.
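A hedged sketch of the friendlier validation the PR title implies (standalone, with option names mirroring the example above; not the patch's actual code):

```scala
// Fail fast with an actionable message instead of a bare `assert`.
def validatePartitioningOptions(options: Map[String, String]): Unit = {
  val partitionColumn = options.get("partitionColumn")
  val bounds = Seq("lowerBound", "upperBound", "numPartitions").filter(options.contains)
  if (partitionColumn.isEmpty) {
    require(bounds.isEmpty,
      s"When 'partitionColumn' is not specified, ${bounds.mkString("'", "', '", "'")} " +
        "must not be specified either (check the option name for typos, " +
        "e.g. 'paritionColumn' vs 'partitionColumn')")
  }
}
```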
[GitHub] spark issue #18855: [SPARK-3151][Block Manager] DiskStore.getBytes fails for...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18855 Yea, please refer to http://apache-spark-developers-list.1001551.n3.nabble.com/Some-PRs-not-automatically-linked-to-JIRAs-td22067.html. There look to be some problems related to it.
[GitHub] spark issue #18830: [SPARK-21621][Core] Reset numRecordsWritten after DiskBl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18830 Merged build finished. Test PASSed.
[GitHub] spark issue #18830: [SPARK-21621][Core] Reset numRecordsWritten after DiskBl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18830 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80316/ Test PASSed.
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18810 Merged build finished. Test FAILed.
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18810 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80318/ Test FAILed.
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18810 **[Test build #80318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80318/testReport)** for PR 18810 at commit [`1b0ac5e`](https://github.com/apache/spark/commit/1b0ac5ed896136df3579a61d7ef93980c0647e97).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18830: [SPARK-21621][Core] Reset numRecordsWritten after DiskBl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18830 **[Test build #80316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80316/testReport)** for PR 18830 at commit [`d82401d`](https://github.com/apache/spark/commit/d82401d1771009e02a81152b70b4fa48ce077593).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18810 **[Test build #80318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80318/testReport)** for PR 18810 at commit [`1b0ac5e`](https://github.com/apache/spark/commit/1b0ac5ed896136df3579a61d7ef93980c0647e97).
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18810 ok to test
[GitHub] spark pull request #18576: [SPARK-21351][SQL] Update nullability based on ch...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/18576#discussion_r131573104

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -94,27 +94,14 @@ case class FilterExec(condition: Expression, child: SparkPlan)
     case _ => false
   }

-  // If one expression and its children are null intolerant, it is null intolerant.
-  private def isNullIntolerant(expr: Expression): Boolean = expr match {
-    case e: NullIntolerant => e.children.forall(isNullIntolerant)
-    case _ => false
-  }
-
-  // The columns that will filtered out by `IsNotNull` could be considered as not nullable.
-  private val notNullAttributes = notNullPreds.flatMap(_.references).distinct.map(_.exprId)
-
   // Mark this as empty. We'll evaluate the input during doConsume(). We don't want to evaluate
   // all the variables at the beginning to take advantage of short circuiting.
   override def usedInputs: AttributeSet = AttributeSet.empty

+  // Since some plan rewrite rules (e.g., python.ExtractPythonUDFs) possibly change child's output
+  // from optimized logical plans, we need to adjust the filter's output here.
   override def output: Seq[Attribute] = {
-    child.output.map { a =>
-      if (a.nullable && notNullAttributes.contains(a.exprId)) {
-        a.withNullability(false)
-      } else {
-        a
-      }
-    }
+    child.output.map { attr => outputAttrs.find(_.exprId == attr.exprId).getOrElse(attr) }
  }
--- End diff --

I tried to simply drop the nullability update and reuse the output attributes `outputAttrs` from the optimized logical plan here, but some Python tests failed (all the Scala tests passed). I dug in and found that, in the Python planning path, there are cases where an operator's output changes between the optimized logical plan and the physical plan. For example:

```
sql("""SELECT strlen(a) FROM test WHERE strlen(a) > 1""")

// pyspark
>>> spark.sql("SELECT strlen(a) FROM test WHERE strlen(a) > 1").explain(True)
...
== Optimized Logical Plan ==
Project [strlen(a#0) AS strlen(a)#30]
+- Filter (strlen(a#0) > 1)
   +- LogicalRDD [a#0]

== Physical Plan ==
*Project [pythonUDF0#34 AS strlen(a)#30]
+- BatchEvalPython [strlen(a#0)], [a#0, pythonUDF0#34]
   +- *Filter (pythonUDF0#33 > 1), [a#0]
      +- BatchEvalPython [strlen(a#0)], [a#0, pythonUDF0#33]
         +- Scan ExistingRDD[a#0]
```

So, I added code to check for a difference between `outputAttrs` and `child.output`. Could you give me insight on this? @gatorsmile
[GitHub] spark issue #18862: [SPARK-21640][FOLLOW-UP] added errorifexists on IllegalA...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18862 **[Test build #80317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80317/testReport)** for PR 18862 at commit [`592ab60`](https://github.com/apache/spark/commit/592ab60742497e5c8157b19bb03a0315e90fb039).
[GitHub] spark issue #18862: [SPARK-21640][FOLLOW-UP] added errorifexists on IllegalA...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18862 ok to test
[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17583 A gentle ping since I think this is quite helpful. @jkbradley @MLnick @yanboliang @srowen @holdenk
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131572498

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1393,6 +1393,16 @@ def replace(self, to_replace, value=None, subset=None):
         |null| null| null|
         +----+------+-----+
+        >>> df4.na.replace('Alice', None).show()
--- End diff --

I assume we removed `dataframe.replace` to promote the use of `dataframe.na.replace`? The doc says they are aliases anyway. I don't know, but I tend to agree with pairing doc tests, and this looks removed in https://github.com/apache/spark/commit/ff26767c03cc76e7e86b238300367fa0d9b3e6a4. Let's leave this as is for now; I don't want to make this PR complicated.
[GitHub] spark pull request #18733: [SPARK-21535][ML]Reduce memory requirement for Cr...
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/18733
[GitHub] spark pull request #18861: [SPARK-19426][SQL] Custom coalescer for Dataset
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/18861#discussion_r131571620

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -753,6 +753,16 @@ case class Repartition(numPartitions: Int, shuffle: Boolean, child: LogicalPlan)
 }

 /**
+ * Returns a new RDD that has at most `numPartitions` partitions. This behavior can be modified by
+ * supplying a `PartitionCoalescer` to control the behavior of the partitioning.
+ */
+case class PartitionCoalesce(numPartitions: Int, coalescer: PartitionCoalescer, child: LogicalPlan)
+  extends UnaryNode {
--- End diff --

Yea, I think so. I'll try; please give me a few days to do so.
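For context, the RDD API already exposes this hook; a hedged usage sketch (assuming a SparkContext `sc`, and reusing the `SizeBasedCoalescer` test helper quoted later in this thread):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.PartitionCoalescer

def coalesceBySize(sc: SparkContext, path: String, maxBytes: Int): Long = {
  // RDD.coalesce has accepted an Option[PartitionCoalescer] since Spark 2.0;
  // this PR proposes surfacing the same knob at the Dataset level.
  val coalescer: PartitionCoalescer = new SizeBasedCoalescer(maxBytes)
  val coalesced = sc.textFile(path).coalesce(1, shuffle = false, Some(coalescer))
  coalesced.count()
}
```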
[GitHub] spark pull request #18861: [SPARK-19426][SQL] Custom coalescer for Dataset
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/18861#discussion_r131571547

--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -1185,23 +1194,21 @@ class SizeBasedCoalescer(val maxSize: Int) extends PartitionCoalescer with Serializable
       totalSum += splitSize
     }

-    while (index < partitions.size) {
+    while (index < partitions.length) {
       val partition = partitions(index)
-      val fileSplit =
-        partition.asInstanceOf[HadoopPartition].inputSplit.value.asInstanceOf[FileSplit]
-      val splitSize = fileSplit.getLength
+      val splitSize = getPartitionSize(partition)
       if (currentSum + splitSize < maxSize) {
         addPartition(partition, splitSize)
         index += 1
-        if (index == partitions.size) {
-          updateGroups
+        if (index == partitions.length) {
+          updateGroups()
         }
       } else {
-        if (currentGroup.partitions.size == 0) {
+        if (currentGroup.partitions.isEmpty) {
           addPartition(partition, splitSize)
           index += 1
         } else {
-          updateGroups
+          updateGroups()
--- End diff --

ok, I'll drop these from this PR.
[GitHub] spark pull request #18861: [SPARK-19426][SQL] Custom coalescer for Dataset
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18861#discussion_r131571449

--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -1185,23 +1194,21 @@ class SizeBasedCoalescer(val maxSize: Int) extends PartitionCoalescer with Serializable
       totalSum += splitSize
     }

-    while (index < partitions.size) {
+    while (index < partitions.length) {
       val partition = partitions(index)
-      val fileSplit =
-        partition.asInstanceOf[HadoopPartition].inputSplit.value.asInstanceOf[FileSplit]
-      val splitSize = fileSplit.getLength
+      val splitSize = getPartitionSize(partition)
      if (currentSum + splitSize < maxSize) {
         addPartition(partition, splitSize)
         index += 1
-        if (index == partitions.size) {
-          updateGroups
+        if (index == partitions.length) {
+          updateGroups()
         }
       } else {
-        if (currentGroup.partitions.size == 0) {
+        if (currentGroup.partitions.isEmpty) {
           addPartition(partition, splitSize)
           index += 1
         } else {
-          updateGroups
+          updateGroups()
--- End diff --

I am fine with this, but it might confuse others. Maybe just remove them in this PR? You can submit a separate PR later.
[GitHub] spark pull request #18861: [SPARK-19426][SQL] Custom coalescer for Dataset
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/18861#discussion_r131571248

--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -1185,23 +1194,21 @@ class SizeBasedCoalescer(val maxSize: Int) extends PartitionCoalescer with Serializable
       totalSum += splitSize
     }

-    while (index < partitions.size) {
+    while (index < partitions.length) {
       val partition = partitions(index)
-      val fileSplit =
-        partition.asInstanceOf[HadoopPartition].inputSplit.value.asInstanceOf[FileSplit]
-      val splitSize = fileSplit.getLength
+      val splitSize = getPartitionSize(partition)
       if (currentSum + splitSize < maxSize) {
         addPartition(partition, splitSize)
         index += 1
-        if (index == partitions.size) {
-          updateGroups
+        if (index == partitions.length) {
+          updateGroups()
         }
       } else {
-        if (currentGroup.partitions.size == 0) {
+        if (currentGroup.partitions.isEmpty) {
           addPartition(partition, splitSize)
           index += 1
         } else {
-          updateGroups
+          updateGroups()
--- End diff --

Yeah, I just kept the original author's changes (probably refactoring?); better to remove them?
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131571005

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1393,6 +1393,16 @@ def replace(self, to_replace, value=None, subset=None):
         |null| null| null|
         +----+------+-----+
+        >>> df4.na.replace('Alice', None).show()
--- End diff --

I hadn't noticed that. Why do we test `dataframe.na.replace` in the doc test of `dataframe.replace`? We should test `dataframe.replace` here.
[GitHub] spark pull request #18861: [SPARK-19426][SQL] Custom coalescer for Dataset
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18861#discussion_r131570851

--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -1185,23 +1194,21 @@ class SizeBasedCoalescer(val maxSize: Int) extends PartitionCoalescer with Serializable
       totalSum += splitSize
     }

-    while (index < partitions.size) {
+    while (index < partitions.length) {
       val partition = partitions(index)
-      val fileSplit =
-        partition.asInstanceOf[HadoopPartition].inputSplit.value.asInstanceOf[FileSplit]
-      val splitSize = fileSplit.getLength
+      val splitSize = getPartitionSize(partition)
       if (currentSum + splitSize < maxSize) {
         addPartition(partition, splitSize)
         index += 1
-        if (index == partitions.size) {
-          updateGroups
+        if (index == partitions.length) {
+          updateGroups()
         }
       } else {
-        if (currentGroup.partitions.size == 0) {
+        if (currentGroup.partitions.isEmpty) {
           addPartition(partition, splitSize)
           index += 1
         } else {
-          updateGroups
+          updateGroups()
--- End diff --

All the above changes are not related to this PR, right?
[GitHub] spark pull request #18861: [SPARK-19426][SQL] Custom coalescer for Dataset
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18861#discussion_r131570879

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -753,6 +753,16 @@ case class Repartition(numPartitions: Int, shuffle: Boolean, child: LogicalPlan)
 }

 /**
+ * Returns a new RDD that has at most `numPartitions` partitions. This behavior can be modified by
+ * supplying a `PartitionCoalescer` to control the behavior of the partitioning.
+ */
+case class PartitionCoalesce(numPartitions: Int, coalescer: PartitionCoalescer, child: LogicalPlan)
+  extends UnaryNode {
--- End diff --

Adding new logical nodes also requires updates in multiple components (e.g., the Optimizer). Is it possible to reuse the existing `Repartition` node?
[GitHub] spark pull request #18861: [SPARK-19426][SQL] Custom coalescer for Dataset
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/18861#discussion_r131570565

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
```diff
@@ -571,7 +570,8 @@ case class UnionExec(children: Seq[SparkPlan]) extends SparkPlan {
  * current upstream partitions will be executed in parallel (per whatever
  * the current partitioning is).
  */
-case class CoalesceExec(numPartitions: Int, child: SparkPlan) extends UnaryExecNode {
+case class CoalesceExec(numPartitions: Int, child: SparkPlan, coalescer: Option[PartitionCoalescer])
```
--- End diff --

ok!
[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18576

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80315/
[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18576

Merged build finished. Test PASSed.
[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18576

**[Test build #80315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80315/testReport)** for PR 18576 at commit [`5d2fd6d`](https://github.com/apache/spark/commit/5d2fd6db8dc4130a948e5bb4d09fe0f776d16145).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class FilterExec(condition: Expression, child: SparkPlan, outputAttrs: Seq[Attribute])`
[GitHub] spark pull request #18861: [SPARK-19426][SQL] Custom coalescer for Dataset
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18861#discussion_r131570472

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
```diff
@@ -571,7 +570,8 @@ case class UnionExec(children: Seq[SparkPlan]) extends SparkPlan {
  * current upstream partitions will be executed in parallel (per whatever
  * the current partitioning is).
  */
-case class CoalesceExec(numPartitions: Int, child: SparkPlan) extends UnaryExecNode {
+case class CoalesceExec(numPartitions: Int, child: SparkPlan, coalescer: Option[PartitionCoalescer])
```
--- End diff --

Could you add a param description for `coalescer`, and also update the function description? Thanks~!
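For reference, a sketch of the kind of Scaladoc being asked for. The class name, wording, and method body are illustrative assumptions, not the merged change:

```scala
import org.apache.spark.rdd.{PartitionCoalescer, RDD}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}

/**
 * Physical plan for returning a new RDD with fewer partitions.
 *
 * @param numPartitions the target number of partitions
 * @param child         the plan whose output is coalesced
 * @param coalescer     optional strategy that decides how upstream partitions are
 *                      grouped together; when None, the default behavior of
 *                      `RDD.coalesce` is used
 */
case class CoalesceExecSketch(
    numPartitions: Int,
    child: SparkPlan,
    coalescer: Option[PartitionCoalescer]) extends UnaryExecNode {
  override def output: Seq[Attribute] = child.output
  protected override def doExecute(): RDD[InternalRow] =
    child.execute().coalesce(numPartitions, shuffle = false, partitionCoalescer = coalescer)
}
```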
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131570273

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionsSuite.scala ---
```diff
@@ -261,5 +261,18 @@ class DataFrameNaFunctionsSuite extends QueryTest with SharedSQLContext {
     assert(out1(3).get(2).asInstanceOf[Double].isNaN)
     assert(out1(4) === Row("Amy", null, null))
     assert(out1(5) === Row(null, null, null))
+
+    // Replace with null
+    val out2 = input.na.replace("name", Map(
+      "Bob" -> "Bravo",
+      "Alice" -> null
```
--- End diff --

Agreed. Please try to improve the test coverage. Thanks!
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18820

cc @ueshin Could you also take a look at the code changes on the Python side? Thanks!
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18820

Could you also add a test case to cover the end-to-end use case mentioned in the JIRA? Please also put it in the PR description, which will become part of the commit message. Thanks!
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131570031

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
```diff
@@ -314,6 +316,7 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
    * (Scala-specific) Replaces values matching keys in `replacement` map.
    * Key and value of `replacement` map must have the same type, and
    * can only be doubles, strings or booleans.
+   * `replacement` map value can have null.
```
--- End diff --

Do not put this sentence here; it should go in a `@param` tag.
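Concretely, the note would move out of the prose and into a parameter tag. A doc-placement sketch with assumed wording (the `???` body is just a placeholder):

```scala
/**
 * (Scala-specific) Replaces values matching keys in `replacement` map.
 *
 * @param col the name of the column to apply the value replacement.
 * @param replacement value replacement map. Key and value of the map must have the
 *                    same type, and can only be doubles, strings or booleans;
 *                    the map value can also be null.
 */
def replace[T](col: String, replacement: Map[T, T]): DataFrame = ???
```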
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131569954

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
```diff
@@ -366,11 +370,15 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
       return df
     }
 
-    // replacementMap is either Map[String, String] or Map[Double, Double] or Map[Boolean,Boolean]
-    val replacementMap: Map[_, _] = replacement.head._2 match {
-      case v: String => replacement
-      case v: Boolean => replacement
-      case _ => replacement.map { case (k, v) => (convertToDouble(k), convertToDouble(v)) }
+    // replacementMap is either Map[String, String], Map[Double, Double], Map[Boolean, Boolean]
+    // while the value can also be null
```
--- End diff --

If the types are not one of these three, what is the behavior? Could you explain it here? Also, please add negative examples. Thanks~
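One way to make the behavior explicit, as requested above: dispatch on the first non-null value and fail fast on unsupported types. A hypothetical sketch; the error message and the `convertToDouble` helper are assumptions, not the PR's actual code:

```scala
// Strings and booleans pass through unchanged, numeric values are normalized
// to Double, and any other value type is rejected immediately instead of
// failing later with a confusing cast error.
def buildReplacementMap(replacement: Map[Any, Any]): Map[_, _] = {
  def convertToDouble(v: Any): Double = v match {
    case d: Double => d
    case f: Float => f.toDouble
    case l: Long => l.toDouble
    case i: Int => i.toDouble
    case other =>
      throw new IllegalArgumentException(s"Unsupported value type ${other.getClass.getName}")
  }
  // Skip null values when sniffing the type: they carry no type information.
  replacement.collectFirst { case (_, v) if v != null => v } match {
    case Some(_: String) | Some(_: Boolean) | None => replacement
    case Some(_: Double) | Some(_: Float) | Some(_: Long) | Some(_: Int) =>
      replacement.map { case (k, v) =>
        (convertToDouble(k), if (v == null) null else convertToDouble(v))
      }
    case Some(other) =>
      throw new IllegalArgumentException(s"Unsupported value type ${other.getClass.getName}")
  }
}
```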
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131569819

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
```diff
@@ -145,8 +145,8 @@ class DataTypeSuite extends SparkFunSuite {
     val message = intercept[SparkException] {
      left.merge(right)
    }.getMessage
-    assert(message.equals("Failed to merge fields 'b' and 'b'. " +
-      "Failed to merge incompatible data types FloatType and LongType"))
+    assert(message === "Failed to merge fields 'b' and 'b'. " +
```
--- End diff --

Nit: this is not related to this PR. Please revert it.
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18820

@bravo-zhang Could you update the PR description to explain what this PR is trying to achieve? So far, it does not clearly explain what you changed. Thanks!
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131569456

--- Diff: python/pyspark/sql/dataframe.py ---
```diff
@@ -1393,6 +1393,16 @@ def replace(self, to_replace, value=None, subset=None):
 |null| null| null|
 ++--+-+
+>>> df4.na.replace('Alice', None).show()
```
--- End diff --

I guess it is `.na.replace` vs `.replace`. I think both should be the same, though. I just built against this PR and double-checked as below:

```python
>>> df = spark.createDataFrame([('Alice', 10, 80.0)])
```

```python
>>> df.replace("Alice").first()
```

```
Row(_1=None, _2=10, _3=80.0)
```

```python
>>> df.na.replace("Alice").first()
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: replace() takes at least 3 arguments (2 given)
```
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131569039

--- Diff: python/pyspark/sql/dataframe.py ---
```diff
@@ -1393,6 +1393,16 @@ def replace(self, to_replace, value=None, subset=None):
 |null| null| null|
 ++--+-+
+>>> df4.na.replace('Alice', None).show()
```
--- End diff --

This change allows us to do `df4.na.replace('Alice')`. I think SPARK-19454 doesn't?
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131568706

--- Diff: python/pyspark/sql/dataframe.py ---
```diff
@@ -1393,6 +1393,16 @@ def replace(self, to_replace, value=None, subset=None):
 |null| null| null|
 ++--+-+
+>>> df4.na.replace('Alice', None).show()
```
--- End diff --

... and I was thinking of not doing this here, as strictly speaking it should be a follow-up for SPARK-19454. I am fine with doing it here too, while we are at it.
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18468

Merged build finished. Test PASSed.
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18468

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80314/
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468

**[Test build #80314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80314/testReport)** for PR 18468 at commit [`a26dc15`](https://github.com/apache/spark/commit/a26dc150f6b95cc42558561cd2548de04a89f041).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131568383

--- Diff: python/pyspark/sql/dataframe.py ---
```diff
@@ -1393,6 +1393,16 @@ def replace(self, to_replace, value=None, subset=None):
 |null| null| null|
 ++--+-+
+>>> df4.na.replace('Alice', None).show()
```
--- End diff --

Actually, I think this should be fixed in `DataFrameNaFunctions.replace` in this file ...
[GitHub] spark issue #18769: [SPARK-21574][SQL] Point out user to set hive config bef...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/18769

@gatorsmile The docs syntax issues were fixed by https://github.com/apache/spark/pull/18793.
[GitHub] spark pull request #18833: [SPARK-21625][SQL] sqrt(negative number) should b...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18833#discussion_r131568132

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala ---
```diff
@@ -403,11 +403,13 @@ class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
   test("sqrt") {
     testUnary(Sqrt, math.sqrt, (0 to 20).map(_ * 0.1))
-    testUnary(Sqrt, math.sqrt, (-5 to -1).map(_ * 1.0), expectNaN = true)
+    testUnary(Sqrt, math.sqrt, (-5 to -1).map(_ * 1.0), expectNull = true)
```
--- End diff --

We have `IsNaN`, so users might already use it to check for those invalid values.
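For example, a self-contained sketch (column names are made up) showing how NaN results can be filtered with the existing predicate:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, isnan}

val spark = SparkSession.builder().master("local[*]").appName("isnan-demo").getOrCreate()
import spark.implicits._

// sqrt of a negative number currently evaluates to NaN rather than null, so an
// `IS NOT NULL` filter does not drop it, but `isnan` does.
val df = Seq(-10.0, 4.0).toDF("c1").selectExpr("sqrt(c1) AS c2")
df.filter(!isnan(col("c2"))).show()  // keeps only the row where c2 = 2.0
```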
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131567901

--- Diff: python/pyspark/sql/dataframe.py ---
```diff
@@ -1393,6 +1393,16 @@ def replace(self, to_replace, value=None, subset=None):
 |null| null| null|
 ++--+-+
+>>> df4.na.replace('Alice', None).show()
```
--- End diff --

Looks like we now allow something like `df4.na.replace('Alice').show()`. We'd better add it here.
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131567720

--- Diff: python/pyspark/sql/tests.py ---
```diff
@@ -1964,6 +1964,16 @@ def test_replace(self):
             .replace(False, True).first())
         self.assertTupleEqual(row, (True, True))
 
+        # replace with None
+        row = self.spark.createDataFrame(
+            [(u'Alice', 10, 80.0)], schema).replace(u'Alice', None).first()
+        self.assertTupleEqual(row, (None, 10, 80.0))
+
+        # replace with numerics and None
+        row = self.spark.createDataFrame(
+            [(u'Alice', 10, 80.0)], schema).replace([10, 80], [20, None]).first()
+        self.assertTupleEqual(row, (u'Alice', 20, None))
```
--- End diff --

Can you add a test where `to_replace` is a list and `value` is not given (so it takes the default value `None`)? Previously this would raise a `ValueError`, but now it is a valid usage, so we had better add an explicit test for it.
[GitHub] spark issue #18863: [SPARK-21647] [SQL] Fix SortMergeJoin when using CROSS
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18863

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80313/
[GitHub] spark issue #18863: [SPARK-21647] [SQL] Fix SortMergeJoin when using CROSS
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18863

Merged build finished. Test PASSed.
[GitHub] spark issue #18863: [SPARK-21647] [SQL] Fix SortMergeJoin when using CROSS
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18863

**[Test build #80313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80313/testReport)** for PR 18863 at commit [`f351fb1`](https://github.com/apache/spark/commit/f351fb1cbda8104f4f7e6ffa0be07f26b290683e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18833: [SPARK-21625][SQL] sqrt(negative number) should b...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18833#discussion_r131566961

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala ---
```diff
@@ -403,11 +403,13 @@ class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
   test("sqrt") {
     testUnary(Sqrt, math.sqrt, (0 to 20).map(_ * 0.1))
-    testUnary(Sqrt, math.sqrt, (-5 to -1).map(_ * 1.0), expectNaN = true)
+    testUnary(Sqrt, math.sqrt, (-5 to -1).map(_ * 1.0), expectNull = true)
```
--- End diff --

Users often use `is not null` to filter out invalid values, but Spark SQL breaks that expectation:

```
> create table spark_21625 as select 10 as c1, sqrt(-10) as c2;
spark-sql> select * from spark_21625;
10	NaN
spark-sql> select * from spark_21625 where c2 is not null;
10	NaN
```
[GitHub] spark pull request #18833: [SPARK-21625][SQL] sqrt(negative number) should b...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18833#discussion_r131566102

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala ---
```diff
@@ -403,11 +403,13 @@ class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
   test("sqrt") {
     testUnary(Sqrt, math.sqrt, (0 to 20).map(_ * 0.1))
-    testUnary(Sqrt, math.sqrt, (-5 to -1).map(_ * 1.0), expectNaN = true)
+    testUnary(Sqrt, math.sqrt, (-5 to -1).map(_ * 1.0), expectNull = true)
```
--- End diff --

Yes, we migrated Hive and MySQL SQL workloads to Spark and found some inconsistencies. `NaN` is unfamiliar to MySQL and Oracle users.
[GitHub] spark pull request #18833: [SPARK-21625][SQL] sqrt(negative number) should b...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18833#discussion_r131566009

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala ---
```diff
@@ -403,11 +403,13 @@ class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
   test("sqrt") {
     testUnary(Sqrt, math.sqrt, (0 to 20).map(_ * 0.1))
-    testUnary(Sqrt, math.sqrt, (-5 to -1).map(_ * 1.0), expectNaN = true)
+    testUnary(Sqrt, math.sqrt, (-5 to -1).map(_ * 1.0), expectNull = true)
```
--- End diff --

Yea, I was writing the same comment. If `NaN` makes sense in a way, I was thinking we can't consider this case a bug that has to be fixed.
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131566016

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionsSuite.scala ---
```diff
@@ -261,5 +261,18 @@ class DataFrameNaFunctionsSuite extends QueryTest with SharedSQLContext {
     assert(out1(3).get(2).asInstanceOf[Double].isNaN)
     assert(out1(4) === Row("Amy", null, null))
     assert(out1(5) === Row(null, null, null))
+
+    // Replace with null
+    val out2 = input.na.replace("name", Map(
+      "Bob" -> "Bravo",
+      "Alice" -> null
```
--- End diff --

I saw that you allow a replacement like `(k: Double, null)`. Can you also add a test for such a replacement? Thanks.
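Roughly the kind of test being requested; a sketch with a made-up DataFrame rather than the suite's existing `input` fixture, so the test name and values are assumptions:

```scala
// Hypothetical addition to DataFrameNaFunctionsSuite: replace a Double key
// with null. The replacement map is inferred as Map[Any, Any] here, which
// lets the null value type-check.
test("replace double value with null") {
  import testImplicits._
  val df = Seq(("Alice", 10.0), ("Bob", 20.0)).toDF("name", "age")
  val out = df.na.replace("age", Map(10.0 -> null))
  checkAnswer(out, Row("Alice", null) :: Row("Bob", 20.0) :: Nil)
}
```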
[GitHub] spark pull request #18833: [SPARK-21625][SQL] sqrt(negative number) should b...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18833#discussion_r131565248

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala ---
```diff
@@ -403,11 +403,13 @@ class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
   test("sqrt") {
     testUnary(Sqrt, math.sqrt, (0 to 20).map(_ * 0.1))
-    testUnary(Sqrt, math.sqrt, (-5 to -1).map(_ * 1.0), expectNaN = true)
+    testUnary(Sqrt, math.sqrt, (-5 to -1).map(_ * 1.0), expectNull = true)
```
--- End diff --

Looks like you're changing the NaN cases for many math expressions to null. I'm not sure we can make a change like this, since it breaks compatibility.
[GitHub] spark issue #18813: [SPARK-21567][SQL] Dataset should work with type alias
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18813

ping @cloud-fan @hvanhovell Can you help to review this change? Thanks.
[GitHub] spark issue #18641: [SPARK-21413][SQL] Fix 64KB JVM bytecode limit problem i...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18641

ping @cloud-fan
[GitHub] spark issue #18853: [SPARK-21646][SQL] BinaryComparison shouldn't auto cast ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18853

How about casting the `int` values into `string` ones in the case you described in the description, and then comparing them in lexicographic order?
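One caveat with that suggestion: string comparison is lexicographic, which diverges from numeric order once digit counts differ. A tiny standalone illustration:

```scala
// As ints, 9 < 10; as strings, "9" > "10" because the first characters
// '9' and '1' decide the comparison.
println(9 < 10)      // true
println("9" < "10")  // false
```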
[GitHub] spark issue #18474: [SPARK-21235][TESTS] UTest should clear temp results whe...
Github user wangjiaochun commented on the issue: https://github.com/apache/spark/pull/18474

Yes, I am running this on Windows 7.
[GitHub] spark issue #18710: [SPARK][Docs] Added note on meaning of position to subst...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18710

gentle ping @maclockard.
[GitHub] spark pull request #18111: [SPARK-20886][CORE] HadoopMapReduceCommitProtocol...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18111#discussion_r131562253

--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
```diff
@@ -73,7 +73,10 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String)
     val stagingDir: String = committer match {
       // For FileOutputCommitter it has its own staging path called "work path".
-      case f: FileOutputCommitter => Option(f.getWorkPath.toString).getOrElse(path)
+      case f: FileOutputCommitter =>
+        val workPath = f.getWorkPath
+        require(workPath != null, s"Committer has no workpath $f")
+        Option(workPath.toString).getOrElse(path)
```
--- End diff --

I actually wonder about the answer to this question ^ .. Wouldn't `Option(...).getOrElse(path)` be unnecessary?
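To spell the question out: once `require` has verified that `workPath` is non-null, `Option(workPath.toString)` is always defined, so `.getOrElse(path)` is dead code. A minimal sketch of the equivalent simplification, using a hypothetical helper rather than the actual class:

```scala
import org.apache.hadoop.fs.Path

// After the null check, Option(workPath.toString).getOrElse(path) can never
// return `path`, so the value can be returned directly.
def stagingDir(workPath: Path, path: String): String = {
  require(workPath != null, s"Committer has no work path; would have fallen back to $path")
  workPath.toString
}
```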
[GitHub] spark issue #18830: [SPARK-21621][Core] Reset numRecordsWritten after DiskBl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18830

**[Test build #80316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80316/testReport)** for PR 18830 at commit [`d82401d`](https://github.com/apache/spark/commit/d82401d1771009e02a81152b70b4fa48ce077593).
[GitHub] spark issue #18474: [SPARK-21235][TESTS] UTest should clear temp results whe...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18474

@wangjiaochun Are you running this on Windows?
[GitHub] spark issue #18830: [SPARK-21621][Core] Reset numRecordsWritten after DiskBl...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18830

ok to test
[GitHub] spark issue #18791: [SPARK-21571][Scheduler] Spark history server leaves inc...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/18791

Yea, I'm just wondering whether we can find an approach robust enough that we would be confident turning it on by default.
[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18576

**[Test build #80315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80315/testReport)** for PR 18576 at commit [`5d2fd6d`](https://github.com/apache/spark/commit/5d2fd6db8dc4130a948e5bb4d09fe0f776d16145).
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18820

Other than the few comments above, LGTM. Any other comments?
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131559076

--- Diff: python/pyspark/sql/dataframe.py ---
```diff
@@ -1446,7 +1457,7 @@ def all_of_(xs):
         if isinstance(to_replace, (float, int, long, basestring)):
             to_replace = [to_replace]
 
-        if isinstance(value, (float, int, long, basestring)):
+        if isinstance(value, (float, int, long, basestring)) or value is None:
```
--- End diff --

This looks like it always causes the warning:

```python
>>> df = sc.parallelize([("Alice", 1, 3.0)]).toDF()
>>> df.replace({"Alice": "Bob"}).show()
```

```
.../spark/python/pyspark/sql/dataframe.py:1466: UserWarning: to_replace is a dict and value is not None. value will be ignored.
  warnings.warn("to_replace is a dict and value is not None. value will be ignored.")
...
```

Could we move this line to line 1468 (under `else`)?
[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131559178

--- Diff: python/pyspark/sql/dataframe.py ---
```diff
@@ -1460,7 +1471,8 @@ def all_of_(xs):
             subset = [subset]
 
         # Verify we were not passed in mixed type generics."
```
--- End diff --

While we are here, let's remove the `"` at the end of this comment, which looks like a typo.
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468

**[Test build #80314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80314/testReport)** for PR 18468 at commit [`a26dc15`](https://github.com/apache/spark/commit/a26dc150f6b95cc42558561cd2548de04a89f041).
[GitHub] spark issue #18863: [SPARK-21647] [SQL] Fix SortMergeJoin when using CROSS
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18863

**[Test build #80313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80313/testReport)** for PR 18863 at commit [`f351fb1`](https://github.com/apache/spark/commit/f351fb1cbda8104f4f7e6ffa0be07f26b290683e).
[GitHub] spark issue #18863: [SPARK-21647] [SQL] Fix SortMergeJoin when using CROSS
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18863

cc @cloud-fan @BoleynSu @hvanhovell