[GitHub] spark pull request #18756: [SPARK-21548][SQL] "Support insert into serial co...
GitHub user lvdongr opened a pull request: https://github.com/apache/spark/pull/18756 [SPARK-21548][SQL] "Support insert into serial columns of table" ## What changes were proposed in this pull request? When we use the 'insert into ...' statement, we can only insert all the columns into a table. But in some cases our table has many columns and we are only interested in some of them, so we want to support the statement "insert into table tbl (column1, column2,...) values (value1, value2, value3,...)". https://issues.apache.org/jira/browse/SPARK-21548 ## How was this patch tested? unit tests, integration tests, manual tests Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lvdongr/spark SPARK-21548 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18756.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18756 commit 01af8ce69afeade8bb034c6965de0f3738f12fd5 Author: lvdongr Date: 2017-03-08T04:09:40Z [SPARK-19863][DStream] Whether or not to use CachedKafkaConsumer needs to be configurable when you use DirectKafkaInputDStream to connect to Kafka in a Spark Streaming application.
commit b6daeec664d757999e257e56fed3844db51515e2 Author: lvdongr Date: 2017-03-11T06:35:57Z Merge remote-tracking branch 'apache/master' commit e0e47b1da93b90210e44abc6e90655d3028555ec Author: lvdongr Date: 2017-04-12T07:20:01Z Merge remote-tracking branch 'apache/master' commit f4ab88111c5b8e9700eacc1acfa3858aed45124e Author: lvdongr Date: 2017-07-27T01:54:56Z Merge remote-tracking branch 'apache/master' commit 463e570f9e05f785834e27bd535cfbb3b7cb7dfb Author: lvdongr Date: 2017-07-27T12:09:47Z Merge remote-tracking branch 'apache/master' commit 2a40d64bcad6613892a54bc3052a634f59c14c65 Author: lvdongr Date: 2017-07-28T06:56:15Z [SPARK-21548][SQL] Support insert into serial columns of table --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
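For concreteness, the statement form proposed in this PR would look as follows (the table and column names here are illustrative, not taken from the patch):

```sql
-- Hypothetical table with more columns than we want to populate.
CREATE TABLE tbl (column1 INT, column2 STRING, column3 DOUBLE);

-- Proposed syntax: list only the columns of interest; the columns
-- not named would presumably be left NULL.
INSERT INTO tbl (column1, column2) VALUES (1, 'a');
```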
[GitHub] spark issue #18525: [SPARK-21297] [WEB-UI]Add count in 'JDBC/ODBC Server' pa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18525 **[Test build #3855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3855/testReport)** for PR 18525 at commit [`4001028`](https://github.com/apache/spark/commit/4001028926f08dd8e2286e6c8cb2cd81315b6a93). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18738: Typo in comment
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18738 **[Test build #3856 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3856/testReport)** for PR 18738 at commit [`dd2eb6b`](https://github.com/apache/spark/commit/dd2eb6bec99b80b085b11f4ee12c4d3feb66461e).
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r130021930 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -1219,44 +1219,91 @@ case class WidthBucket( override def dataType: DataType = LongType override def nullable: Boolean = true + private val isFoldable = minValue.foldable && maxValue.foldable && numBucket.foldable + + private lazy val _minValue: Any = minValue.eval(EmptyRow) + private lazy val minValueV = _minValue.asInstanceOf[Double] --- End diff -- If `minValue.eval(EmptyRow) == null`, then `minValue.eval(EmptyRow).asInstanceOf[Double]` will be `0.0`, so we keep both of them here.
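The pitfall this comment relies on can be reproduced in plain Scala, independent of Spark: casting a `null` held as `Any` to the primitive type `Double` unboxes it to `0.0` instead of failing, so the null case is silently lost unless it is checked separately. A minimal sketch:

```scala
// Demonstrates Scala's null-unboxing behavior (no Spark required).
object NullCastDemo {
  def main(args: Array[String]): Unit = {
    val evaluated: Any = null // stands in for minValue.eval(EmptyRow)

    // asInstanceOf[Double] on null unboxes to the primitive default 0.0
    // rather than throwing, which is why a separate null check is needed.
    val asDouble = evaluated.asInstanceOf[Double]

    println(asDouble)          // prints 0.0
    println(evaluated == null) // prints true
  }
}
```

This is why the diff keeps both `_minValue` (typed `Any`, so nullness is preserved) and the unboxed `minValueV`.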
[GitHub] spark issue #18525: [SPARK-21297] [WEB-UI]Add count in 'JDBC/ODBC Server' pa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18525 **[Test build #3855 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3855/testReport)** for PR 18525 at commit [`4001028`](https://github.com/apache/spark/commit/4001028926f08dd8e2286e6c8cb2cd81315b6a93).
[GitHub] spark issue #18750: Skip maven-compiler-plugin main and test compilations in...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18750 Likewise @vanzin, any thoughts on this one? It touches the compilation and build.
[GitHub] spark issue #18745: [SPARK-21544][DEPLOY] Tests jar of some module should no...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18745 @vanzin do you have any thoughts on this one?
[GitHub] spark pull request #18725: [SPARK-21520][SQL]Hivetable scan for all the colu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18725#discussion_r130021324 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -495,6 +495,23 @@ object ColumnPruning extends Rule[LogicalPlan] { // Eliminate no-op Projects case p @ Project(_, child) if sameOutput(child.output, p.output) => child +// The projectList of the parent Project contains a non-deterministic function, +// e.g. rand(); the parent Project will be split into two Projects. +case h @ Project(fields, _: LeafNode) if !fields.forall(_.deterministic) => --- End diff -- I meant: in all the cases where the projectList contains a non-deterministic expression, do we always need to split the project on all kinds of LeafNode?
[GitHub] spark issue #18323: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18323 **[Test build #80015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80015/testReport)** for PR 18323 at commit [`0940a49`](https://github.com/apache/spark/commit/0940a49ebf731221a5bcebcd0021e3ed35d9a6ad).
[GitHub] spark pull request #18725: [SPARK-21520][SQL]Hivetable scan for all the colu...
Github user heary-cao commented on a diff in the pull request: https://github.com/apache/spark/pull/18725#discussion_r130020290 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -495,6 +495,23 @@ object ColumnPruning extends Rule[LogicalPlan] { // Eliminate no-op Projects case p @ Project(_, child) if sameOutput(child.output, p.output) => child +// The projectList of the parent Project contains a non-deterministic function, +// e.g. rand(); the parent Project will be split into two Projects. +case h @ Project(fields, _: LeafNode) if !fields.forall(_.deterministic) => --- End diff -- There is no need to split for every LeafNode; we split if and only if the projectList of the LeafNode's parent Project is non-deterministic.
[GitHub] spark issue #18755: [SPARK-21553][Spark Shell] Added the description of the ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18755 **[Test build #3854 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3854/testReport)** for PR 18755 at commit [`d764f5e`](https://github.com/apache/spark/commit/d764f5e8c589cff87668bb95bf3e6e046668fa54).
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r130018225 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala --- @@ -792,6 +793,104 @@ class ArrowConvertersSuite extends SharedSQLContext with BeforeAndAfterAll { collectAndValidate(df, json, "binaryData.json") } + test("date type conversion") { +val json = + s""" + |{ + | "schema" : { + |"fields" : [ { + | "name" : "date", + | "type" : { + |"name" : "date", + |"unit" : "DAY" + | }, + | "nullable" : true, + | "children" : [ ], + | "typeLayout" : { + |"vectors" : [ { + | "type" : "VALIDITY", + | "typeBitWidth" : 1 + |}, { + | "type" : "DATA", + | "typeBitWidth" : 32 + |} ] + | } + |} ] + | }, + | "batches" : [ { + |"count" : 4, + |"columns" : [ { + | "name" : "date", + | "count" : 4, + | "VALIDITY" : [ 1, 1, 1, 1 ], + | "DATA" : [ -1, 0, 16533, 382607 ] + |} ] + | } ] + |} + """.stripMargin + +val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS z", Locale.US) +val d1 = DateTimeUtils.toJavaDate(-1) // "1969-12-31" +val d2 = DateTimeUtils.toJavaDate(0) // "1970-01-01" +val d3 = new Date(sdf.parse("2015-04-08 13:10:15.000 UTC").getTime) +val d4 = new Date(sdf.parse("3017-07-18 14:55:00.000 UTC").getTime) --- End diff -- `d3` and `d4` might be flaky in some timezones. Should we use `Date.valueOf()`?: ```scala val d3 = Date.valueOf("2015-04-08") val d4 = Date.valueOf("3017-07-18") ```
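The suggested fix can be checked with plain JVM code, no Spark required: `Date.valueOf` takes the string verbatim as a local year-month-day, while a `java.sql.Date` built from a parsed UTC instant is rendered in the JVM's default timezone, so the printed day can shift. A minimal sketch, pinning a far-eastern timezone so the shift shows up deterministically:

```scala
import java.sql.Date
import java.text.SimpleDateFormat
import java.util.{Locale, TimeZone}

object DateFlakinessDemo {
  def main(args: Array[String]): Unit = {
    // Force a timezone far east of UTC to expose the problem deterministically.
    TimeZone.setDefault(TimeZone.getTimeZone("Pacific/Kiritimati")) // UTC+14

    // Flaky variant: parse a UTC timestamp, wrap the instant in java.sql.Date.
    // The instant 2015-04-08T13:10:15Z falls on 2015-04-09 in local time here.
    val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS z", Locale.US)
    val flaky = new Date(sdf.parse("2015-04-08 13:10:15.000 UTC").getTime)

    // Stable variant: the string is interpreted as a plain year-month-day.
    val stable = Date.valueOf("2015-04-08")

    println(flaky)  // 2015-04-09 -- shifted by a day
    println(stable) // 2015-04-08 -- unaffected by the default timezone
  }
}
```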
[GitHub] spark pull request #18751: [SPARK-21548][SQL]Support insert into serial colu...
Github user lvdongr closed the pull request at: https://github.com/apache/spark/pull/18751
[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r130015435 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -309,6 +313,23 @@ private[ml] object DefaultParamsWriter { val metadataJson: String = compact(render(metadata)) metadataJson } + + /** + * Save estimator's `initialModel` to corresponding path. + */ + def saveInitialModel[T <: HasInitialModel[_ <: MLWritable with Params]]( + instance: T, path: String): Unit = { +if (instance.isDefined(instance.initialModel)) { + val initialModelPath = new Path(path, "initialModel").toString + val initialModel = instance.getOrDefault(instance.initialModel) + // When saving, only keep the direct initialModel by eliminating possible initialModels of the + // direct initialModel, to avoid unnecessary deep recursion of initialModel. + if (initialModel.hasParam("initialModel")) { +initialModel.clear(initialModel.getParam("initialModel")) + } + initialModel.save(initialModelPath) --- End diff -- Fair enough.
[GitHub] spark issue #18754: [WIP][SPARK-21552][SQL] Add DecimalType support to Arrow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18754 Merged build finished. Test FAILed.
[GitHub] spark issue #18754: [WIP][SPARK-21552][SQL] Add DecimalType support to Arrow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18754 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80014/ Test FAILed.
[GitHub] spark issue #18754: [WIP][SPARK-21552][SQL] Add DecimalType support to Arrow...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18754 **[Test build #80014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80014/testReport)** for PR 18754 at commit [`9e60762`](https://github.com/apache/spark/commit/9e60762d830c320967742d80cb17c55631f6b11a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18540 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80012/ Test PASSed.
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18540 Merged build finished. Test PASSed.
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18540 **[Test build #80012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80012/testReport)** for PR 18540 at commit [`9abdb5e`](https://github.com/apache/spark/commit/9abdb5eee7aab766fe73ca00749efa2a16328882). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18749 Will address the comments within a few days. (I am reading the docs just to get used to things around my updated status.)
[GitHub] spark issue #18695: [SPARK-12717][PYTHON] Adding thread-safe broadcast pickl...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18695 The change LGTM. Will it be hard to add a reliable test for this?
[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18655 @BryanCutler @wesm @cpcloud I filed a JIRA issue for decimal type support [SPARK-21552](https://issues.apache.org/jira/browse/SPARK-21552) and sent a pr for it as WIP #18754. Let's move on there for discussing decimal type support.
[GitHub] spark issue #18755: [SPARK-21553][Spark Shell] Added the description of the ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18755 Can one of the admins verify this patch?
[GitHub] spark issue #18754: [WIP][SPARK-21552][SQL] Add DecimalType support to Arrow...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18754 **[Test build #80014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80014/testReport)** for PR 18754 at commit [`9e60762`](https://github.com/apache/spark/commit/9e60762d830c320967742d80cb17c55631f6b11a).
[GitHub] spark pull request #18755: [SPARK-21553][Spark Shell] Added the description ...
GitHub user davidxdh opened a pull request: https://github.com/apache/spark/pull/18755 [SPARK-21553][Spark Shell] Added the description of the default value of master parameter in the spark-shell When I type spark-shell --help, I find that the description of the default value for the master parameter is missing. The user does not know what the default value is when the master parameter is not specified, so we need to add the master parameter's default value to the help information. [https://issues.apache.org/jira/browse/SPARK-21553](https://issues.apache.org/jira/browse/SPARK-21553) You can merge this pull request into a Git repository by running: $ git pull https://github.com/davidxdh/spark dev_0728 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18755.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18755 commit 95602dc2e0ccde3f2d94789307048474f2d0ae7f Author: Donghui Xu Date: 2017-07-28T01:31:48Z Merge pull request #1 from apache/master Merge from apache/spark commit d764f5e8c589cff87668bb95bf3e6e046668fa54 Author: davidxdh Date: 2017-07-28T03:46:00Z [SPARK-21553][Spark Shell] Added the description of the default value of master parameter in the spark-shell
[GitHub] spark pull request #18754: [WIP][SPARK-21552][SQL] Add DecimalType support t...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/18754 [WIP][SPARK-21552][SQL] Add DecimalType support to ArrowWriter. ## What changes were proposed in this pull request? Decimal type is not yet supported in `ArrowWriter`. This adds decimal type support. ## How was this patch tested? Added a test to `ArrowConvertersSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-21552 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18754.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18754 commit 9e60762d830c320967742d80cb17c55631f6b11a Author: Takuya UESHIN Date: 2017-07-26T04:34:31Z Add DecimalType support to ArrowWriter.
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/17435 @szalai1 Could you fix the tests if you're still working on this, please?
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17435 Merged build finished. Test FAILed.
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17435 **[Test build #80013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80013/testReport)** for PR 17435 at commit [`8872e19`](https://github.com/apache/spark/commit/8872e190b16b328205e0df569d5f5bc3af6c5610). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17435 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80013/ Test FAILed.
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18185 @gatorsmile ok
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18185 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80011/ Test PASSed.
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18185 Merged build finished. Test PASSed.
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18185 **[Test build #80011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80011/testReport)** for PR 18185 at commit [`23ca897`](https://github.com/apache/spark/commit/23ca897825a51baa1b879c3b7968749199e8724f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedSubqueryColumnAliases(`
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17435 **[Test build #80013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80013/testReport)** for PR 17435 at commit [`8872e19`](https://github.com/apache/spark/commit/8872e190b16b328205e0df569d5f5bc3af6c5610).
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/17435 ok to test
[GitHub] spark issue #18753: [SPARK-21548] [SQL] Support insert into serial columns o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18753 Can one of the admins verify this patch?
[GitHub] spark pull request #18753: [SPARK-21548] [SQL] Support insert into serial co...
GitHub user lvdongr opened a pull request: https://github.com/apache/spark/pull/18753 [SPARK-21548] [SQL] Support insert into serial columns of table

## What changes were proposed in this pull request?

When we use the 'insert into ...' statement, we can only insert values for all the columns of a table. But in some cases a table has many columns and we are only interested in some of them, so we want to support the statement "insert into table tbl (column1, column2, ...) values (value1, value2, value3, ...)".

## How was this patch tested?

manual tests

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvdongr/spark SPARK-21548

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18753.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18753

commit 01af8ce69afeade8bb034c6965de0f3738f12fd5 Author: lvdongr Date: 2017-03-08T04:09:40Z [SPARK-19863][DStream] Whether or not use CachedKafkaConsumer need to be configured, when you use DirectKafkaInputDStream to connect the kafka in a Spark Streaming application has been successfully created.
commit b6daeec664d757999e257e56fed3844db51515e2 Author: lvdongr Date: 2017-03-11T06:35:57Z Merge remote-tracking branch 'apache/master'
commit e0e47b1da93b90210e44abc6e90655d3028555ec Author: lvdongr Date: 2017-04-12T07:20:01Z Merge remote-tracking branch 'apache/master'
commit f4ab88111c5b8e9700eacc1acfa3858aed45124e Author: lvdongr Date: 2017-07-27T01:54:56Z Merge remote-tracking branch 'apache/master'
commit 463e570f9e05f785834e27bd535cfbb3b7cb7dfb Author: lvdongr Date: 2017-07-27T12:09:47Z Merge remote-tracking branch 'apache/master'
commit da882ea569d451b3f2af550b0976a6a059900f6a Author: lvdongr Date: 2017-07-28T02:56:23Z [SPARK-21548][SQL] Support insert into serial columns of table
commit a65be1605865a1159532ba148434d3bb207da64c Author: lvdongr Date: 2017-07-28T03:03:23Z refresh last commit
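For readers unfamiliar with the proposed syntax: standard SQL already allows naming a subset of columns in an INSERT, with omitted columns filled by their defaults (typically NULL). A minimal sketch of the intended semantics, illustrated here with Python's sqlite3 rather than Spark SQL (which, at the time of this PR, rejects the column list):

```python
import sqlite3

# Hypothetical three-column table standing in for a wide Spark table;
# only the column-list INSERT semantics are illustrated here, not
# Spark's implementation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (column1 TEXT, column2 TEXT, column3 TEXT)")

# Insert into a subset of columns; column3 is omitted and defaults to NULL.
conn.execute("INSERT INTO tbl (column1, column2) VALUES (?, ?)", ("a", "b"))

row = conn.execute("SELECT column1, column2, column3 FROM tbl").fetchone()
print(row)  # ('a', 'b', None)
```

The PR proposes teaching Spark's parser and analyzer the same behavior for `INSERT INTO tbl (col, ...) VALUES (...)`.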
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18468 @cloud-fan how will we go forward? @rxin seems to have no comment for now.
[GitHub] spark issue #18725: [SPARK-21520][SQL]Hivetable scan for all the columns the...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18725

Can the current fix work for a case like the following?

Project [a]
  Filter [rand() > 1]
    TableScan [a, b, c]

`PhysicalOperation` still fails for a non-deterministic `Filter`, so you still read all columns from the table.
[GitHub] spark issue #18525: [SPARK-21297] [WEB-UI]Add count in 'JDBC/ODBC Server' pa...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18525 @ajbozarth Okay, thanks. @srowen Could you help review the code? Thanks.
[GitHub] spark pull request #18725: [SPARK-21520][SQL]Hivetable scan for all the colu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18725#discussion_r13082

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -495,6 +495,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     // Eliminate no-op Projects
     case p @ Project(_, child) if sameOutput(child.output, p.output) => child
+    // If the parent Project contains non-deterministic expressions,
+    // e.g. rand(), it will be split into two Projects.
+    case h @ Project(fields, _: LeafNode) if !fields.forall(_.deterministic) =>
--- End diff --

The question is, do we always need to split the project for all LeafNodes?
[GitHub] spark pull request #18554: [SPARK-21306][ML] OneVsRest should support setWei...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18554
[GitHub] spark issue #18554: [SPARK-21306][ML] OneVsRest should support setWeightCol
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18554 Merged into master, thanks all.
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18540 **[Test build #80012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80012/testReport)** for PR 18540 at commit [`9abdb5e`](https://github.com/apache/spark/commit/9abdb5eee7aab766fe73ca00749efa2a16328882).
[GitHub] spark pull request #17203: [SPARK-19863][DStream] Whether or not use CachedK...
Github user lvdongr closed the pull request at: https://github.com/apache/spark/pull/17203
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r129998159

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala ---
@@ -33,4 +33,13 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton
     assert(conf(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname) === "")
   }
 }
+
+  test("newTemporaryConfiguration respect spark.hadoop.foo=bar in SparkConf") {
+    sys.props.put("spark.hadoop.foo", "bar")
--- End diff --

@cloud-fan At the very beginning, spark-submit does the same thing: it adds properties from --conf and spark-defaults.conf to sys.props.
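For context on what this test is about: Spark copies any configuration entry prefixed with `spark.hadoop.` into the Hadoop `Configuration`, with the prefix stripped. A rough Python model of that translation (the helper name and the sample conf are ours, not Spark's):

```python
def hadoop_props(spark_conf):
    """Extract spark.hadoop.* entries and strip the prefix, mimicking how
    such properties reach the Hadoop Configuration."""
    prefix = "spark.hadoop."
    return {k[len(prefix):]: v
            for k, v in spark_conf.items()
            if k.startswith(prefix)}

# spark.hadoop.foo=bar should surface to Hadoop as foo=bar;
# non-prefixed Spark settings are not propagated.
conf = {"spark.hadoop.foo": "bar", "spark.master": "local"}
print(hadoop_props(conf))  # {'foo': 'bar'}
```

The test under review asserts the Hive-side `newTemporaryConfiguration` honors the same convention.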
[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r129997548

--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -309,6 +313,23 @@ private[ml] object DefaultParamsWriter {
     val metadataJson: String = compact(render(metadata))
     metadataJson
   }
+
+  /**
+   * Save estimator's `initialModel` to corresponding path.
+   */
+  def saveInitialModel[T <: HasInitialModel[_ <: MLWritable with Params]](
+      instance: T, path: String): Unit = {
+    if (instance.isDefined(instance.initialModel)) {
+      val initialModelPath = new Path(path, "initialModel").toString
+      val initialModel = instance.getOrDefault(instance.initialModel)
+      // When saving, only keep the direct initialModel by eliminating possible initialModels of the
+      // direct initialModel, to avoid unnecessary deep recursion of initialModel.
+      if (initialModel.hasParam("initialModel")) {
+        initialModel.clear(initialModel.getParam("initialModel"))
+      }
+      initialModel.save(initialModelPath)
--- End diff --

Actually we did it in the latter way; `initialModel` is only a param for the Estimator, not for the Model. Thanks.
[GitHub] spark pull request #18725: [SPARK-21520][SQL]Hivetable scan for all the colu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18725#discussion_r129997298

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -495,6 +495,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     // Eliminate no-op Projects
     case p @ Project(_, child) if sameOutput(child.output, p.output) => child
+    // If the parent Project contains non-deterministic expressions,
+    // e.g. rand(), it will be split into two Projects.
+    case h @ Project(fields, _: LeafNode) if !fields.forall(_.deterministic) =>
--- End diff --

It is a bit difficult to infer why this rule exists without the context of this PR. Please add a comment for it.
[GitHub] spark pull request #18725: [SPARK-21520][SQL]Hivetable scan for all the colu...
Github user heary-cao commented on a diff in the pull request: https://github.com/apache/spark/pull/18725#discussion_r129997034

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -495,6 +495,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     // Eliminate no-op Projects
     case p @ Project(_, child) if sameOutput(child.output, p.output) => child
+    // If the parent Project contains non-deterministic expressions,
+    // e.g. rand(), it will be split into two Projects.
+    case h @ Project(fields, _: LeafNode) if !fields.forall(_.deterministic) =>
--- End diff --

Other non-LeafNode cases will be handled by ColumnPruning and CollapseProject.
[GitHub] spark issue #16648: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/16648 ping @bdrillard for the 2nd part of this PR
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18185 **[Test build #80011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80011/testReport)** for PR 18185 at commit [`23ca897`](https://github.com/apache/spark/commit/23ca897825a51baa1b879c3b7968749199e8724f).
[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r129989469

--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/HasParallelism.scala ---
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.param.shared
+
+import scala.concurrent.ExecutionContext
+
+import org.apache.spark.ml.param.{IntParam, Params, ParamValidators}
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * Common parameter for estimators trained in a multithreaded environment.
+ */
+private[ml] trait HasParallelism extends Params {
+
+  /**
+   * param for the number of threads to use when running parallel meta-algorithms
+   * @group expertParam
+   */
+  val parallelism = new IntParam(this, "parallelism",
+    "the number of threads to use when running parallel algorithms", ParamValidators.gtEq(1))
+
+  setDefault(parallelism -> 1)
+
+  /** @group expertGetParam */
+  def getParallelism: Int = $(parallelism)
+
+  /** @group expertSetParam */
+  def setParallelism(value: Int): this.type = {
--- End diff --

You can remove this now that it is in OneVsRest.
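The `parallelism` param above caps the number of worker threads used to fit the per-class models. A minimal Python sketch of the same idea with a bounded thread pool (the `fit_one` task is a stand-in, not Spark API):

```python
from concurrent.futures import ThreadPoolExecutor

def fit_one(label):
    # Stand-in for training one binary classifier in OneVsRest.
    return f"model-for-{label}"

parallelism = 2  # analogous to setParallelism(2); validated as >= 1
with ThreadPoolExecutor(max_workers=parallelism) as pool:
    # map preserves input order, so results are deterministic even
    # though up to `parallelism` fits run concurrently.
    models = list(pool.map(fit_one, [0, 1, 2]))
print(models)  # ['model-for-0', 'model-for-1', 'model-for-2']
```

The default of 1 keeps the previous sequential behavior, which is why the PR is a strict opt-in improvement.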
[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r129989379

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -294,6 +296,18 @@ final class OneVsRest @Since("1.4.0") (
   @Since("1.5.0")
   def setPredictionCol(value: String): this.type = set(predictionCol, value)
+
+  /** @group expertGetParam */
+  override def getParallelism: Int = $(parallelism)
+
+  /**
+   * @group expertSetParam
+   * The implementation of parallel one vs. rest runs the classification for
--- End diff --

Also, please put the group annotation at the bottom to match existing code style.
[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r129989332

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -294,6 +296,18 @@ final class OneVsRest @Since("1.4.0") (
   @Since("1.5.0")
   def setPredictionCol(value: String): this.type = set(predictionCol, value)
+
+  /** @group expertGetParam */
+  override def getParallelism: Int = $(parallelism)
+
+  /**
+   * @group expertSetParam
+   * The implementation of parallel one vs. rest runs the classification for
--- End diff --

+1
[GitHub] spark issue #18745: [SPARK-21544][DEPLOY] Tests jar of some module should no...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/18745 cc @srowen Test done, any more problems? Thanks very much.
[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18281 @holdenk Some of those improvements on handling parallelism sound useful, but I'd prefer we merge this and then add more improvements. This PR should be a strict improvement (moving from no parallelism to some potential for parallelism). Do people have more comments before this is merged?
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18185 Merged build finished. Test FAILed.
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18185 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80010/ Test FAILed.
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18185 **[Test build #80010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80010/testReport)** for PR 18185 at commit [`2b50e50`](https://github.com/apache/spark/commit/2b50e5088d4eca5d38837e421f7e9960a2e2128d).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedSubqueryColumnAliases(`
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18185 **[Test build #80010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80010/testReport)** for PR 18185 at commit [`2b50e50`](https://github.com/apache/spark/commit/2b50e5088d4eca5d38837e421f7e9960a2e2128d).
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18185 Thanks! Fixed.
[GitHub] spark pull request #18740: [SPARK-21538][SQL] Attribute resolution inconsist...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18740
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18740 Thanks! Merging to master/2.2
[GitHub] spark pull request #18725: [SPARK-21520][SQL]Hivetable scan for all the colu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18725#discussion_r129984515 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -495,6 +495,23 @@ object ColumnPruning extends Rule[LogicalPlan] { // Eliminate no-op Projects case p @ Project(_, child) if sameOutput(child.output, p.output) => child +// The column of father project contains not deterministic function +// e.g Rand function. father project will be split to two project. +case h @ Project(fields, _: LeafNode) if !fields.forall(_.deterministic) => --- End diff -- Then once your project is not on top of a LeafNode, this rule doesn't work? Your fix is just for the specified case.
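The review above hinges on determinism: a Project whose fields contain a non-deterministic expression such as `rand()` cannot be freely collapsed or re-evaluated, because every evaluation yields a new value. A minimal Python sketch (the `project` helper is hypothetical, not Spark's Catalyst code) illustrates why:

```python
import random

def project(rows, exprs):
    """Apply a list of column expressions to each input row (a toy Project)."""
    return [tuple(e(r) for e in exprs) for r in rows]

random.seed(42)
rows = [(1,), (2,), (3,)]
rand_col = lambda r: random.random()  # non-deterministic: new value per call

once = project(rows, [rand_col])
again = project(rows, [rand_col])
# Re-evaluating the non-deterministic column yields different values, so an
# optimizer must not duplicate or re-run such expressions when rewriting
# Project nodes -- only transformations that preserve the evaluation count
# (and order) are safe.
assert once != again
```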
[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r129982672 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -101,6 +101,45 @@ class OneVsRestSuite extends SparkFunSuite with MLlibTestSparkContext with Defau assert(expectedMetrics.confusionMatrix ~== ovaMetrics.confusionMatrix absTol 400) } + test("one-vs-rest: tuning parallelism does not change output") { --- End diff -- Is there a good way to do that? I'm having trouble thinking of ways to do it which would not produce flaky tests.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 @gatorsmile Actually, it is not rare for us to add a feature step by step in Spark SQL, so that alone is not a reason to prevent adding this support. I think this change already helps this kind of workload a lot. As said in the previous discussion, we can't avoid a few issues regarding non-deterministic non-equi join conditions: we can simply allow them, but then we face inconsistency due to different join implementations; or we can pull them out into a downstream `Project`, but that possibly changes the number of calls, and `EnsureRequirements` can change the call order. Notice that those issues only affect non-equi join conditions; equi join conditions are free from them.
[GitHub] spark issue #17180: [SPARK-19839][Core]release longArray in BytesToBytesMap
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/17180 retest it please.
[GitHub] spark issue #18752: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18752 cc @JoshRosen
[GitHub] spark issue #18752: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18752 Can one of the admins verify this patch?
[GitHub] spark pull request #18752: [SPARK-21551][Python] Increase timeout for Python...
GitHub user peay opened a pull request: https://github.com/apache/spark/pull/18752 [SPARK-21551][Python] Increase timeout for PythonRDD.serveIterator ## What changes were proposed in this pull request? This modification increases the timeout for `serveIterator` (which is not dynamically configurable). This fixes timeout issues in pyspark when using `collect` and similar functions, in cases where Python may take more than a couple of seconds to connect. ## How was this patch tested? Ran the tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/peay/spark spark-21551 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18752.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18752 commit 9d3c6640f56e3e4fd195d3ad8cead09df67a72c7 Author: peay Date: 2017-07-27T20:49:28Z [SPARK-21551][Python] Increase timeout for PythonRDD.serveIterator
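The timeout in question guards the JVM-side server socket that waits for the Python process to connect before streaming collected rows to it. A minimal Python sketch of that pattern (the `serve_once` helper is hypothetical, not the actual `PythonRDD.serveIterator` code); the `settimeout` value plays the role of the constant this PR increases:

```python
import socket
import threading

def serve_once(payload: bytes, timeout_s: float) -> bytes:
    """Bind an ephemeral port, wait up to timeout_s for one client connection,
    then send the payload. A client slower than the timeout makes accept()
    raise socket.timeout -- the failure mode the PR mitigates."""
    server = socket.socket()
    server.settimeout(timeout_s)        # the accept timeout being increased
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    received = []
    def client():
        with socket.create_connection(("127.0.0.1", port)) as c:
            received.append(c.recv(len(payload)))

    t = threading.Thread(target=client)
    t.start()
    conn, _ = server.accept()           # blocks until the client connects
    conn.sendall(payload)
    conn.close()
    t.join()
    server.close()
    return received[0]

assert serve_once(b"rows", 5.0) == b"rows"
```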
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18740 Merged build finished. Test PASSed.
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80009/ Test PASSed.
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18740 **[Test build #80009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80009/testReport)** for PR 18740 at commit [`0b0eea9`](https://github.com/apache/spark/commit/0b0eea941cb850967e943719822cfab89479a025). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18664 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80008/ Test FAILed.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18664 Merged build finished. Test FAILed.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18664 **[Test build #80008 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80008/testReport)** for PR 18664 at commit [`3b83d7a`](https://github.com/apache/spark/commit/3b83d7acf17433b1f5581f0b8b87c54a91309839). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ArrowWriter(val root: VectorSchemaRoot, fields: Array[ArrowFieldWriter]) `
[GitHub] spark pull request #18555: [SPARK-21353][CORE]add checkValue in spark.intern...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18555#discussion_r129946652 --- Diff: core/src/test/scala/org/apache/spark/SparkConfSuite.scala --- @@ -322,6 +324,291 @@ class SparkConfSuite extends SparkFunSuite with LocalSparkContext with ResetSyst conf.validateSettings() } + test("verify spark.blockManager.port configuration") { +val conf = new SparkConf(false) + .setMaster("local").setAppName("My app") + +conf.validateSettings() +assert(!conf.contains(BLOCK_MANAGER_PORT.key)) + +Seq( + "0", // normal values + "1024", // min values + "65535" // max values +).foreach { value => + conf.set(BLOCK_MANAGER_PORT.key, value) + var sc0 = new SparkContext(conf) + assert(sc0.isStopped === false) + assert(sc0.conf.get(BLOCK_MANAGER_PORT) === value.toInt) + sc0.stop() + conf.remove(BLOCK_MANAGER_PORT) +} + +// Verify abnormal values +Seq( + "-1", + "1000", + "65536" +).foreach { value => + conf.set(BLOCK_MANAGER_PORT.key, value) + val excMsg = intercept[IllegalArgumentException] { +new SparkContext(conf) + }.getMessage + // Caused by: java.lang.IllegalArgumentException: + // blockManager port should be between 1024 and 65535 (inclusive), + // or 0 for a random free port. + assert(excMsg.contains("blockManager port should be between 1024 " + +"and 65535 (inclusive), or 0 for a random free port.")) + + conf.remove(BLOCK_MANAGER_PORT) +} + } + + test("verify spark.executor.memory configuration exception") { +val conf = new SparkConf(false) + .setMaster("local").setAppName("executor memory") + .set(EXECUTOR_MEMORY.key, "-1") +val excMsg = intercept[NumberFormatException] { + sc = new SparkContext(conf) +}.getMessage +// Caused by: java.lang.NumberFormatException: +// Size must be specified as bytes (b), kibibytes (k), +// mebibytes (m), gibibytes (g), tebibytes (t), +// or pebibytes(p). E.g. 50b, 100k, or 250m. 
+assert(excMsg.contains("Size must be specified as bytes (b), kibibytes (k), " + + "mebibytes (m), gibibytes (g), tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m.")) + } + + test("verify spark.task.cpus configuration exception") { +val conf = new SparkConf(false) + .setMaster("local").setAppName("cpus") + .set(CPUS_PER_TASK.key, "-1") +val excMsg = intercept[IllegalArgumentException] { + sc = new SparkContext(conf) +}.getMessage +// Caused by: java.lang.IllegalArgumentException: +// Number of cores to allocate for task event queue must be positive. +assert(excMsg.contains("Number of cores to allocate for task event queue must be positive.")) + } + + test("verify spark.task.maxFailures configuration exception") { +val conf = new SparkConf(false) + .setMaster("local").setAppName("task maxFailures") + .set(MAX_TASK_FAILURES.key, "-1") +val sc0 = new SparkContext(conf) +val excMsg = intercept[IllegalArgumentException] { + new TaskSchedulerImpl(sc0) +}.getMessage +// Caused by: java.lang.IllegalArgumentException: +// The retry times of task should be greater than or equal to 1. +assert(excMsg.contains("The retry times of task should be greater than or equal to 1.")) +sc0.stop() + } + + test("verify listenerbus.eventqueue.capacity configuration exception") { +val conf = new SparkConf(false) + .setMaster("local").setAppName("capacity") + .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY.key, "-1") +val excMsg = intercept[IllegalArgumentException] { + sc = new SparkContext(conf) +}.getMessage +// Caused by: java.lang.IllegalArgumentException: +// The capacity of listener bus event queue must be positive. 
+assert(excMsg.contains("The capacity of listener bus event queue must be positive.")) + } + + test("verify metrics.maxListenerClassesTimed configuration exception") { +val conf = new SparkConf(false) + .setMaster("local").setAppName("listenerbus") + .set(LISTENER_BUS_METRICS_MAX_LISTENER_CLASSES_TIMED.key, "-1") +val excMsg = intercept[IllegalArgumentException] { + sc = new SparkContext(conf) +}.getMessage +// Caused by: java.lang.IllegalArgumentException: +// The maxListenerClassesTimed of listener bus event queue must be positive. +assert(excMsg.contains("The maxListenerClassesTimed of listener bus " + + "event queue must be posit
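The quoted tests assert on the error messages raised by configuration validation. A minimal Python sketch of the same kind of range check, using the blockManager port message quoted above (the `check_block_manager_port` helper is hypothetical, not Spark's `checkValue` implementation):

```python
def check_block_manager_port(port: int) -> int:
    """Validate a blockManager port: 0 requests a random free port; any other
    value must fall in the non-privileged range 1024-65535 (inclusive)."""
    if port != 0 and not (1024 <= port <= 65535):
        raise ValueError(
            "blockManager port should be between 1024 and 65535 (inclusive), "
            "or 0 for a random free port.")
    return port

# Boundary values pass; out-of-range values raise, matching the test above.
assert check_block_manager_port(0) == 0
assert check_block_manager_port(65535) == 65535
try:
    check_block_manager_port(1000)
    raise AssertionError("expected ValueError for port 1000")
except ValueError:
    pass
```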
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18185 LGTM except a few minor comments. Thanks for working on it!
[GitHub] spark issue #18750: Skip maven-compiler-plugin main and test compilations in...
Github user gslowikowski commented on the issue: https://github.com/apache/spark/pull/18750 Updated.
[GitHub] spark pull request #18185: [SPARK-20962][SQL] Support subquery column aliase...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18185#discussion_r129944781 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -859,6 +859,22 @@ class Analyzer( // rule: ResolveDeserializer. case plan if containsDeserializer(plan.expressions) => plan + case u @ UnresolvedSubqueryColumnAlias(columnNames, child) if child.resolved => +// Resolves output attributes if a query has alias names in its subquery: +// e.g., SELECT * FROM (SELECT 1 AS a, 1 AS b) t(col1, col2) +val outputAttrs = child.output +// Checks if the number of the aliases equals to the number of output columns
+// in the subquery. +if (columnNames.size != outputAttrs.size) { + u.failAnalysis(s"Number of column aliases does not match number of columns. " + --- End diff -- Nit: remove the string Interpolator `s`.
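As the quoted Analyzer rule shows, subquery column aliases are matched to the subquery's output attributes by position, after first verifying that the counts agree. A minimal Python sketch of that resolution step (the `apply_column_aliases` helper is hypothetical, not the Catalyst code):

```python
def apply_column_aliases(output, aliases):
    """Rename subquery output columns by position, as in
    SELECT * FROM (SELECT 1 AS a, 1 AS b) t(col1, col2).
    `output` is a list of (name, value) pairs standing in for attributes."""
    if len(aliases) != len(output):
        # Mirrors the failAnalysis check in the quoted rule.
        raise ValueError(
            "Number of column aliases does not match number of columns. "
            f"Aliases: {len(aliases)}, columns: {len(output)}.")
    return [(alias, value) for alias, (_, value) in zip(aliases, output)]

# Positional renaming: a -> col1, b -> col2.
assert apply_column_aliases([("a", 1), ("b", 1)], ["col1", "col2"]) == [
    ("col1", 1), ("col2", 1)]
```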
[GitHub] spark pull request #18185: [SPARK-20962][SQL] Support subquery column aliase...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18185#discussion_r129944055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -423,6 +423,26 @@ case class UnresolvedAlias( } /** + * Aliased column names for subquery. We could add alias names for output columns in the subquery: --- End diff -- resolved by positions.
[GitHub] spark pull request #18185: [SPARK-20962][SQL] Support subquery column aliase...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18185#discussion_r129943797 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -423,6 +423,26 @@ case class UnresolvedAlias( } /** + * Aliased column names for subquery. We could add alias names for output columns in the subquery: + * {{{ + * // Assign alias names for output columns + * SELECT col1, col2 FROM testData AS t(col1, col2); + * }}} + * + * @param outputColumnNames the column names for this subquery. + * @param child the logical plan of this subquery. --- End diff -- Nit: `the logical plan of this subquery` -> `the [[LogicalPlan]] on which this subquery column aliases apply`
[GitHub] spark pull request #18185: [SPARK-20962][SQL] Support subquery column aliase...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18185#discussion_r129943216 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -423,6 +423,26 @@ case class UnresolvedAlias( } /** + * Aliased column names for subquery. We could add alias names for output columns in the subquery: + * {{{ + * // Assign alias names for output columns + * SELECT col1, col2 FROM testData AS t(col1, col2); + * }}} + * + * @param outputColumnNames the column names for this subquery. + * @param child the logical plan of this subquery. + */ +case class UnresolvedSubqueryColumnAlias( --- End diff -- Nit: `UnresolvedSubqueryColumnAlias ` -> `UnresolvedSubqueryColumnAliases`
[GitHub] spark pull request #18185: [SPARK-20962][SQL] Support subquery column aliase...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18185#discussion_r129943038 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -750,20 +750,28 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging /** * Create an alias (SubqueryAlias) for a sub-query. This is practically the same as * visitAliasedRelation and visitNamedExpression, ANTLR4 however requires us to use 3 different - * hooks. + * hooks. We could add alias names for output columns, for example: + * {{{ + * SELECT col1, col2 FROM testData AS t(col1, col2) + * }}} */ override def visitAliasedQuery(ctx: AliasedQueryContext): LogicalPlan = withOrigin(ctx) { -val alias = if (ctx.strictIdentifier == null) { +val alias = if (ctx.tableAlias.strictIdentifier == null) { // For un-aliased subqueries, use a default alias name that is not likely to conflict with // normal subquery names, so that parent operators can only access the columns in subquery by // unqualified names. Users can still use this special qualifier to access columns if they // know it, but that's not recommended. "__auto_generated_subquery_name" } else { - ctx.strictIdentifier.getText + ctx.tableAlias.strictIdentifier.getText +} +val subquery = SubqueryAlias(alias, plan(ctx.queryNoWith).optionalMap(ctx.sample)(withSample)) +if (ctx.tableAlias.identifierList != null) { + val columnNames = visitIdentifierList(ctx.tableAlias.identifierList) --- End diff -- Nit: `columnNames ` -> `columnAliases`
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18740 LGTM pending Jenkins
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18740 Merged build finished. Test PASSed.
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80007/ Test PASSed.
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18740 **[Test build #80007 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80007/testReport)** for PR 18740 at commit [`309cb8f`](https://github.com/apache/spark/commit/309cb8f09f5af81f230911574274c0ca9eb65f34). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes exam...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18749#discussion_r129927520 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java --- @@ -26,6 +26,10 @@ private String name; private String extended; private String db; +private String arguments; --- End diff -- There aren't many `ExpressionInfo` objects in memory right? adding more info to this bean doesn't have any meaningful performance implications, I presume. I suppose it's just breaking down the existing info further. I also presume this is considered an internal API so it's OK to change the constructor. You could even retain the one constructor that is removed, just in case.
[GitHub] spark pull request #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes exam...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18749#discussion_r129929153 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java --- @@ -29,15 +29,40 @@ * show the usage of the function in human language. * * `usage()` will be used for the function usage in brief way. - * `extended()` will be used for the function usage in verbose way, suppose - * an example will be provided. * - * And we can refer the function name by `_FUNC_`, in `usage` and `extended`, as it's + * These below are concatenated and used for the function usage in verbose way, suppose arguments, + * examples, note and since will be provided. + * + * `arguments()` describes arguments for the expression. This should follow the format as below: + * + * Arguments: + * * arg0 - ... + * + * * arg1 - ... + * + * + * `examples()` describes examples for the expression. This should follow the format as below: + * + * Examples: + * > SELECT ...; + * ... + * > SELECT ...; + * ... + * + * `note()` contains some notes for the expression optionally. + * + * `since()` contains version information for the expression. Version is specified by, + * for example, "2.2.0". + * + * We can refer the function name by `_FUNC_`, in `usage`, `arguments` and `examples`, as it's * registered in `FunctionRegistry`. */ @DeveloperApi --- End diff -- Agree, that's my only question, whether this change matters, because it's a developer API. You provide default implementations, though `extended()` gets removed. Hm. I am wondering if it's possible to keep `extended()` but, well, ignore it? it would at least be compatible even if it meant someone's implementation out there would have to update to provide information to `ExpressionInfo` correctly. That's not really a functional problem though.
[GitHub] spark pull request #18659: [SPARK-21404][PYSPARK][WIP] Simple Python Vectori...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18659#discussion_r129928163 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -132,6 +135,61 @@ private[sql] object ArrowConverters { } } + private[sql] def fromPayloadIterator(iter: Iterator[ArrowPayload]): Iterator[InternalRow] = { +new Iterator[InternalRow] { + private val _allocator = new RootAllocator(Long.MaxValue) + private var _reader: ArrowFileReader = _ + private var _root: VectorSchemaRoot = _ + private var _index = 0 + + loadNextBatch() + + override def hasNext: Boolean = _root != null && _index < _root.getRowCount + + override def next(): InternalRow = { +val fields = _root.getFieldVectors.asScala + +val genericRowData = fields.map { field => + field.getAccessor.getObject(_index) +}.toArray[Any] --- End diff -- Thanks @kiszk , I'm giving it a try!
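The quoted `fromPayloadIterator` wraps batch-oriented Arrow data in a row-oriented `Iterator[InternalRow]`, reading a batch, emitting its rows one by one, and loading the next batch only when the current one is exhausted. A minimal Python sketch of that batch-to-row flattening (the `rows_from_batches` generator is hypothetical, not the actual `ArrowConverters` code):

```python
def rows_from_batches(batches):
    """Lazily flatten an iterator of columnar record batches into rows.
    Each batch is a list of equal-length column arrays; the next batch is
    consumed only after every row of the current one has been yielded."""
    for batch in batches:
        n = len(batch[0]) if batch else 0
        for i in range(n):
            # A row is the i-th value of every column in the batch.
            yield tuple(col[i] for col in batch)

batches = [[[1, 2], ["a", "b"]], [[3], ["c"]]]
assert list(rows_from_batches(batches)) == [(1, "a"), (2, "b"), (3, "c")]
```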
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18740

**[Test build #80009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80009/testReport)** for PR 18740 at commit [`0b0eea9`](https://github.com/apache/spark/commit/0b0eea941cb850967e943719822cfab89479a025).
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18664

**[Test build #80008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80008/testReport)** for PR 18664 at commit [`3b83d7a`](https://github.com/apache/spark/commit/3b83d7acf17433b1f5581f0b8b87c54a91309839).
[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18749

cc @rxin, @srowen and @cloud-fan, I believe this one is ready for a review. Could you take a look when you have some time?
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18540

Merged build finished. Test FAILed.
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18540

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80006/ Test FAILed.
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18540

**[Test build #80006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80006/testReport)** for PR 18540 at commit [`a1f91cd`](https://github.com/apache/spark/commit/a1f91cd7b0f10176b551cb168bf23d2eef68c15c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18740: [SPARK-21538][SQL] Attribute resolution inconsist...
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18740#discussion_r129911780

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -1304,6 +1304,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
     assert(rlike3.count() == 0)
   }
 }
+
+  test("SPARK-21538: Attribute resolution inconsistency in Dataset API") {
+    val df = spark.range(1).withColumnRenamed("id", "x")
+    checkAnswer(df.sort(col("id")), df.sort("id"))
+    checkAnswer(df.sort($"id"), df.sort("id"))
+    checkAnswer(df.sort('id), df.sort("id"))
+    checkAnswer(df.orderBy('id), df.sort("id"))
+    checkAnswer(df.orderBy("id"), df.sort("id"))
--- End diff --

Indeed, looks much better. I appreciate the explanation and will take this into account in the future. I will update the test in a minute, thanks.
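The test in the diff above leans on `checkAnswer`, Spark's test helper that compares the rows produced by two queries. Outside Spark, a minimal stand-in for that helper can be sketched in plain Scala — this is hypothetical and order-insensitive, whereas the real `QueryTest.checkAnswer` operates on executed `DataFrame`s and `Row` objects:

```scala
// Hypothetical, minimal stand-in for Spark's checkAnswer test helper:
// two results are considered equal when they contain the same rows,
// irrespective of order (plain values stand in for Row objects here).
def checkAnswer[T](actual: Seq[T], expected: Seq[T])(implicit ord: Ordering[T]): Unit = {
  val a = actual.sorted
  val e = expected.sorted
  assert(a == e, s"Result $a did not match expected $e")
}
```

Each `checkAnswer` call in the diff then asserts that the five ways of naming a sort column — `col("id")`, `$"id"`, `'id`, and the string forms via `sort`/`orderBy` — all resolve to the same result, which is exactly the consistency SPARK-21538 is about.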