[GitHub] [spark] XuQianJin-Stars edited a comment on issue #26727: [SPARK-30087][CORE] Enhanced implementation of JmxSink on RMI remote calls
XuQianJin-Stars edited a comment on issue #26727: [SPARK-30087][CORE] Enhanced implementation of JmxSink on RMI remote calls URL: https://github.com/apache/spark/pull/26727#issuecomment-565938938 > Can you please investigate how other systems like Kafka, Hadoop handle this problem? Adding parameter could be a way, but my thinking is that if we want to enable RMI, it would be better to provide a security way also. Well, okay, let me look at Hadoop's implementation first. Flink is implemented similarly, using open RMI ports; Hadoop goes through `JMXProxyServlet`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
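For context on the secure-RMI point raised above: the standard JDK management agent exposes JMX over RMI through `com.sun.management.jmxremote.*` system properties. A minimal sketch assembling such driver flags (the port value is an assumption for illustration; this is not the implementation proposed in the PR):

```python
# Hedged sketch: standard JDK flags for remote JMX over RMI with
# authentication and SSL enabled. jmx_port is an assumed example value.
jmx_port = 9999

flags = [
    "-Dcom.sun.management.jmxremote",
    f"-Dcom.sun.management.jmxremote.port={jmx_port}",
    # Pinning the RMI server port to the same value keeps the firewall
    # surface to a single known port instead of an ephemeral one.
    f"-Dcom.sun.management.jmxremote.rmi.port={jmx_port}",
    "-Dcom.sun.management.jmxremote.authenticate=true",
    "-Dcom.sun.management.jmxremote.ssl=true",
]
driver_opts = " ".join(flags)
print(driver_opts)
```

A secure setup would additionally need a password/access file or SSL keystore configuration; the sketch only shows the flag shape under discussion.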
[GitHub] [spark] ulysses-you commented on a change in pull request #26831: [SPARK-30201][SQL] HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT
ulysses-you commented on a change in pull request #26831: [SPARK-30201][SQL] HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT URL: https://github.com/apache/spark/pull/26831#discussion_r358089369 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ## @@ -305,12 +305,17 @@ private[hive] trait HiveInspectors { withNullSafe(o => getByteWritable(o)) case _: ByteObjectInspector => withNullSafe(o => o.asInstanceOf[java.lang.Byte]) - case _: JavaHiveVarcharObjectInspector => Review comment: Yes. Just like the `StringObjectInspector` check.
[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-565938983 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-565938993 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115375/ Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #26896: [MINOR][DOCS] Fix documentation for slide function
HyukjinKwon commented on issue #26896: [MINOR][DOCS] Fix documentation for slide function URL: https://github.com/apache/spark/pull/26896#issuecomment-565938890 Merged to master and branch-2.4.
[GitHub] [spark] HyukjinKwon closed pull request #26896: [MINOR][DOCS] Fix documentation for slide function
HyukjinKwon closed pull request #26896: [MINOR][DOCS] Fix documentation for slide function URL: https://github.com/apache/spark/pull/26896
[GitHub] [spark] SparkQA removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
SparkQA removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-565904733 **[Test build #115375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115375/testReport)** for PR 26434 at commit [`98a064b`](https://github.com/apache/spark/commit/98a064b2f0893c4f147ba43c57260900e8d7663b).
[GitHub] [spark] AmplabJenkins commented on issue #26896: [MINOR][DOCS] Fix documentation for slide function
AmplabJenkins commented on issue #26896: [MINOR][DOCS] Fix documentation for slide function URL: https://github.com/apache/spark/pull/26896#issuecomment-565938566 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26896: [MINOR][DOCS] Fix documentation for slide function
AmplabJenkins commented on issue #26896: [MINOR][DOCS] Fix documentation for slide function URL: https://github.com/apache/spark/pull/26896#issuecomment-565938572 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115372/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-565938379 **[Test build #115375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115375/testReport)** for PR 26434 at commit [`98a064b`](https://github.com/apache/spark/commit/98a064b2f0893c4f147ba43c57260900e8d7663b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #26896: [MINOR][DOCS] Fix documentation for slide function
SparkQA removed a comment on issue #26896: [MINOR][DOCS] Fix documentation for slide function URL: https://github.com/apache/spark/pull/26896#issuecomment-565889436 **[Test build #115372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115372/testReport)** for PR 26896 at commit [`0063278`](https://github.com/apache/spark/commit/006327875576edbbf9b6f8e37cb8e5a5a424a1d7).
[GitHub] [spark] SparkQA commented on issue #26896: [MINOR][DOCS] Fix documentation for slide function
SparkQA commented on issue #26896: [MINOR][DOCS] Fix documentation for slide function URL: https://github.com/apache/spark/pull/26896#issuecomment-565938021 **[Test build #115372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115372/testReport)** for PR 26896 at commit [`0063278`](https://github.com/apache/spark/commit/006327875576edbbf9b6f8e37cb8e5a5a424a1d7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan edited a comment on issue #26897: [SPARK-30104][SQL][FOLLOWUP] Remove LookupCatalog.AsTemporaryViewIdentifier
cloud-fan edited a comment on issue #26897: [SPARK-30104][SQL][FOLLOWUP] Remove LookupCatalog.AsTemporaryViewIdentifier URL: https://github.com/apache/spark/pull/26897#issuecomment-565926586 If something is only being tested and used nowhere else, it's dead code. I checked the removed tests and they solely test `AsTemporaryViewIdentifier`. LGTM, merging to master!
[GitHub] [spark] HyukjinKwon edited a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
HyukjinKwon edited a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-565935837 Sorry, @cloud-fan, I just checked the cc. > The result is unexpected. In ResolveAlias, we only generate the alias if expression is resolved. How does pyspark generate alias for its Row object? cc @HyukjinKwon I don't think there are any differences in how the column names are generated in PySpark specifically.
[GitHub] [spark] cloud-fan commented on a change in pull request #26894: [SPARK-30094][SQL] Apply current namespace for the single-part table name
cloud-fan commented on a change in pull request #26894: [SPARK-30094][SQL] Apply current namespace for the single-part table name URL: https://github.com/apache/spark/pull/26894#discussion_r358079795 ## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala ## @@ -1914,6 +1914,21 @@ class DataSourceV2SQLSuite } } + test("SPARK-30094: current namespace is used during table resolution") { +// unset this config to use the default v2 session catalog. +spark.conf.unset(V2_SESSION_CATALOG_IMPLEMENTATION.key) + +withTable("spark_catalog.t", "testcat.ns.t") { + sql("CREATE TABLE t USING parquet AS SELECT data FROM source") + sql("CREATE TABLE testcat.ns.t USING parquet AS SELECT id FROM source") Review comment: can we use different data for these 2 tables? e.g. `CREATE TABLE t ... AS SELECT 1` and `CREATE TABLE testcat.ns.t ... AS SELECT 2`
[GitHub] [spark] viirya commented on a change in pull request #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
viirya commented on a change in pull request #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#discussion_r358077265 ## File path: mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala ## @@ -247,7 +247,8 @@ class MultilayerPerceptronClassifier @Since("1.5.0") ( } trainer.setStackSize($(blockSize)) val mlpModel = trainer.train(data) -new MultilayerPerceptronClassificationModel(uid, myLayers, mlpModel.weights) +val model = new MultilayerPerceptronClassificationModel(uid, mlpModel.weights) +model.setLayers(myLayers) Review comment: hmm, have we really exposed the parameters in the model? The purpose of this PR is to have the training parameters in the model, but except for layers, it seems the other parameters are not set?
[GitHub] [spark] amanomer commented on issue #26808: [SPARK-30184][SQL] Implement a helper method for aliasing functions
amanomer commented on issue #26808: [SPARK-30184][SQL] Implement a helper method for aliasing functions URL: https://github.com/apache/spark/pull/26808#issuecomment-565931668 Tests have passed. Kindly review. @cloud-fan
[GitHub] [spark] AmplabJenkins commented on issue #26703: [SPARK-29997][WEBUI][FOLLOWUP] Refactor code for job description of empty jobs
AmplabJenkins commented on issue #26703: [SPARK-29997][WEBUI][FOLLOWUP] Refactor code for job description of empty jobs URL: https://github.com/apache/spark/pull/26703#issuecomment-565931713 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20186/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26703: [SPARK-29997][WEBUI][FOLLOWUP] Refactor code for job description of empty jobs
AmplabJenkins commented on issue #26703: [SPARK-29997][WEBUI][FOLLOWUP] Refactor code for job description of empty jobs URL: https://github.com/apache/spark/pull/26703#issuecomment-565931705 Merged build finished. Test PASSed.
[GitHub] [spark] amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType
amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType URL: https://github.com/apache/spark/pull/26811#discussion_r358076319 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ## @@ -850,7 +850,7 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession { val errorMsg1 = s""" |Input to function array_contains should have been array followed by a - |value with same element type, but it's [array, decimal(29,29)]. + |value with same element type, but it's [array, decimal(38,29)]. Review comment: cc @cloud-fan
[GitHub] [spark] amanomer removed a comment on issue #26808: [SPARK-30184][SQL] Implement a helper method for aliasing functions
amanomer removed a comment on issue #26808: [SPARK-30184][SQL] Implement a helper method for aliasing functions URL: https://github.com/apache/spark/pull/26808#issuecomment-565785795 Tests have passed. Kindly review cc @maropu @cloud-fan
[GitHub] [spark] amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType
amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType URL: https://github.com/apache/spark/pull/26811#discussion_r358076174 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ## @@ -850,7 +850,7 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession { val errorMsg1 = s""" |Input to function array_contains should have been array followed by a - |value with same element type, but it's [array, decimal(29,29)]. + |value with same element type, but it's [array, decimal(38,29)]. Review comment: https://github.com/apache/spark/blob/1fc353d51a62cb554e6af23dbc9a613e214e3af1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L864-L869 For the query `array_contains(array(1), .01234567890123456790123456780)`, `e.inputTypes` will return `Seq(Array(Decimal(38,29)), Decimal(38,29))` and the above code will cast `.01234567890123456790123456780` as `Decimal(38,29)`. Previously, when we were using `findWiderTypeForTwo`, decimal types were not getting upcast, but `findWiderTypeWithoutStringPromotionForTwo` successfully upcasts `DecimalType`.
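The `decimal(38,29)` in the updated error message follows from Spark's decimal widening: keep the larger scale and the larger number of integral digits, then cap precision at 38. A hedged reimplementation of that rule for illustration (function and constant names here are mine, not Spark's API):

```python
# Sketch of Spark's decimal widening rule (as in DecimalPrecision),
# reimplemented for illustration only. Spark caps precision at 38.
MAX_PRECISION = 38

def wider_decimal(p1, s1, p2, s2):
    scale = max(s1, s2)                 # larger fractional part wins
    int_digits = max(p1 - s1, p2 - s2)  # larger integral part wins
    return (min(int_digits + scale, MAX_PRECISION), scale)

# array(1) has element type decimal(10,0) (an int literal seen as decimal);
# the literal .01234567890123456790123456780 is decimal(29,29).
print(wider_decimal(10, 0, 29, 29))  # -> (38, 29)
```

This reproduces why the error message now reports `decimal(38,29)` rather than the literal's own type `decimal(29,29)`: widening an integer element type against a scale-29 literal needs 10 integral digits plus 29 fractional digits, capped at precision 38.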
[GitHub] [spark] SparkQA commented on issue #26703: [SPARK-29997][WEBUI][FOLLOWUP] Refactor code for job description of empty jobs
SparkQA commented on issue #26703: [SPARK-29997][WEBUI][FOLLOWUP] Refactor code for job description of empty jobs URL: https://github.com/apache/spark/pull/26703#issuecomment-565931326 **[Test build #115380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115380/testReport)** for PR 26703 at commit [`423a47c`](https://github.com/apache/spark/commit/423a47ceec47300ede93802c0be7c51786e1c5b8).
[GitHub] [spark] sarutak commented on a change in pull request #26703: [SPARK-29997][WEBUI][FOLLOWUP] Refactor code for job description of empty jobs
sarutak commented on a change in pull request #26703: [SPARK-29997][WEBUI][FOLLOWUP] Refactor code for job description of empty jobs URL: https://github.com/apache/spark/pull/26703#discussion_r358075772

## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
## @@ -695,21 +695,20 @@ private[spark] class DAGScheduler(
     val jobId = nextJobId.getAndIncrement()
     if (partitions.isEmpty) {
+      val currentDescription = sc.getLocalProperty(SparkContext.SPARK_JOB_DESCRIPTION)
+      if (currentDescription == null) {
+        sc.setJobDescription(callSite.shortForm)
+      }
+
       val time = clock.getTimeMillis()
-      val dummyStageInfo =
-        new StageInfo(
-          StageInfo.INVALID_STAGE_ID,
-          StageInfo.INVALID_ATTEMPT_ID,
-          callSite.shortForm,
-          0,
-          Seq.empty[RDDInfo],
-          Seq.empty[Int],
-          "")
       listenerBus.post(
-        SparkListenerJobStart(
-          jobId, time, Seq[StageInfo](dummyStageInfo), Utils.cloneProperties(properties)))
+        SparkListenerJobStart(jobId, time, Seq.empty, Utils.cloneProperties(properties)))
       listenerBus.post(
         SparkListenerJobEnd(jobId, time, JobSucceeded))
+
+      if (currentDescription == null) {
+        sc.setJobDescription(null)
+      }

Review comment: Thanks! I've merged it.
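The pattern in this diff is: fill in a default job description only when the user has not set one, post the synthetic start/end events, then restore the unset state. A minimal standalone sketch of that set-default/restore idiom (the `props` map is a hypothetical stand-in for `SparkContext` local properties; names are illustrative):

```scala
import scala.collection.mutable

// Hypothetical stand-in for SparkContext local properties.
val props = mutable.Map.empty[String, String]

def setJobDescription(v: String): Unit =
  if (v == null) props.remove("spark.job.description")
  else props("spark.job.description") = v

// Set a default only when unset; restore the unset state afterwards.
def withDefaultDescription[T](default: String)(body: => T): T = {
  val current = props.get("spark.job.description").orNull
  if (current == null) setJobDescription(default)
  try body
  finally if (current == null) setJobDescription(null)
}

// What a listener event posted inside the block would observe.
val seen = withDefaultDescription("count at <console>:1") {
  props("spark.job.description")
}
println(seen)
```

The `try`/`finally` guarantees the property is cleared even if posting an event throws, which is why the restore branch mirrors the set branch.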
[GitHub] [spark] viirya commented on a change in pull request #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
viirya commented on a change in pull request #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#discussion_r358075604

## File path: python/pyspark/ml/classification.py
## @@ -2274,21 +2276,21 @@ def setSolver(self, value):
         return self._set(solver=value)

-class MultilayerPerceptronClassificationModel(JavaProbabilisticClassificationModel, JavaMLWritable,
+class MultilayerPerceptronClassificationModel(JavaProbabilisticClassificationModel,
+                                              _MultilayerPerceptronParams, JavaMLWritable,
                                               JavaMLReadable):
     """
     Model fitted by MultilayerPerceptronClassifier.

     .. versionadded:: 1.6.0
     """

-    @property
-    @since("1.6.0")
-    def layers(self):
+    @since("3.0.0")
+    def setLayers(self, value):

Review comment: Do we need to add this setter? Do we allow changing the layers of a model?
[GitHub] [spark] sarutak commented on a change in pull request #26703: [SPARK-29997][WEBUI][FOLLOWUP] Refactor code for job description of empty jobs
sarutak commented on a change in pull request #26703: [SPARK-29997][WEBUI][FOLLOWUP] Refactor code for job description of empty jobs URL: https://github.com/apache/spark/pull/26703#discussion_r358075572

## File path: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
## @@ -827,7 +827,7 @@ private[spark] object ApiHelper {
   def lastStageNameAndDescription(store: AppStatusStore, job: JobData): (String, String) = {
     // Some jobs have only 0 partitions.
     if (job.stageIds.isEmpty) {
-      ("", job.name)
+      ("", "")

Review comment: Thanks. I've reverted this change.
[GitHub] [spark] jerryshao commented on issue #26727: [SPARK-30087][CORE] Enhanced implementation of JmxSink on RMI remote calls
jerryshao commented on issue #26727: [SPARK-30087][CORE] Enhanced implementation of JmxSink on RMI remote calls URL: https://github.com/apache/spark/pull/26727#issuecomment-565930482 Can you please investigate how other systems like Kafka and Hadoop handle this problem? Adding a parameter could be one way, but my thinking is that if we want to enable RMI, it would be better to provide a secure way as well.
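For context on what "enabling RMI" means here: a JMX-over-RMI endpoint is conventionally addressed with a `JMXServiceURL`. The sketch below uses only the standard `javax.management.remote` API; it is not Spark's implementation, and the host, port, and path are illustrative. An open (unauthenticated) endpoint is what the security concern above is about; a hardened setup would additionally pass authentication/SSL properties in the connector's environment map.

```scala
import java.lang.management.ManagementFactory
import javax.management.remote.{JMXConnectorServerFactory, JMXServiceURL}

// Conventional JMX-over-RMI service URL shape.
def rmiServiceUrl(host: String, port: Int): JMXServiceURL =
  new JMXServiceURL(s"service:jmx:rmi:///jndi/rmi://$host:$port/jmxrmi")

// An open connector server would be created roughly like this (not started here);
// a secure variant would supply an env map with auth/SSL settings instead of null:
// val server = JMXConnectorServerFactory.newJMXConnectorServer(
//   rmiServiceUrl("localhost", 9999), null, ManagementFactory.getPlatformMBeanServer)

println(rmiServiceUrl("localhost", 9999))
```

Tools like JConsole or VisualVM connect to exactly this kind of URL, which is why exposing it without authentication is the risk being weighed in this thread.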
[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default
AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default URL: https://github.com/apache/spark/pull/26813#issuecomment-565929821 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20185/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default
AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default URL: https://github.com/apache/spark/pull/26813#issuecomment-565929813 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default
AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default URL: https://github.com/apache/spark/pull/26813#issuecomment-565929813 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default
AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default URL: https://github.com/apache/spark/pull/26813#issuecomment-565929821 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20185/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default
SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default URL: https://github.com/apache/spark/pull/26813#issuecomment-565929533 **[Test build #115379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115379/testReport)** for PR 26813 at commit [`9a819b7`](https://github.com/apache/spark/commit/9a819b74eed9bf4f037cb184b453b088fb12daed).
[GitHub] [spark] JkSelf commented on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default
JkSelf commented on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default URL: https://github.com/apache/spark/pull/26813#issuecomment-565928572 @cloud-fan Ok, I will fix the failed tests first. Thanks.
[GitHub] [spark] JkSelf opened a new pull request #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default
JkSelf opened a new pull request #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default URL: https://github.com/apache/spark/pull/26813

### What changes were proposed in this pull request?
Enable adaptive query execution by default.

### Why are the changes needed?
To expand the usage of AQE.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing unit tests.
[GitHub] [spark] cloud-fan edited a comment on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default
cloud-fan edited a comment on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default URL: https://github.com/apache/spark/pull/26813#issuecomment-565927319 It's arguable whether we should turn on AQE by default in 3.0 or not, but I think it's worthwhile to try turning it on and fixing the tests. We can merge this PR with the tests fixed and still keep AQE off. If we end up deciding to turn on AQE by default, it will then be a one-line change, since all the failed tests will already be fixed here. @JkSelf can you reopen and fix the tests first?
[GitHub] [spark] shahidki31 commented on a change in pull request #26756: [SPARK-30119][WebUI]Support Pagination for Batch Tables in Streaming Tab
shahidki31 commented on a change in pull request #26756: [SPARK-30119][WebUI]Support Pagination for Batch Tables in Streaming Tab URL: https://github.com/apache/spark/pull/26756#discussion_r358072287

## File path: streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala
## @@ -482,11 +484,61 @@ private[ui] class StreamingPage(parent: StreamingTab)
   }

+  private def streamingBatchTable(
+      request: HttpServletRequest,
+      batchData: Seq[BatchUIData],
+      streamingBatchTag: String,
+      batchInterval: Long): Seq[Node] = {
+    val parameterOtherTable = request.getParameterMap.asScala
+      .filterNot(_._1.startsWith(streamingBatchTag))
+      .map { case (name, vals) =>
+        name + "=" + vals(0)
+      }
+
+    val parameterStreamingBatchPage = request.getParameter(streamingBatchTag + ".page")
+    val parameterStreamingBatchSortColumn = request.getParameter(streamingBatchTag + ".sort")
+    val parameterStreamingBatchSortDesc = request.getParameter(streamingBatchTag + ".desc")
+    val parameterStreamingBatchPageSize = request.getParameter(streamingBatchTag + ".pageSize")
+    val streamingBatchPage = Option(parameterStreamingBatchPage).map(_.toInt).getOrElse(1)
+    val streamingBatchSortColumn = Option(parameterStreamingBatchSortColumn).map { sortColumn =>
+      SparkUIUtils.decodeURLParameter(sortColumn)
+    }.getOrElse("Batch Time")
+    val streamingBatchSortDesc = Option(parameterStreamingBatchSortDesc).map(_.toBoolean).getOrElse(
+      streamingBatchSortColumn == "Batch Time"
+    )
+    val streamingBatchPageSize = Option(parameterStreamingBatchPageSize).map(_.toInt).getOrElse(100)
+
+    try {
+      new StreamingBatchPagedTable(
+        request,
+        parent,
+        batchInterval,
+        batchData,
+        streamingBatchTag,
+        SparkUIUtils.prependBaseUri(request, parent.basePath),
+        "streaming", // subPath
+        parameterOtherTable,
+        pageSize = streamingBatchPageSize,
+        sortColumn = streamingBatchSortColumn,
+        desc = streamingBatchSortDesc
+      ).table(streamingBatchPage)
+    } catch {
+      case e @ (_ : IllegalArgumentException | _ : IndexOutOfBoundsException) =>
+        <div class="alert alert-error">
+          <p>Error while rendering job table:</p>
+          <pre>{Utils.exceptionString(e)}</pre>
+        </div>
+    }
+  }
+
+  private def generateBatchListTables(request: HttpServletRequest): Seq[Node] = {
     val runningBatches = listener.runningBatches.sortBy(_.batchTime.milliseconds).reverse
     val waitingBatches = listener.waitingBatches.sortBy(_.batchTime.milliseconds).reverse
     val completedBatches = listener.retainedCompletedBatches.
       sortBy(_.batchTime.milliseconds).reverse
+    val activeBatchData = waitingBatches ++ runningBatches

Review comment: I am not sure we can simply append the tables like this. Please refer to the earlier code: https://github.com/apache/spark/blob/1fc353d51a62cb554e6af23dbc9a613e214e3af1/streaming/src/main/scala/org/apache/spark/streaming/ui/AllBatchesTable.scala#L130-L135 To confirm there is no change in behavior, could you please attach screenshots (before and after this PR)?
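The parameter parsing in the diff above follows one repeated idiom: read an optional query parameter and fall back to a default. A compact standalone sketch of that idiom (the `params` function is a hypothetical stand-in for `HttpServletRequest.getParameter`; the defaults mirror the diff):

```scala
// Sketch of the parameter-with-default parsing used in streamingBatchTable above.
// `params` stands in for HttpServletRequest.getParameter wrapped in Option.
def pageSettings(params: String => Option[String], tag: String): (Int, String, Boolean, Int) = {
  val page = params(s"$tag.page").map(_.toInt).getOrElse(1)
  val sortColumn = params(s"$tag.sort").getOrElse("Batch Time")
  // Sort descending by default only when sorting by the default column.
  val desc = params(s"$tag.desc").map(_.toBoolean).getOrElse(sortColumn == "Batch Time")
  val pageSize = params(s"$tag.pageSize").map(_.toInt).getOrElse(100)
  (page, sortColumn, desc, pageSize)
}

// A request carrying only some of the parameters; the rest fall back to defaults.
val query = Map("batch.page" -> "2", "batch.pageSize" -> "50")
println(pageSettings(query.get, "batch"))
```

Wrapping `getParameter` in `Option(...)` matters because the servlet API returns `null` for absent parameters; the `getOrElse` chain then supplies the table defaults.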
[GitHub] [spark] cloud-fan commented on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default
cloud-fan commented on issue #26813: [SPARK-30188][SQL][WIP] Enable adaptive query execution by default URL: https://github.com/apache/spark/pull/26813#issuecomment-565927319 It's arguable whether we should turn on AQE by default in 3.0 or not, but I think it's worthwhile to try turning it on and fixing the tests. We can merge this PR with the tests fixed and still keep AQE off. If we end up deciding to turn on AQE by default, it will then be a one-line change, since all the failed tests will already be fixed here. @JkSelf can you reopen and fix the tests first?
[GitHub] [spark] cloud-fan commented on issue #26897: [SPARK-30104][SQL][FOLLOWUP] Remove LookupCatalog.AsTemporaryViewIdentifier
cloud-fan commented on issue #26897: [SPARK-30104][SQL][FOLLOWUP] Remove LookupCatalog.AsTemporaryViewIdentifier URL: https://github.com/apache/spark/pull/26897#issuecomment-565926586 If something is only used in tests, it's dead code. I checked the removed tests and they solely test `AsTemporaryViewIdentifier`. LGTM, merging to master!
[GitHub] [spark] viirya commented on issue #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes
viirya commented on issue #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes URL: https://github.com/apache/spark/pull/26898#issuecomment-565926605 Looks good.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String)
AmplabJenkins removed a comment on issue #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String) URL: https://github.com/apache/spark/pull/26903#issuecomment-565926110 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String)
AmplabJenkins removed a comment on issue #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String) URL: https://github.com/apache/spark/pull/26903#issuecomment-565926119 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20184/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String)
AmplabJenkins commented on issue #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String) URL: https://github.com/apache/spark/pull/26903#issuecomment-565926110 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String)
AmplabJenkins commented on issue #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String) URL: https://github.com/apache/spark/pull/26903#issuecomment-565926119 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20184/ Test PASSed.
[GitHub] [spark] yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Support columnar execution on interval types
yaooqinn commented on a change in pull request #26699: [SPARK-30066][SQL] Support columnar execution on interval types URL: https://github.com/apache/spark/pull/26699#discussion_r358071270

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnType.scala
## @@ -705,6 +705,37 @@ private[columnar] case class MAP(dataType: MapType)
   override def clone(v: UnsafeMapData): UnsafeMapData = v.copy()
 }

+private[columnar] object CALENDAR_INTERVAL extends ColumnType[CalendarInterval]
+  with DirectCopyColumnType[CalendarInterval] {
+
+  override def dataType: DataType = CalendarIntervalType
+
+  override def defaultSize: Int = 16
+
+  override def actualSize(row: InternalRow, ordinal: Int): Int = 20
+
+  override def getField(row: InternalRow, ordinal: Int): CalendarInterval = row.getInterval(ordinal)
+
+  override def setField(row: InternalRow, ordinal: Int, value: CalendarInterval): Unit = {
+    row.setInterval(ordinal, value)
+  }
+
+  override def extract(buffer: ByteBuffer): CalendarInterval = {
+    ByteBufferHelper.getInt(buffer)

Review comment: Yes, we can skip it; the current implementation just reuses the copy logic of `MutableUnsafeRow`.
[GitHub] [spark] SparkQA commented on issue #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String)
SparkQA commented on issue #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String) URL: https://github.com/apache/spark/pull/26903#issuecomment-565925834 **[Test build #115378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115378/testReport)** for PR 26903 at commit [`f47a255`](https://github.com/apache/spark/commit/f47a2559424c8e6c724f2036af7e0c4615a7ec0d).
[GitHub] [spark] cloud-fan commented on a change in pull request #26699: [SPARK-30066][SQL] Support columnar execution on interval types
cloud-fan commented on a change in pull request #26699: [SPARK-30066][SQL] Support columnar execution on interval types URL: https://github.com/apache/spark/pull/26699#discussion_r358070749

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnType.scala
## @@ -705,6 +705,37 @@ private[columnar] case class MAP(dataType: MapType)
   override def clone(v: UnsafeMapData): UnsafeMapData = v.copy()
 }

+private[columnar] object CALENDAR_INTERVAL extends ColumnType[CalendarInterval]
+  with DirectCopyColumnType[CalendarInterval] {
+
+  override def dataType: DataType = CalendarIntervalType
+
+  override def defaultSize: Int = 16
+
+  override def actualSize(row: InternalRow, ordinal: Int): Int = 20
+
+  override def getField(row: InternalRow, ordinal: Int): CalendarInterval = row.getInterval(ordinal)
+
+  override def setField(row: InternalRow, ordinal: Int, value: CalendarInterval): Unit = {
+    row.setInterval(ordinal, value)
+  }
+
+  override def extract(buffer: ByteBuffer): CalendarInterval = {
+    ByteBufferHelper.getInt(buffer)

Review comment: Now I got why the actual size is 4 bytes larger than the default size. We should document it. BTW, can we skip storing the size? It's fixed anyway.
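The 20-vs-16 byte question above comes down to a length-prefixed layout. A standalone sketch of that layout, assuming (as this thread implies) a 4-byte size prefix followed by a fixed 16-byte interval body (`months: Int`, `days: Int`, `microseconds: Long`); `Interval` is an illustrative stand-in for `CalendarInterval`:

```scala
import java.nio.ByteBuffer

// Illustrative stand-in for CalendarInterval: 4 + 4 + 8 = 16 bytes of payload.
case class Interval(months: Int, days: Int, microseconds: Long)

def append(buf: ByteBuffer, v: Interval): Unit = {
  buf.putInt(16) // size prefix: always 16, hence the "can we skip it?" question
  buf.putInt(v.months)
  buf.putInt(v.days)
  buf.putLong(v.microseconds)
}

def extract(buf: ByteBuffer): Interval = {
  buf.getInt() // consume (and ignore) the fixed size prefix
  Interval(buf.getInt(), buf.getInt(), buf.getLong())
}

val buf = ByteBuffer.allocate(20) // actualSize: 4-byte prefix + 16-byte body
append(buf, Interval(1, 2, 3000000L))
buf.flip()
println(extract(buf)) // prints Interval(1,2,3000000)
```

Under this reading, `defaultSize = 16` is the payload and `actualSize = 20` adds the prefix; dropping the prefix would make the two equal, at the cost of diverging from the variable-length copy path being reused.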
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String)
HyukjinKwon commented on a change in pull request #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String) URL: https://github.com/apache/spark/pull/26903#discussion_r358070696

## File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
## @@ -524,6 +524,16 @@ class Dataset[T] private[sql](
   /**
    * Prints the plans (logical and physical) with a format specified by a given explain mode.
    *
+   * @param mode specifies the expected output format of plans.
+   *             `simple`: Print only a physical plan.
+   *             `extended`: Print both logical and physical plans.
+   *             `codegen`: Print a physical plan and generated codes if they are available.
+   *             `cost`: Print a logical plan and statistics if they are available.
+   *             `formatted`: Split explain output into two sections: a physical plan outline
+   *             and node details.
+   *

Review comment: This matches the doc added on the Python side, FYI.
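The five mode strings documented above form a small closed set, so the API boils down to validating and normalizing the string before dispatching. A hypothetical sketch of that validation step (the real dispatch lives elsewhere in Spark; the function name here is illustrative only):

```scala
// Hypothetical sketch: normalize/validate the explain-mode string.
// The accepted values are the five modes documented in the Scaladoc above.
val explainModes = Set("simple", "extended", "codegen", "cost", "formatted")

def normalizeExplainMode(mode: String): String = {
  val m = mode.toLowerCase
  require(explainModes.contains(m), s"Unknown explain mode: $mode")
  m
}

println(normalizeExplainMode("Extended")) // prints "extended"
```

From the user's side the call is simply `df.explain("formatted")` (or any of the other four strings); an unknown string should fail fast rather than silently fall back to a default.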
[GitHub] [spark] HyukjinKwon opened a new pull request #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String)
HyukjinKwon opened a new pull request #26903: [SPARK-30200][DOCS][FOLLOW-UP] Add documentation for explain(mode: String) URL: https://github.com/apache/spark/pull/26903

### What changes were proposed in this pull request?
This PR adds the documentation of the new `mode` added to `Dataset.explain`.

### Why are the changes needed?
To let users know the new modes.

### Does this PR introduce any user-facing change?
No (doc-only change).

### How was this patch tested?
Manually built the doc:
![Screen Shot 2019-12-16 at 3 34 28 PM](https://user-images.githubusercontent.com/6477701/70884617-d64f1680-2019-11ea-9336-247ade7f8768.png)
[GitHub] [spark] cloud-fan commented on a change in pull request #26699: [SPARK-30066][SQL] Support columnar execution on interval types
cloud-fan commented on a change in pull request #26699: [SPARK-30066][SQL] Support columnar execution on interval types URL: https://github.com/apache/spark/pull/26699#discussion_r358070565

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnType.scala
## @@ -705,6 +705,37 @@ private[columnar] case class MAP(dataType: MapType)
   override def clone(v: UnsafeMapData): UnsafeMapData = v.copy()
 }

+private[columnar] object CALENDAR_INTERVAL extends ColumnType[CalendarInterval]
+  with DirectCopyColumnType[CalendarInterval] {
+
+  override def dataType: DataType = CalendarIntervalType
+
+  override def defaultSize: Int = 16
+
+  override def actualSize(row: InternalRow, ordinal: Int): Int = 20
+
+  override def getField(row: InternalRow, ordinal: Int): CalendarInterval = row.getInterval(ordinal)
+
+  override def setField(row: InternalRow, ordinal: Int, value: CalendarInterval): Unit = {
+    row.setInterval(ordinal, value)
+  }
+
+  override def extract(buffer: ByteBuffer): CalendarInterval = {
+    ByteBufferHelper.getInt(buffer)

Review comment: Is the first int the size?
[GitHub] [spark] cloud-fan commented on a change in pull request #26699: [SPARK-30066][SQL] Support columnar execution on interval types
cloud-fan commented on a change in pull request #26699: [SPARK-30066][SQL] Support columnar execution on interval types URL: https://github.com/apache/spark/pull/26699#discussion_r358070365

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnType.scala
## @@ -705,6 +705,37 @@ private[columnar] case class MAP(dataType: MapType)
   override def clone(v: UnsafeMapData): UnsafeMapData = v.copy()
 }

+private[columnar] object CALENDAR_INTERVAL extends ColumnType[CalendarInterval]
+  with DirectCopyColumnType[CalendarInterval] {
+
+  override def dataType: DataType = CalendarIntervalType
+
+  override def defaultSize: Int = 16
+
+  override def actualSize(row: InternalRow, ordinal: Int): Int = 20

Review comment: Why is the actual size different from the default size?
[GitHub] [spark] iRakson commented on a change in pull request #26756: [SPARK-30119][WebUI]Support Pagination for Batch Tables in Streaming Tab
iRakson commented on a change in pull request #26756: [SPARK-30119][WebUI]Support Pagination for Batch Tables in Streaming Tab URL: https://github.com/apache/spark/pull/26756#discussion_r358069951 ## File path: streaming/src/main/scala/org/apache/spark/streaming/ui/AllBatchesTable.scala ## @@ -17,39 +17,122 @@ package org.apache.spark.streaming.ui -import scala.xml.Node +import java.net.URLEncoder +import java.nio.charset.StandardCharsets.UTF_8 +import javax.servlet.http.HttpServletRequest + +import scala.xml.{Node, Unparsed} + +import org.apache.spark.ui.{PagedDataSource, PagedTable, UIUtils => SparkUIUtils} + +private[ui] class StreamingBatchPagedTable( +request: HttpServletRequest, +parent: StreamingTab, +batchInterval: Long, +batchData: Seq[BatchUIData], +streamingBatchTag: String, +basePath: String, +subPath: String, +parameterOtherTable: Iterable[String], +pageSize: Int, +sortColumn: String, +desc: Boolean) extends PagedTable[BatchUIData] { + + override val dataSource = new StreamingBatchTableDataSource(batchData, pageSize, sortColumn, desc) + private val parameterPath = s"$basePath/$subPath/?${parameterOtherTable.mkString("&")}" + private val firstFailureReason = getFirstFailureReason(batchData) + + override def tableId: String = streamingBatchTag + + override def tableCssClass: String = +"table table-bordered table-condensed table-striped " + + "table-head-clickable table-cell-width-limited" + + override def pageLink(page: Int): String = { +val encodedSortColumn = URLEncoder.encode(sortColumn, UTF_8.name()) +parameterPath + + s"&$pageNumberFormField=$page" + + s"&$streamingBatchTag.sort=$encodedSortColumn" + + s"&$streamingBatchTag.desc=$desc" + + s"&$pageSizeFormField=$pageSize" + } -import org.apache.spark.ui.{UIUtils => SparkUIUtils} + override def pageSizeFormField: String = s"$streamingBatchTag.pageSize" -private[ui] abstract class BatchTableBase(tableId: String, batchInterval: Long) { + override def pageNumberFormField: String = s"$streamingBatchTag.page" - protected def columns: Seq[Node] = { -Batch Time - Records - Scheduling Delay -{SparkUIUtils.tooltip("Time taken by Streaming scheduler to submit jobs of a batch", "top")} - - Processing Time -{SparkUIUtils.tooltip("Time taken to process all jobs of a batch", "top")} + override def goButtonFormPath: String = { +val encodedSortColumn = URLEncoder.encode(sortColumn, UTF_8.name()) + s"$parameterPath&$streamingBatchTag.sort=$encodedSortColumn&$streamingBatchTag.desc=$desc" Review comment: I will add the `tableHeaderId` to the link.
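The `pageLink` logic quoted above concatenates per-table query parameters onto a base path, URL-encoding only the sort column. A self-contained sketch of that construction, with the table tag and path passed in as hypothetical placeholders (this is not the actual Spark class):

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets.UTF_8

// Standalone sketch of the pageLink construction discussed above.
object PageLinkSketch {
  def pageLink(
      parameterPath: String, // e.g. "/streaming/?foo=bar" (hypothetical)
      tableTag: String,      // per-table parameter prefix (hypothetical)
      page: Int,
      sortColumn: String,
      desc: Boolean,
      pageSize: Int): String = {
    // Only the sort column can contain spaces etc., so it is the only encoded part.
    val encodedSortColumn = URLEncoder.encode(sortColumn, UTF_8.name())
    parameterPath +
      s"&$tableTag.page=$page" +
      s"&$tableTag.sort=$encodedSortColumn" +
      s"&$tableTag.desc=$desc" +
      s"&$tableTag.pageSize=$pageSize"
  }
}
```

For example, a "Batch Time" sort column is encoded as `Batch+Time` in the resulting link.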
[GitHub] [spark] cloud-fan commented on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics
cloud-fan commented on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics URL: https://github.com/apache/spark/pull/26899#issuecomment-565924368 > In fact, we only need to reserve -1 when doing min max statistics in SQLMetrics.stringValue But this PR seems to change the initial value to 0 for all cases?
[GitHub] [spark] iRakson commented on a change in pull request #26756: [SPARK-30119][WebUI]Support Pagination for Batch Tables in Streaming Tab
iRakson commented on a change in pull request #26756: [SPARK-30119][WebUI]Support Pagination for Batch Tables in Streaming Tab URL: https://github.com/apache/spark/pull/26756#discussion_r358069458 ## File path: streaming/src/main/scala/org/apache/spark/streaming/ui/AllBatchesTable.scala ## @@ -17,39 +17,122 @@ package org.apache.spark.streaming.ui -import scala.xml.Node +import java.net.URLEncoder +import java.nio.charset.StandardCharsets.UTF_8 +import javax.servlet.http.HttpServletRequest + +import scala.xml.{Node, Unparsed} + +import org.apache.spark.ui.{PagedDataSource, PagedTable, UIUtils => SparkUIUtils} + +private[ui] class StreamingBatchPagedTable( +request: HttpServletRequest, +parent: StreamingTab, +batchInterval: Long, +batchData: Seq[BatchUIData], +streamingBatchTag: String, +basePath: String, +subPath: String, +parameterOtherTable: Iterable[String], +pageSize: Int, +sortColumn: String, +desc: Boolean) extends PagedTable[BatchUIData] { + + override val dataSource = new StreamingBatchTableDataSource(batchData, pageSize, sortColumn, desc) + private val parameterPath = s"$basePath/$subPath/?${parameterOtherTable.mkString("&")}" + private val firstFailureReason = getFirstFailureReason(batchData) + + override def tableId: String = streamingBatchTag + + override def tableCssClass: String = +"table table-bordered table-condensed table-striped " + + "table-head-clickable table-cell-width-limited" + + override def pageLink(page: Int): String = { +val encodedSortColumn = URLEncoder.encode(sortColumn, UTF_8.name()) +parameterPath + + s"&$pageNumberFormField=$page" + + s"&$streamingBatchTag.sort=$encodedSortColumn" + + s"&$streamingBatchTag.desc=$desc" + + s"&$pageSizeFormField=$pageSize" + } -import org.apache.spark.ui.{UIUtils => SparkUIUtils} + override def pageSizeFormField: String = s"$streamingBatchTag.pageSize" -private[ui] abstract class BatchTableBase(tableId: String, batchInterval: Long) { + override def pageNumberFormField: String = s"$streamingBatchTag.page" - protected def columns: Seq[Node] = { -Batch Time - Records - Scheduling Delay -{SparkUIUtils.tooltip("Time taken by Streaming scheduler to submit jobs of a batch", "top")} - - Processing Time -{SparkUIUtils.tooltip("Time taken to process all jobs of a batch", "top")} + override def goButtonFormPath: String = { +val encodedSortColumn = URLEncoder.encode(sortColumn, UTF_8.name()) + s"$parameterPath&$streamingBatchTag.sort=$encodedSortColumn&$streamingBatchTag.desc=$desc" } - /** - * Return the first failure reason if finding in the batches. - */ - protected def getFirstFailureReason(batches: Seq[BatchUIData]): Option[String] = { -batches.flatMap(_.outputOperations.flatMap(_._2.failureReason)).headOption - } - - protected def getFirstFailureTableCell(batch: BatchUIData): Seq[Node] = { -val firstFailureReason = batch.outputOperations.flatMap(_._2.failureReason).headOption -firstFailureReason.map { failureReason => - val failureReasonForUI = UIUtils.createOutputOperationFailureForUI(failureReason) - UIUtils.failureReasonCell( -failureReasonForUI, rowspan = 1, includeFirstLineInExpandDetails = false) -}.getOrElse(-) + override def headers: Seq[Node] = { +val completedBatchTableHeaders = Seq("Batch Time", "Records", "Scheduling Delay", Review comment: Both tables are identical, i.e. the schema is the same for both, so the headers will remain the same. But yeah, `completedBatchTableHeaders` is misleading, so I will update the variable name.
[GitHub] [spark] iRakson commented on a change in pull request #26756: [SPARK-30119][WebUI]Support Pagination for Batch Tables in Streaming Tab
iRakson commented on a change in pull request #26756: [SPARK-30119][WebUI]Support Pagination for Batch Tables in Streaming Tab URL: https://github.com/apache/spark/pull/26756#discussion_r358068966 ## File path: streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala ## @@ -482,11 +484,61 @@ private[ui] class StreamingPage(parent: StreamingTab) } - private def generateBatchListTables(): Seq[Node] = { + private def streamingBatchTable( + request: HttpServletRequest, + batchData: Seq[BatchUIData], + streamingBatchTag: String, + batchInterval: Long): Seq[Node] = { +val parameterOtherTable = request.getParameterMap.asScala + .filterNot(_._1.startsWith(streamingBatchTag)) + .map { case (name, vals) => +name + "=" + vals(0) + } + +val parameterStreamingBatchPage = request.getParameter(streamingBatchTag + ".page") +val parameterStreamingBatchSortColumn = request.getParameter(streamingBatchTag + ".sort") +val parameterStreamingBatchSortDesc = request.getParameter(streamingBatchTag + ".desc") +val parameterStreamingBatchPageSize = request.getParameter(streamingBatchTag + ".pageSize") +val streamingBatchPage = Option(parameterStreamingBatchPage).map(_.toInt).getOrElse(1) +val streamingBatchSortColumn = Option(parameterStreamingBatchSortColumn).map { sortColumn => + SparkUIUtils.decodeURLParameter(sortColumn) +}.getOrElse("Batch Time") +val streamingBatchSortDesc = Option(parameterStreamingBatchSortDesc).map(_.toBoolean).getOrElse( + streamingBatchSortColumn == "Batch Time" +) +val streamingBatchPageSize = Option(parameterStreamingBatchPageSize).map(_.toInt).getOrElse(100) + +try { + new StreamingBatchPagedTable( +request, +parent, +batchInterval, +batchData, +streamingBatchTag, +SparkUIUtils.prependBaseUri(request, parent.basePath), +"streaming", // subPath +parameterOtherTable, +pageSize = streamingBatchPageSize, +sortColumn = streamingBatchSortColumn, +desc = streamingBatchSortDesc + ).table(streamingBatchPage) +} catch { + case e @ (_ : IllegalArgumentException | _ : IndexOutOfBoundsException) => + + Error while rendering job table: + +{Utils.exceptionString(e)} + + +} + } + + private def generateBatchListTables(request: HttpServletRequest): Seq[Node] = { val runningBatches = listener.runningBatches.sortBy(_.batchTime.milliseconds).reverse val waitingBatches = listener.waitingBatches.sortBy(_.batchTime.milliseconds).reverse val completedBatches = listener.retainedCompletedBatches. sortBy(_.batchTime.milliseconds).reverse +val activeBatchData = waitingBatches ++ runningBatches Review comment: Previously, running batches and waiting batches were shown together in the same table (the Active Batches table). To preserve that behavior, I appended the data of both running and waiting batches.
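The parameter handling in `streamingBatchTable` above follows a common pattern: read each optional `tag.page` / `tag.sort` / `tag.desc` / `tag.pageSize` request parameter and fall back to a default, with the sort direction defaulting to descending only for the "Batch Time" column. A standalone sketch of that pattern, using a plain `Map` in place of `HttpServletRequest` so it is self-contained (hypothetical names, not the actual Spark code):

```scala
// Parsed pagination settings for one table.
case class TableParams(page: Int, sortColumn: String, desc: Boolean, pageSize: Int)

object TableParamsSketch {
  def parse(params: Map[String, String], tag: String): TableParams = {
    val sortColumn = params.getOrElse(s"$tag.sort", "Batch Time")
    TableParams(
      page = params.get(s"$tag.page").map(_.toInt).getOrElse(1),
      sortColumn = sortColumn,
      // Default to descending order only when sorting by batch time, as in the diff.
      desc = params.get(s"$tag.desc").map(_.toBoolean)
        .getOrElse(sortColumn == "Batch Time"),
      pageSize = params.get(s"$tag.pageSize").map(_.toInt).getOrElse(100))
  }
}
```

With no parameters present this yields page 1, sorted by "Batch Time" descending, 100 rows per page, matching the defaults in the diff.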
[GitHub] [spark] wangyum commented on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh`
wangyum commented on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh` URL: https://github.com/apache/spark/pull/26902#issuecomment-565921324 Merged to master.
[GitHub] [spark] wangyum closed pull request #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh`
wangyum closed pull request #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh` URL: https://github.com/apache/spark/pull/26902
[GitHub] [spark] wangyum commented on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh`
wangyum commented on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh` URL: https://github.com/apache/spark/pull/26902#issuecomment-565921177 I'm merging it and will be releasing v3.0.0-preview2 soon.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes
HyukjinKwon commented on a change in pull request #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes URL: https://github.com/apache/spark/pull/26898#discussion_r358065239 ## File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ## @@ -522,53 +521,20 @@ class Dataset[T] private[sql]( def printSchema(level: Int): Unit = println(schema.treeString(level)) // scalastyle:on println - private def toExplainString(mode: ExplainMode): String = { -// Because temporary views are resolved during analysis when we create a Dataset, and -// `ExplainCommand` analyzes input query plan and resolves temporary views again. Using -// `ExplainCommand` here will probably output different query plans, compared to the results -// of evaluation of the Dataset. So just output QueryExecution's query plans here. -val qe = ExplainCommandUtil.explainedQueryExecution(sparkSession, logicalPlan, queryExecution) - -mode match { - case ExplainMode.Simple => -qe.simpleString - case ExplainMode.Extended => -qe.toString - case ExplainMode.Codegen => -try { - org.apache.spark.sql.execution.debug.codegenString(queryExecution.executedPlan) -} catch { - case e: AnalysisException => e.toString -} - case ExplainMode.Cost => -qe.stringWithStats - case ExplainMode.Formatted => -qe.simpleString(formatted = true) -} - } - - // This method intends to be called from PySpark DataFrame - private[sql] def toExplainString(mode: String): String = { -mode.toLowerCase(Locale.ROOT) match { - case "simple" => toExplainString(ExplainMode.Simple) - case "extended" => toExplainString(ExplainMode.Extended) - case "codegen" => toExplainString(ExplainMode.Codegen) - case "cost" => toExplainString(ExplainMode.Cost) - case "formatted" => toExplainString(ExplainMode.Formatted) - case _ => throw new IllegalArgumentException(s"Unknown explain mode: $mode. Accepted " + -"explain modes are 'simple', 'extended', 'codegen', 'cost', 'formatted'.") -} - } - /** * Prints the plans (logical and physical) with a format specified by a given explain mode. Review comment: Sure.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics
AmplabJenkins removed a comment on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics URL: https://github.com/apache/spark/pull/26899#issuecomment-565919210 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20183/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics
AmplabJenkins removed a comment on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics URL: https://github.com/apache/spark/pull/26899#issuecomment-565919203 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics
AmplabJenkins commented on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics URL: https://github.com/apache/spark/pull/26899#issuecomment-565919203 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics
AmplabJenkins commented on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics URL: https://github.com/apache/spark/pull/26899#issuecomment-565919210 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20183/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics
SparkQA commented on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics URL: https://github.com/apache/spark/pull/26899#issuecomment-565918944 **[Test build #115377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115377/testReport)** for PR 26899 at commit [`f95a70e`](https://github.com/apache/spark/commit/f95a70e59274d84667bfd21a4ac0aed82b982a25).
[GitHub] [spark] cloud-fan commented on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics
cloud-fan commented on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics URL: https://github.com/apache/spark/pull/26899#issuecomment-565918538 ok to test
[GitHub] [spark] AmplabJenkins removed a comment on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics
AmplabJenkins removed a comment on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics URL: https://github.com/apache/spark/pull/26899#issuecomment-565813077 Can one of the admins verify this patch?
[GitHub] [spark] cloud-fan commented on issue #26887: [SPARK-30259][SQL] Fix CREATE TABLE behavior when session catalog is specified explicitly
cloud-fan commented on issue #26887: [SPARK-30259][SQL] Fix CREATE TABLE behavior when session catalog is specified explicitly URL: https://github.com/apache/spark/pull/26887#issuecomment-565918083 good catch! late LGTM
[GitHub] [spark] gatorsmile commented on a change in pull request #23943: [SPARK-27034][SQL] Nested schema pruning for ORC
gatorsmile commented on a change in pull request #23943: [SPARK-27034][SQL] Nested schema pruning for ORC URL: https://github.com/apache/spark/pull/23943#discussion_r358063793 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -1541,8 +1541,8 @@ object SQLConf { .internal() .doc("Prune nested fields from a logical relation's output which are unnecessary in " + "satisfying a query. This optimization allows columnar file format readers to avoid " + -"reading unnecessary nested column data. Currently Parquet is the only data source that " + -"implements this optimization.") +"reading unnecessary nested column data. Currently Parquet and ORC v1 are the " + +"data sources that implement this optimization.") .booleanConf .createWithDefault(false) Review comment: @dbtsai @dongjoon-hyun We turned on this flag by default in the upcoming 3.0 because Apple has tried this in production in the last few months. I am wondering if that statement also includes ORC nested schema pruning?
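For anyone wanting to try the flag being discussed, it can be toggled at runtime on an active `SparkSession` (assumed here to be bound to `spark`). Note that the config key below is an assumption, since the diff only shows the `.doc()` text for the flag, not its key:

```scala
// Assumed config key; the diff above does not show the key name.
spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")
```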
[GitHub] [spark] AmplabJenkins removed a comment on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh`
AmplabJenkins removed a comment on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh` URL: https://github.com/apache/spark/pull/26902#issuecomment-565917522 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20182/ Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #26831: [SPARK-30201][SQL] HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT
cloud-fan commented on a change in pull request #26831: [SPARK-30201][SQL] HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT URL: https://github.com/apache/spark/pull/26831#discussion_r358063675 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ## @@ -305,12 +305,17 @@ private[hive] trait HiveInspectors { withNullSafe(o => getByteWritable(o)) case _: ByteObjectInspector => withNullSafe(o => o.asInstanceOf[java.lang.Byte]) - case _: JavaHiveVarcharObjectInspector => Review comment: do you mean `JavaHiveVarcharObjectInspector` extends `HiveVarcharObjectInspector`?
[GitHub] [spark] AmplabJenkins removed a comment on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh`
AmplabJenkins removed a comment on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh` URL: https://github.com/apache/spark/pull/26902#issuecomment-565917516 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh`
AmplabJenkins commented on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh` URL: https://github.com/apache/spark/pull/26902#issuecomment-565917522 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20182/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh`
AmplabJenkins commented on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh` URL: https://github.com/apache/spark/pull/26902#issuecomment-565917516 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes
cloud-fan commented on a change in pull request #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes URL: https://github.com/apache/spark/pull/26898#discussion_r358063371 ## File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ## @@ -522,53 +521,20 @@ class Dataset[T] private[sql]( def printSchema(level: Int): Unit = println(schema.treeString(level)) // scalastyle:on println - private def toExplainString(mode: ExplainMode): String = { -// Because temporary views are resolved during analysis when we create a Dataset, and -// `ExplainCommand` analyzes input query plan and resolves temporary views again. Using -// `ExplainCommand` here will probably output different query plans, compared to the results -// of evaluation of the Dataset. So just output QueryExecution's query plans here. -val qe = ExplainCommandUtil.explainedQueryExecution(sparkSession, logicalPlan, queryExecution) - -mode match { - case ExplainMode.Simple => -qe.simpleString - case ExplainMode.Extended => -qe.toString - case ExplainMode.Codegen => -try { - org.apache.spark.sql.execution.debug.codegenString(queryExecution.executedPlan) -} catch { - case e: AnalysisException => e.toString -} - case ExplainMode.Cost => -qe.stringWithStats - case ExplainMode.Formatted => -qe.simpleString(formatted = true) -} - } - - // This method intends to be called from PySpark DataFrame - private[sql] def toExplainString(mode: String): String = { -mode.toLowerCase(Locale.ROOT) match { - case "simple" => toExplainString(ExplainMode.Simple) - case "extended" => toExplainString(ExplainMode.Extended) - case "codegen" => toExplainString(ExplainMode.Codegen) - case "cost" => toExplainString(ExplainMode.Cost) - case "formatted" => toExplainString(ExplainMode.Formatted) - case _ => throw new IllegalArgumentException(s"Unknown explain mode: $mode. Accepted " + -"explain modes are 'simple', 'extended', 'codegen', 'cost', 'formatted'.") -} - } - /** * Prints the plans (logical and physical) with a format specified by a given explain mode. Review comment: Shall we at least document the acceptable explain mode string here?
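The removed `toExplainString(mode: String)` in the diff above is also where the accepted mode strings were validated; documenting them, as suggested, would mirror that check. A standalone sketch of the validation only (a placeholder object, not the actual `Dataset` method, and the return value is just the normalized mode name rather than plan output):

```scala
import java.util.Locale

// Standalone sketch of the mode-string validation the removed method performed.
object ExplainModeSketch {
  val Accepted = Seq("simple", "extended", "codegen", "cost", "formatted")

  // Normalize and validate an explain mode string, rejecting unknown modes
  // with the same style of message as the diff.
  def resolve(mode: String): String =
    mode.toLowerCase(Locale.ROOT) match {
      case m if Accepted.contains(m) => m
      case other =>
        throw new IllegalArgumentException(
          s"Unknown explain mode: $other. Accepted explain modes are " +
            Accepted.map(m => s"'$m'").mkString(", ") + ".")
    }
}
```

Matching on the lowercased string keeps the check case-insensitive, so `"Extended"` and `"extended"` resolve to the same mode.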
[GitHub] [spark] SparkQA commented on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh`
SparkQA commented on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh` URL: https://github.com/apache/spark/pull/26902#issuecomment-565917197 **[Test build #115376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115376/testReport)** for PR 26902 at commit [`a07a926`](https://github.com/apache/spark/commit/a07a9265f1ed0ce3559ba8863c46eaf1d068dc52).
[GitHub] [spark] huaxingao commented on a change in pull request #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
huaxingao commented on a change in pull request #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#discussion_r358063126

## File path: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala

## @@ -229,4 +229,17 @@ class MultilayerPerceptronClassifierSuite extends MLTest with DefaultReadWriteTest
     assert(expected.weights === actual.weights)
   }
 }
+
+  test("Load MultilayerPerceptronClassificationModel prior to Spark 3.0") {

Review comment: so add similar test in ```NaiveBayes``` (and other algorithms that modified the load/save method) too?
[GitHub] [spark] wangyum commented on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh`
wangyum commented on issue #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh` URL: https://github.com/apache/spark/pull/26902#issuecomment-565916254 cc @dongjoon-hyun @HyukjinKwon
[GitHub] [spark] wangyum opened a new pull request #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh`
wangyum opened a new pull request #26902: Revert "[SPARK-30056][INFRA] Skip building test artifacts in `dev/make-distribution.sh` URL: https://github.com/apache/spark/pull/26902 #15267

### What changes were proposed in this pull request?

This reverts commit 7c0ce285.

### Why are the changes needed?

Failed to make distribution:
```
[INFO] --------< org.apache.spark:spark-sketch_2.12 >--------
[INFO] Building Spark Project Sketch 3.0.0-preview2                     [3/33]
[INFO] --------[ jar ]--------
[INFO] Downloading from central: https://repo.maven.apache.org/maven2/org/apache/spark/spark-tags_2.12/3.0.0-preview2/spark-tags_2.12-3.0.0-preview2-tests.jar
[INFO]
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-preview2:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 26.513 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 48.393 s]
[INFO] Spark Project Sketch ............................... FAILURE [  0.034 s]
[INFO] Spark Project Local DB ............................. SKIPPED
[INFO] Spark Project Networking ........................... SKIPPED
[INFO] Spark Project Shuffle Streaming Service ............ SKIPPED
[INFO] Spark Project Unsafe ............................... SKIPPED
[INFO] Spark Project Launcher ............................. SKIPPED
[INFO] Spark Project Core ................................. SKIPPED
[INFO] Spark Project ML Local Library ..................... SKIPPED
[INFO] Spark Project GraphX ............................... SKIPPED
[INFO] Spark Project Streaming ............................ SKIPPED
[INFO] Spark Project Catalyst ............................. SKIPPED
[INFO] Spark Project SQL .................................. SKIPPED
[INFO] Spark Project ML Library ........................... SKIPPED
[INFO] Spark Project Tools ................................ SKIPPED
[INFO] Spark Project Hive ................................. SKIPPED
[INFO] Spark Project Graph API ............................ SKIPPED
[INFO] Spark Project Cypher ............................... SKIPPED
[INFO] Spark Project Graph ................................ SKIPPED
[INFO] Spark Project REPL ................................. SKIPPED
[INFO] Spark Project YARN Shuffle Service ................. SKIPPED
[INFO] Spark Project YARN ................................. SKIPPED
[INFO] Spark Project Mesos ................................ SKIPPED
[INFO] Spark Project Kubernetes ........................... SKIPPED
[INFO] Spark Project Hive Thrift Server ................... SKIPPED
[INFO] Spark Project Assembly ............................. SKIPPED
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SKIPPED
[INFO] Spark Integration for Kafka 0.10 ................... SKIPPED
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SKIPPED
[INFO] Spark Avro ......................................... SKIPPED
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time:  01:15 min
[INFO] Finished at: 2019-12-16T05:29:43Z
[INFO]
[ERROR] Failed to execute goal on project spark-sketch_2.12: Could not resolve dependencies for project org.apache.spark:spark-sketch_2.12:jar:3.0.0-preview2: Could not find artifact org.apache.spark:spark-tags_2.12:jar:tests:3.0.0-preview2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
[ERROR]
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

manual test.
[GitHub] [spark] huaxingao commented on a change in pull request #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
huaxingao commented on a change in pull request #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#discussion_r358061923

## File path: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala

## @@ -229,4 +229,17 @@ class MultilayerPerceptronClassifierSuite extends MLTest with DefaultReadWriteTest
     assert(expected.weights === actual.weights)
   }
 }
+
+  test("Load MultilayerPerceptronClassificationModel prior to Spark 3.0") {

Review comment: I am ok either way.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#issuecomment-565914120 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115374/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#issuecomment-565914113 Merged build finished. Test FAILed.
[GitHub] [spark] cloud-fan commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
cloud-fan commented on a change in pull request #26434: [SPARK-29544][SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#discussion_r358060612

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala

## @@ -552,4 +577,155 @@ class AdaptiveQueryExecSuite
     spark.sparkContext.removeSparkListener(listener)
   }
 }
+
+  test("adaptive skew join both in left and right for inner join ") {
+    withSQLConf(
+      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1",
+      SQLConf.ADAPTIVE_EXECUTION_SKEWED_PARTITION_FACTOR.key -> "1",
+      SQLConf.ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD.key -> "100",
+      SQLConf.SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE.key -> "2000") {
+      val (plan, adaptivePlan) = runAdaptiveAndVerifyResult(
+        "SELECT * FROM skewData1 join skewData2 ON key1 = key2")
+      val smj = findTopLevelSortMergeJoin(plan)
+      assert(smj.size == 1)
+      // left stats: [4403, 0, 1927, 1927, 1927]

Review comment: I'm a little surprised that partition 1 is skewed. Isn't `key1` uniformly distributed?
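The size-based skew test the configs above exercise can be sketched roughly as: a partition counts as skewed when its size exceeds both a configured multiple of the median partition size and an absolute size threshold. The sketch below is an assumption about the heuristic, not this PR's actual implementation; applied to the quoted left stats [4403, 0, 1927, 1927, 1927] with factor 1 and threshold 100, only the first partition qualifies.

```scala
// Hedged sketch of a size-based skew detector, assuming a
// "bigger than factor * median and bigger than threshold" rule.
// Names (SkewSketch, isSkewed) are illustrative, not from the PR.
object SkewSketch {
  def isSkewed(sizes: Seq[Long], factor: Long, threshold: Long): Seq[Boolean] = {
    require(sizes.nonEmpty, "need at least one partition size")
    val median = sizes.sorted.apply(sizes.length / 2)
    // A partition is flagged only when both conditions hold.
    sizes.map(s => s > median * factor && s > threshold)
  }
}
```

Under this rule the median of the left stats is 1927, so only the 4403-byte partition is flagged; the empty and median-sized partitions are not.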
[GitHub] [spark] SparkQA removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
SparkQA removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#issuecomment-565895997 **[Test build #115374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115374/testReport)** for PR 26838 at commit [`f98de6b`](https://github.com/apache/spark/commit/f98de6bf7cdbb31f7fc402b8010f63a10188d535).
[GitHub] [spark] AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#issuecomment-565914113 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#issuecomment-565914120 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115374/ Test FAILed.
[GitHub] [spark] HyukjinKwon closed pull request #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes
HyukjinKwon closed pull request #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes URL: https://github.com/apache/spark/pull/26898
[GitHub] [spark] AmplabJenkins commented on issue #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes
AmplabJenkins commented on issue #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes URL: https://github.com/apache/spark/pull/26898#issuecomment-565913855 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115371/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes
AmplabJenkins removed a comment on issue #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes URL: https://github.com/apache/spark/pull/26898#issuecomment-565913855 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115371/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
SparkQA commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#issuecomment-565913975 **[Test build #115374 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115374/testReport)** for PR 26838 at commit [`f98de6b`](https://github.com/apache/spark/commit/f98de6bf7cdbb31f7fc402b8010f63a10188d535).

 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on issue #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes
HyukjinKwon commented on issue #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes URL: https://github.com/apache/spark/pull/26898#issuecomment-565913928 Thanks guys. Merged to master.
[GitHub] [spark] AmplabJenkins commented on issue #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes
AmplabJenkins commented on issue #26898: [SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes URL: https://github.com/apache/spark/pull/26898#issuecomment-565913851 Merged build finished. Test PASSed.