date:20200217

[GitHub] [spark] HyukjinKwon commented on a change in pull request #27499: [SPARK-30590][SQL] Untyped select API cannot take typed column expression

2020-02-17 Thread GitBox

HyukjinKwon commented on a change in pull request #27499: [SPARK-30590][SQL] 
Untyped select API cannot take typed column expression
URL: https://github.com/apache/spark/pull/27499#discussion_r380507471
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
 ##
 @@ -394,4 +403,21 @@ class DatasetAggregatorSuite extends QueryTest with 
SharedSparkSession {
 checkAnswer(group, Row("bob", Row(true, 3)) :: Nil)
 checkDataset(group.as[OptionBooleanIntData], OptionBooleanIntData("bob", 
Some((true, 3
   }
+
+  test("SPARK-30590: select multiple typed column expressions") {
+val df = Seq((1, 2, 3, 4, 5, 6)).toDF("a", "b", "c", "d", "e", "f")
+val fooAgg = (i: Int) => FooAgg(i).toColumn.name(s"foo_agg_$i")
+
+val agg1 = df.select(fooAgg(1), fooAgg(2), fooAgg(3), fooAgg(4), fooAgg(5))
+checkDataset(agg1, (3, 5, 7, 9, 11))
+
+val agg2 = df.selectUntyped(fooAgg(1), fooAgg(2), fooAgg(3), fooAgg(4), 
fooAgg(5), fooAgg(6))
+  .asInstanceOf[Dataset[(Int, Int, Int, Int, Int, Int)]]
+checkDataset(agg2, (3, 5, 7, 9, 11, 13))
+
+val err = intercept[AnalysisException] {
+  df.select(fooAgg(1), fooAgg(2), fooAgg(3), fooAgg(4), fooAgg(5), 
fooAgg(6))
 
 Review comment:
   The current behaviour seems to be to throw an exception (as described in the JIRA).
   
   ```
   scala> df.select(fooAgg(1),fooAgg(2),fooAgg(3),fooAgg(4),fooAgg(5),fooAgg(6)).show
   
   org.apache.spark.sql.AnalysisException: unresolved operator 'Aggregate 
[fooagg(FooAgg(1), None, None, None, input[0, int, false] AS value#114, 
assertnotnull(cast(value#114 as int)), input[0, int, false] AS value#113, 
IntegerType, IntegerType, false) AS foo_agg_1#116, fooagg(FooAgg(2), None, 
None, None, input[0, int, false] AS value#119, assertnotnull(cast(value#119 as 
int)), input[0, int, false] AS value#118, IntegerType, IntegerType, false) AS 
foo_agg_2#121, fooagg(FooAgg(3), None, None, None, input[0, int, false] AS 
value#124, assertnotnull(cast(value#124 as int)), input[0, int, false] AS 
value#123, IntegerType, IntegerType, false) AS foo_agg_3#126, fooagg(FooAgg(4), 
None, None, None, input[0, int, false] AS value#129, 
assertnotnull(cast(value#129 as int)), input[0, int, false] AS value#128, 
IntegerType, IntegerType, false) AS foo_agg_4#131, fooagg(FooAgg(5), None, 
None, None, input[0, int, false] AS value#134, assertnotnull(cast(value#134 as 
int)), input[0, int, false] AS value#133, IntegerType, IntegerType, false) AS 
foo_agg_5#136, fooagg(FooAgg(6), None, None, None, input[0, int, false] AS 
value#139, assertnotnull(cast(value#139 as int)), input[0, int, false] AS 
value#138, IntegerType, IntegerType, false) AS foo_agg_6#141];;
   'Aggregate [fooagg(FooAgg(1), None, None, None, input[0, int, false] AS 
value#114, assertnotnull(cast(value#114 as int)), input[0, int, false] AS 
value#113, IntegerType, IntegerType, false) AS foo_agg_1#116, fooagg(FooAgg(2), 
None, None, None, input[0, int, false] AS value#119, 
assertnotnull(cast(value#119 as int)), input[0, int, false] AS value#118, 
IntegerType, IntegerType, false) AS foo_agg_2#121, fooagg(FooAgg(3), None, 
None, None, input[0, int, false] AS value#124, assertnotnull(cast(value#124 as 
int)), input[0, int, false] AS value#123, IntegerType, IntegerType, false) AS 
foo_agg_3#126, fooagg(FooAgg(4), None, None, None, input[0, int, false] AS 
value#129, assertnotnull(cast(value#129 as int)), input[0, int, false] AS 
value#128, IntegerType, IntegerType, false) AS foo_agg_4#131, fooagg(FooAgg(5), 
None, None, None, input[0, int, false] AS value#134, 
assertnotnull(cast(value#134 as int)), input[0, int, false] AS value#133, 
IntegerType, IntegerType, false) AS foo_agg_5#136, fooagg(FooAgg(6), None, 
None, None, input[0, int, false] AS value#139, assertnotnull(cast(value#139 as 
int)), input[0, int, false] AS value#138, IntegerType, IntegerType, false) AS 
foo_agg_6#141]
   +- Project [_1#6 AS a#13, _2#7 AS b#14, _3#8 AS c#15, _4#9 AS d#16, _5#10 AS 
e#17, _6#11 AS F#18]
+- LocalRelation [_1#6, _2#7, _3#8, _4#9, _5#10, _6#11]
   
   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:43)
   at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:95)
   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$3.apply(CheckAnalysis.scala:431)
   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$3.apply(CheckAnalysis.scala:430)
   at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:430)
   at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:95)
   at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:108)
   at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyz

[GitHub] [spark] gatorsmile commented on issue #27477: [SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators

2020-02-17 Thread GitBox

gatorsmile commented on issue #27477: [SPARK-30724][SQL] Support 'LIKE ANY' and 
'LIKE ALL' operators
URL: https://github.com/apache/spark/pull/27477#issuecomment-587328326
 
 
   This would be good to have since both Teradata and Snowflake support it. 
   
   cc @maryannxue @hvanhovell @cloud-fan 
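
The semantics being requested can be sketched in plain Scala. This is only an illustration and not Spark's implementation: `likeToRegex`, `like`, `likeAny`, and `likeAll` are hypothetical helper names, escape characters are not handled, and SQL NULL semantics are ignored. The assumption is that `x LIKE ANY (p1, p2)` holds when `x` matches at least one pattern, and `x LIKE ALL (p1, p2)` holds when it matches every pattern.

```scala
import java.util.regex.Pattern

// Hypothetical helpers illustrating LIKE ANY / LIKE ALL semantics (not Spark code).
object LikeAnyAllSketch {
  // Translate a SQL LIKE pattern into a Java regex:
  // '%' matches any sequence of characters, '_' matches exactly one character.
  // Every other character is quoted so it is matched literally.
  def likeToRegex(pattern: String): String =
    pattern.toSeq.map {
      case '%' => ".*"
      case '_' => "."
      case c   => Pattern.quote(c.toString)
    }.mkString

  // Plain LIKE: the whole value must match the translated pattern.
  def like(value: String, pattern: String): Boolean =
    value.matches(likeToRegex(pattern))

  // value LIKE ANY (p1, p2, ...) behaves like: value LIKE p1 OR value LIKE p2 OR ...
  def likeAny(value: String, patterns: Seq[String]): Boolean =
    patterns.exists(like(value, _))

  // value LIKE ALL (p1, p2, ...) behaves like: value LIKE p1 AND value LIKE p2 AND ...
  def likeAll(value: String, patterns: Seq[String]): Boolean =
    patterns.forall(like(value, _))
}
```

With these helpers, `likeAny("spark", Seq("sp%", "fl%"))` is true because the first pattern matches, while `likeAll` over the same patterns is false because `fl%` does not match.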
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString

2020-02-17 Thread GitBox

cloud-fan commented on a change in pull request #27608: [SPARK-30863][SQL] 
Distinguish Cast and AnsiCast in toString
URL: https://github.com/apache/spark/pull/27608#discussion_r380504333
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ##
 @@ -251,7 +251,10 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
 
   def dataType: DataType
 
-  override def toString: String = s"cast($child as ${dataType.simpleString})"
+  override def toString: String = {
+val ansi = if (ansiEnabled) "ansi_" else ""
 
 Review comment:
   This follows the existing SQL function naming convention, e.g. 
`FromUTCTimestamp`
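
A minimal sketch of the output format under discussion, assuming the remainder of the truncated hunk places the `ansi` prefix directly in front of `cast`. `CastSketch` is a hypothetical stand-in that takes plain strings; it is not Spark's `CastBase`.

```scala
// Hypothetical, simplified stand-in for the expression in the diff (not Spark's CastBase).
// child and dataTypeSimpleString are plain strings here instead of Catalyst objects.
final case class CastSketch(child: String, dataTypeSimpleString: String, ansiEnabled: Boolean) {
  override def toString: String = {
    // Same prefix selection as the added line in the diff.
    val ansi = if (ansiEnabled) "ansi_" else ""
    // Assumption: the prefix is interpolated in front of the existing "cast(...)" text.
    s"${ansi}cast($child as $dataTypeSimpleString)"
  }
}
```

Under that assumption, an ANSI cast renders as `ansi_cast(a as int)` and a regular cast keeps the existing `cast(a as int)` form, so the two are distinguishable in plan output.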



[GitHub] [spark] HyukjinKwon commented on issue #27613: [SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation.

2020-02-17 Thread GitBox

HyukjinKwon commented on issue #27613: [SPARK-30859][PYSPARK][DOCS][MINOR] 
Fixed docstring syntax issues preventing proper compilation of documentation.
URL: https://github.com/apache/spark/pull/27613#issuecomment-587325164
 
 
   Merged to master and branch-3.0.



[GitHub] [spark] HyukjinKwon closed pull request #27613: [SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation.

2020-02-17 Thread GitBox

HyukjinKwon closed pull request #27613: [SPARK-30859][PYSPARK][DOCS][MINOR] 
Fixed docstring syntax issues preventing proper compilation of documentation.
URL: https://github.com/apache/spark/pull/27613
 
 
   



[GitHub] [spark] gengliangwang commented on a change in pull request #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString

2020-02-17 Thread GitBox

gengliangwang commented on a change in pull request #27608: [SPARK-30863][SQL] 
Distinguish Cast and AnsiCast in toString
URL: https://github.com/apache/spark/pull/27608#discussion_r380501956
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ##
 @@ -251,7 +251,10 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
 
   def dataType: DataType
 
-  override def toString: String = s"cast($child as ${dataType.simpleString})"
+  override def toString: String = {
+val ansi = if (ansiEnabled) "ansi_" else ""
 
 Review comment:
   I think it should be "ansiCast" or "ANSICast" instead of "ansi_cast"



[GitHub] [spark] AmplabJenkins removed a comment on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27402: [SPARK-30679][SQL] REPLACE 
TABLE can omit the USING clause
URL: https://github.com/apache/spark/pull/27402#issuecomment-587322132
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23372/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27402: [SPARK-30679][SQL] REPLACE 
TABLE can omit the USING clause
URL: https://github.com/apache/spark/pull/27402#issuecomment-587322122
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can 
omit the USING clause
URL: https://github.com/apache/spark/pull/27402#issuecomment-587322132
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23372/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can 
omit the USING clause
URL: https://github.com/apache/spark/pull/27402#issuecomment-587322122
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause

2020-02-17 Thread GitBox

SparkQA commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit 
the USING clause
URL: https://github.com/apache/spark/pull/27402#issuecomment-587321807
 
 
   **[Test build #118619 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118619/testReport)**
 for PR 27402 at commit 
[`647851c`](https://github.com/apache/spark/commit/647851c3b6a76f49f9546bbd364202b9fa2d9228).



[GitHub] [spark] ScrapCodes closed pull request #27520: [SPARK-30771][K8S] Avoid failed mount warning from kubernetes and support the optional mount.

2020-02-17 Thread GitBox

ScrapCodes closed pull request #27520: [SPARK-30771][K8S] Avoid failed mount 
warning from kubernetes and support the optional mount.
URL: https://github.com/apache/spark/pull/27520
 
 
   



[GitHub] [spark] ScrapCodes commented on a change in pull request #27520: [SPARK-30771][K8S] Avoid failed mount warning from kubernetes and support the optional mount.

2020-02-17 Thread GitBox

ScrapCodes commented on a change in pull request #27520: [SPARK-30771][K8S] 
Avoid failed mount warning from kubernetes and support the optional mount.
URL: https://github.com/apache/spark/pull/27520#discussion_r380498148
 
 

 ##
 File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
 ##
 @@ -127,15 +127,18 @@ private[spark] class Client(
 .pods()
 .withName(driverPodName)
 .watch(watcher)) { _ =>
-  val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
+  var createdDriverPod: Option[Pod] = None
   try {
 val otherKubernetesResources =
   resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
-addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
 kubernetesClient.resourceList(otherKubernetesResources: 
_*).createOrReplace()
+createdDriverPod = 
Some(kubernetesClient.pods().create(resolvedDriverPod))
+addDriverOwnerReference(createdDriverPod.get, otherKubernetesResources)
 
 Review comment:
   @liyinan926 Thanks for helping me think through this. So far, every 
solution I can think of introduces another problem or risk. I will keep 
thinking, and if you have any hints, please share them. In the meantime, I am 
closing this PR.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27402: [SPARK-30679][SQL] REPLACE 
TABLE can omit the USING clause
URL: https://github.com/apache/spark/pull/27402#issuecomment-581236680
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117758/
   Test FAILed.



[GitHub] [spark] HyukjinKwon commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause

2020-02-17 Thread GitBox

HyukjinKwon commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can 
omit the USING clause
URL: https://github.com/apache/spark/pull/27402#issuecomment-587319958
 
 
   retest this please



[GitHub] [spark] HyukjinKwon commented on a change in pull request #22038: [SPARK-25056][SQL] Unify the InConversion and BinaryComparison behavior

2020-02-17 Thread GitBox

HyukjinKwon commented on a change in pull request #22038: [SPARK-25056][SQL] 
Unify the InConversion and BinaryComparison behavior
URL: https://github.com/apache/spark/pull/22038#discussion_r380497149
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ##
 @@ -491,9 +491,13 @@ object TypeCoercion {
   i
 }
 
-  case i @ In(a, b) if b.exists(_.dataType != a.dataType) =>
-findWiderCommonType(i.children.map(_.dataType)) match {
-  case Some(finalDataType) => i.withNewChildren(i.children.map(Cast(_, 
finalDataType)))
+  case i @ In(value, list) if list.exists(_.dataType != value.dataType) =>
 
 Review comment:
   @wangyum, are we able to add a legacy configuration? With that, I think it's 
good to go.



[GitHub] [spark] SparkQA removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

SparkQA removed a comment on issue #27616: [SPARK-30864] [SQL]add the user 
guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587314887
 
 
   **[Test build #118618 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118618/testReport)**
 for PR 27616 at commit 
[`727f57f`](https://github.com/apache/spark/commit/727f57f1bfba53a486f87776646d160eb8061258).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the 
user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587319486
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide 
for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587319486
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide 
for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587319492
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118618/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for 
Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587319348
 
 
   **[Test build #118618 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118618/testReport)**
 for PR 27616 at commit 
[`727f57f`](https://github.com/apache/spark/commit/727f57f1bfba53a486f87776646d160eb8061258).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the 
user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587319492
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118618/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27608: [SPARK-30863][SQL] Distinguish 
Cast and AnsiCast in toString
URL: https://github.com/apache/spark/pull/27608#issuecomment-587318920
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118607/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27608: [SPARK-30863][SQL] Distinguish 
Cast and AnsiCast in toString
URL: https://github.com/apache/spark/pull/27608#issuecomment-587318915
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27608: [SPARK-30863][SQL] Distinguish Cast 
and AnsiCast in toString
URL: https://github.com/apache/spark/pull/27608#issuecomment-587318920
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118607/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27608: [SPARK-30863][SQL] Distinguish Cast 
and AnsiCast in toString
URL: https://github.com/apache/spark/pull/27608#issuecomment-587318915
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString

2020-02-17 Thread GitBox

SparkQA removed a comment on issue #27608: [SPARK-30863][SQL] Distinguish Cast 
and AnsiCast in toString
URL: https://github.com/apache/spark/pull/27608#issuecomment-587250020
 
 
   **[Test build #118607 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118607/testReport)**
 for PR 27608 at commit 
[`4896fb5`](https://github.com/apache/spark/commit/4896fb56a4fa600fdba3e9c3600a9adb2effc792).



[GitHub] [spark] SparkQA commented on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString

2020-02-17 Thread GitBox

SparkQA commented on issue #27608: [SPARK-30863][SQL] Distinguish Cast and 
AnsiCast in toString
URL: https://github.com/apache/spark/pull/27608#issuecomment-587318252
 
 
   **[Test build #118607 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118607/testReport)**
 for PR 27608 at commit 
[`4896fb5`](https://github.com/apache/spark/commit/4896fb56a4fa600fdba3e9c3600a9adb2effc792).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] JkSelf commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

JkSelf commented on a change in pull request #27616: [SPARK-30864] [SQL]add the 
user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#discussion_r380492640
 
 

 ##
 File path: docs/sql-performance-tuning.md
 ##
 @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names 
and a partition number is
 SELECT /*+ REPARTITION(3, c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+
+## Adaptive Query Execution
+Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that 
make use of the runtime statistics to choose the most efficient query execution 
plan. AQE is disabled by default. Spark SQL can use the umbrella configuration 
of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are 
three mainly feature in AQE, including coalescing post partition number, 
optimizing local shuffle reader and optimizing skewed join.
+ ### Coalescing Post Shuffle Partition Num
+ This feature coalesces the post shuffle partitions based on the map output 
statistics when `spark.sql.adaptive.enabled` and 
`spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled` configuration 
properties are both enabled. There are four following sub-configurations in 
this optimization rule. 
+ <table>
+   <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+   <tr>
+     <td><code>spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled</code></td>
+     <td>true</td>
+     <td>
+       When true and <code>spark.sql.adaptive.enabled</code> is enabled, spark will reduce the post shuffle partitions number based on the map output statistics.
+     </td>
+   </tr>
+   <tr>
+     <td><code>spark.sql.adaptive.shuffle.minNumPostShufflePartitions</code></td>
+     <td>1</td>
+     <td>
+       The advisory minimum number of post-shuffle partitions used when <code>spark.sql.adaptive.enabled</code> and <code>spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled</code> are both enabled. It is suggested to be almost 2~3x of the parallelism when doing benchmark.
+     </td>
+   </tr>
+   <tr>
+     <td><code>spark.sql.adaptive.shuffle.maxNumPostShufflePartitions</code></td>
+     <td>Int.MaxValue</td>
+     <td>
+       The advisory maximum number of post-shuffle partitions used in adaptive execution. This is used as the initial number of pre-shuffle partitions. By default it equals to <code>spark.sql.shuffle.partitions</code>.
+     </td>
+   </tr>
+   <tr>
+     <td><code>spark.sql.adaptive.shuffle.targetPostShuffleInputSize</code></td>
+     <td>67108864 (64 MB)</td>
+     <td>
+       The target post-shuffle input size in bytes of a task when <code>spark.sql.adaptive.enabled</code> and <code>spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled</code> are both enabled.
+     </td>
+   </tr>
+ </table>
+ 
+ ### Optimize Local Shuffle Reader
+ This feature optimize the shuffle reader to local shuffle reader when 
converting the sort merge join to broadcast hash join in runtime and no 
additional shuffle introduced. It takes effect when 
`spark.sql.adaptive.enabled` and 
`spark.sql.adaptive.shuffle.localShuffleReader.enabled` configuration 
properties are both enabled.
 
 Review comment:
   Add performance data for all three features.
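
The coalescing feature described in the quoted diff can be pictured with a small sketch. This is a hypothetical simplification, not Spark's actual rule: the object name `CoalesceSketch` and its `coalesce` function are made up for illustration, and the only idea borrowed from the diff is a per-task target size (the role played by `spark.sql.adaptive.shuffle.targetPostShuffleInputSize`). It assumes the map output statistics are already available as one byte count per post-shuffle partition.

```scala
// Hypothetical sketch, not Spark's implementation: greedily merge adjacent
// post-shuffle partitions until adding the next one would exceed a target size.
object CoalesceSketch {
  // `sizes` holds one byte count per original partition (assumed to come from
  // map output statistics). Returns the start index of each coalesced partition.
  def coalesce(sizes: Seq[Long], targetSize: Long): List[Int] = {
    val starts = scala.collection.mutable.ListBuffer.empty[Int]
    var current = 0L
    for ((size, i) <- sizes.zipWithIndex) {
      if (i == 0 || current + size > targetSize) {
        // Start a new coalesced partition at index i.
        starts += i
        current = size
      } else {
        // Fold this partition into the current coalesced partition.
        current += size
      }
    }
    starts.toList
  }
}
```

With sizes 10, 20, 30, 40, 50 and a target of 60, the first three partitions are merged and the last two stay separate, so the start indices are 0, 3 and 4.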



[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the 
user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587315297
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23371/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the 
user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587315290
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide 
for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587315297
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23371/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide 
for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587315290
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] JkSelf commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

JkSelf commented on a change in pull request #27616: [SPARK-30864] [SQL]add the 
user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#discussion_r380492530
 
 

 ##
 File path: docs/sql-performance-tuning.md
 ##
 @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names 
and a partition number is
 SELECT /*+ REPARTITION(3, c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+
+## Adaptive Query Execution
+Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that 
make use of the runtime statistics to choose the most efficient query execution 
plan. AQE is disabled by default. Spark SQL can use the umbrella configuration 
of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are 
three mainly feature in AQE, including coalescing post partition number, 
optimizing local shuffle reader and optimizing skewed join.
 
 Review comment:
   updated.



[GitHub] [spark] SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for 
Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587314887
 
 
   **[Test build #118618 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118618/testReport)**
 for PR 27616 at commit 
[`727f57f`](https://github.com/apache/spark/commit/727f57f1bfba53a486f87776646d160eb8061258).



[GitHub] [spark] HyukjinKwon commented on a change in pull request #27129: [SPARK-30427][SQL] Add config item for limiting partition number when calculating statistics through File System

2020-02-17 Thread GitBox

HyukjinKwon commented on a change in pull request #27129: [SPARK-30427][SQL] 
Add config item for limiting partition number when calculating statistics 
through File System
URL: https://github.com/apache/spark/pull/27129#discussion_r380486918
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
 ##
 @@ -73,19 +74,30 @@ private[sql] class PruneHiveTablePartitions(session: 
SparkSession)
   private def updateTableMeta(
   tableMeta: CatalogTable,
   prunedPartitions: Seq[CatalogTablePartition]): CatalogTable = {
-val sizeOfPartitions = prunedPartitions.map { partition =>
+val partitionsWithSize = prunedPartitions.map { partition =>
   val rawDataSize = 
partition.parameters.get(StatsSetupConst.RAW_DATA_SIZE).map(_.toLong)
   val totalSize = 
partition.parameters.get(StatsSetupConst.TOTAL_SIZE).map(_.toLong)
   if (rawDataSize.isDefined && rawDataSize.get > 0) {
-rawDataSize.get
+(partition, rawDataSize.get)
   } else if (totalSize.isDefined && totalSize.get > 0L) {
-totalSize.get
+(partition, totalSize.get)
   } else {
-0L
+(partition, 0L)
   }
 }
-if (sizeOfPartitions.forall(_ > 0)) {
-  val sizeInBytes = sizeOfPartitions.sum
+if (partitionsWithSize.forall(_._2 > 0)) {
+  val sizeInBytes = partitionsWithSize.map(_._2).sum
+  tableMeta.copy(stats = Some(CatalogStatistics(sizeInBytes = 
BigInt(sizeInBytes))))
+} else if (partitionsWithSize.count(_._2 == 0) <= 
conf.maxPartNumForStatsCalculateViaFS) {
 
 Review comment:
   @fuwhu, are you proposing a configuration to automatically calculate the 
size? Why don't you just manually run the ANALYZE command to calculate the stats? 
It's weird to do this based on the number of partitions.
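
For readers following the diff, the size fallback it implements can be sketched without Spark or Hive on the classpath. This is only an illustration under assumptions: `PartitionSizeSketch` is a made-up name, each partition is modeled as a plain `Map[String, String]` standing in for `partition.parameters`, and the keys `"rawDataSize"` and `"totalSize"` are assumed to be the values behind `StatsSetupConst.RAW_DATA_SIZE` and `StatsSetupConst.TOTAL_SIZE`.

```scala
// Hypothetical sketch of the fallback in the diff: prefer rawDataSize, then
// totalSize, else 0; report a table size only if every partition has one.
object PartitionSizeSketch {
  // Assumed key names; in the real code they come from Hive's StatsSetupConst.
  val RawDataSize = "rawDataSize"
  val TotalSize = "totalSize"

  // Size of one partition, or 0 when neither statistic is present and positive.
  def partitionSize(params: Map[String, String]): Long = {
    val raw = params.get(RawDataSize).map(_.toLong).filter(_ > 0)
    val total = params.get(TotalSize).map(_.toLong).filter(_ > 0)
    raw.orElse(total).getOrElse(0L)
  }

  // Sum of partition sizes when all are known (positive), otherwise None.
  def tableSize(partitions: Seq[Map[String, String]]): Option[BigInt] = {
    val sizes = partitions.map(partitionSize)
    if (sizes.forall(_ > 0)) Some(sizes.map(BigInt(_)).sum) else None
  }
}
```

The `None` case is where the pull request proposes falling back to the file system for a limited number of partitions, which is the behaviour the review comment questions.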



[GitHub] [spark] AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support 
ANSI nested bracketed comments
URL: https://github.com/apache/spark/pull/27495#issuecomment-587306310
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118606/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support 
ANSI nested bracketed comments
URL: https://github.com/apache/spark/pull/27495#issuecomment-587306306
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] 
Revise boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587306316
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23370/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] 
Revise boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587306311
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested 
bracketed comments
URL: https://github.com/apache/spark/pull/27495#issuecomment-587306310
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118606/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested 
bracketed comments
URL: https://github.com/apache/spark/pull/27495#issuecomment-587306306
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise 
boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587306316
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23370/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise 
boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587306311
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments

2020-02-17 Thread GitBox

SparkQA removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI 
nested bracketed comments
URL: https://github.com/apache/spark/pull/27495#issuecomment-587239864
 
 
   **[Test build #118606 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118606/testReport)**
 for PR 27495 at commit 
[`25d0863`](https://github.com/apache/spark/commit/25d0863015e881819c67fdeb2e85c47dfb08f098).



[GitHub] [spark] SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean 
config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587306011
 
 
   **[Test build #118617 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118617/testReport)**
 for PR 27563 at commit 
[`83fda3c`](https://github.com/apache/spark/commit/83fda3c90dd7cea7db4353be881965c8aa9e12ac).



[GitHub] [spark] SparkQA commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments

2020-02-17 Thread GitBox

SparkQA commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested 
bracketed comments
URL: https://github.com/apache/spark/pull/27495#issuecomment-587305720
 
 
   **[Test build #118606 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118606/testReport)**
 for PR 27495 at commit 
[`25d0863`](https://github.com/apache/spark/commit/25d0863015e881819c67fdeb2e85c47dfb08f098).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] cloud-fan commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

cloud-fan commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean 
config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587305011
 
 
   retest this please



[GitHub] [spark] AmplabJenkins removed a comment on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27617: [SPARK-30865][SQL] Refactor 
DateTimeUtils
URL: https://github.com/apache/spark/pull/27617#issuecomment-587304335
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23368/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27592: [SPARK-30840][CORE][SQL] Add 
version property for ConfigEntry and ConfigBuilder
URL: https://github.com/apache/spark/pull/27592#issuecomment-587304339
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23369/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27592: [SPARK-30840][CORE][SQL] Add 
version property for ConfigEntry and ConfigBuilder
URL: https://github.com/apache/spark/pull/27592#issuecomment-587304332
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27617: [SPARK-30865][SQL] Refactor 
DateTimeUtils
URL: https://github.com/apache/spark/pull/27617#issuecomment-587304327
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27617: [SPARK-30865][SQL] Refactor 
DateTimeUtils
URL: https://github.com/apache/spark/pull/27617#issuecomment-587304327
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27617: [SPARK-30865][SQL] Refactor 
DateTimeUtils
URL: https://github.com/apache/spark/pull/27617#issuecomment-587304335
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23368/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27592: [SPARK-30840][CORE][SQL] Add version 
property for ConfigEntry and ConfigBuilder
URL: https://github.com/apache/spark/pull/27592#issuecomment-587304339
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23369/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27592: [SPARK-30840][CORE][SQL] Add version 
property for ConfigEntry and ConfigBuilder
URL: https://github.com/apache/spark/pull/27592#issuecomment-587304332
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder

2020-02-17 Thread GitBox

SparkQA commented on issue #27592: [SPARK-30840][CORE][SQL] Add version 
property for ConfigEntry and ConfigBuilder
URL: https://github.com/apache/spark/pull/27592#issuecomment-587303954
 
 
   **[Test build #118616 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118616/testReport)**
 for PR 27592 at commit 
[`6d8eb75`](https://github.com/apache/spark/commit/6d8eb75f0c29962962f994d8f212fafae8577cfc).



[GitHub] [spark] SparkQA commented on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils

2020-02-17 Thread GitBox

SparkQA commented on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils
URL: https://github.com/apache/spark/pull/27617#issuecomment-587303945
 
 
   **[Test build #118615 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118615/testReport)**
 for PR 27617 at commit 
[`0b5711e`](https://github.com/apache/spark/commit/0b5711e0e332817f6cff28f79ccffeaca304).



[GitHub] [spark] cozos commented on issue #25899: [SPARK-29089][SQL] Parallelize blocking FileSystem calls in DataSource#checkAndGlobPathIfNecessary

2020-02-17 Thread GitBox

cozos commented on issue #25899: [SPARK-29089][SQL] Parallelize blocking 
FileSystem calls in DataSource#checkAndGlobPathIfNecessary
URL: https://github.com/apache/spark/pull/25899#issuecomment-587303126
 
 
   Thank you everybody!



[GitHub] [spark] beliefer commented on a change in pull request #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments

2020-02-17 Thread GitBox

beliefer commented on a change in pull request #27495: [SPARK-28880][SQL] 
Support ANSI nested bracketed comments
URL: https://github.com/apache/spark/pull/27495#discussion_r380478627
 
 

 ##
 File path: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##
 @@ -1797,11 +1797,11 @@ SIMPLE_COMMENT
 ;
 
 BRACKETED_EMPTY_COMMENT
-: '/**/' -> channel(HIDDEN)
+: '/*' BRACKETED_EMPTY_COMMENT? '*/' -> channel(HIDDEN)
 ;
 
 BRACKETED_COMMENT
 
 Review comment:
   If this problem could be solved in g4, things would be even simpler. Let me 
do some hard work and try.
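
As a rough illustration of what nested bracketed comments require, a recognizer has to track nesting depth rather than stop at the first `*/`. The sketch below is not from the PR and does not use ANTLR; `NestedComments` and `isNestedBracketedComment` are hypothetical names chosen for this example.

```scala
// Hypothetical helper, not part of Spark or ANTLR: returns true when `s` is
// exactly one well-formed bracketed comment, with nested /* ... */ pairs allowed.
object NestedComments {
  def isNestedBracketedComment(s: String): Boolean = {
    var depth = 0 // number of currently open "/*" markers
    var i = 0
    while (i < s.length) {
      if (s.startsWith("/*", i)) {
        depth += 1
        i += 2
      } else if (s.startsWith("*/", i)) {
        depth -= 1
        if (depth < 0) return false // closing marker without an opener
        i += 2
        // The outermost comment closed before the input ended.
        if (depth == 0 && i < s.length) return false
      } else {
        if (depth == 0) return false // text outside any comment
        i += 1
      }
    }
    s.nonEmpty && depth == 0
  }
}
```

With this helper, `/*/**/*/` and `/* a /* b */ c */` are accepted, while `/* a /* b */` is rejected because one opener is never closed.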



[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] 
Revise boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587302311
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118614/
   Test FAILed.



[GitHub] [spark] MaxGekk opened a new pull request #27617: [SPARK-30865][SQL] Refactor DateTimeUtils

2020-02-17 Thread GitBox

MaxGekk opened a new pull request #27617: [SPARK-30865][SQL] Refactor 
DateTimeUtils
URL: https://github.com/apache/spark/pull/27617
 
 
   ### What changes were proposed in this pull request?
   
   1. Move TimeZoneUTC and TimeZoneGMT to DateTimeTestUtils
   2. Remove TimeZoneGMT
   3. Use ZoneId.systemDefault() instead of defaultTimeZone().toZoneId
   4. Alias SQLDate & SQLTimestamp to internal types of DateType and 
TimestampType
   
   ### Why are the changes needed?
   1. TimeZoneUTC and TimeZoneGMT are moved to DateTimeTestUtils because they 
are used only in tests
   2. TimeZoneGMT can be removed because it is equal to TimeZoneUTC
   3. After the PR #27494, Spark expressions and DateTimeUtils functions 
switched to ZoneId instead of TimeZone completely. `defaultTimeZone()` with 
`TimeZone` as return type is not needed anymore.
   4. SQLDate and SQLTimestamp types can be explicitly aliased to internal 
types of DateType and TimestampType instead of declaring this in a comment.
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   By existing test suites
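
To illustrate item 3, here is a minimal sketch that is not taken from the PR. `ZoneIdDefault` is a hypothetical name, and the sketch assumes `defaultTimeZone()` simply wraps `TimeZone.getDefault`. According to the JDK documentation, `ZoneId.systemDefault()` queries `TimeZone.getDefault()` and converts the result to a `ZoneId`, so both routes should yield the same zone.

```scala
import java.time.ZoneId
import java.util.TimeZone

// Hypothetical illustration object, not part of Spark.
object ZoneIdDefault {
  // Older route: obtain a java.util.TimeZone, then convert it to a ZoneId.
  // Assumption: this mirrors what defaultTimeZone().toZoneId did.
  def viaTimeZone: ZoneId = TimeZone.getDefault.toZoneId

  // Route proposed in the PR: ask java.time for the default zone directly.
  def direct: ZoneId = ZoneId.systemDefault()
}
```

If the assumption holds, the direct call removes the need for a helper whose return type is `TimeZone`.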
   



[GitHub] [spark] DavidToneian commented on issue #27613: [SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation.

2020-02-17 Thread GitBox

DavidToneian commented on issue #27613: [SPARK-30859][PYSPARK][DOCS][MINOR] 
Fixed docstring syntax issues preventing proper compilation of documentation.
URL: https://github.com/apache/spark/pull/27613#issuecomment-587302624
 
 
   @HyukjinKwon: These are the only instances I found when I searched for a 
space followed by a colon (" :") in the output HTML docs.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] 
Revise boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587302308
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise 
boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587302311
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118614/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

SparkQA removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise 
boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587298211
 
 
   **[Test build #118614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118614/testReport)** for PR 27563 at commit [`83fda3c`](https://github.com/apache/spark/commit/83fda3c90dd7cea7db4353be881965c8aa9e12ac).



[GitHub] [spark] SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean 
config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587302299
 
 
   **[Test build #118614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118614/testReport)** for PR 27563 at commit [`83fda3c`](https://github.com/apache/spark/commit/83fda3c90dd7cea7db4353be881965c8aa9e12ac).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise 
boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587302308
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] beliefer commented on a change in pull request #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder

2020-02-17 Thread GitBox

beliefer commented on a change in pull request #27592: [SPARK-30840][CORE][SQL] 
Add version property for ConfigEntry and ConfigBuilder
URL: https://github.com/apache/spark/pull/27592#discussion_r380477033
 
 

 ##
 File path: sql/gen-sql-config-docs.py
 ##
 @@ -49,12 +50,13 @@ def generate_sql_configs_table(sql_configs, path):
 
 ```html
 
-<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Version</th></tr>
 
 Review comment:
   OK



[GitHub] [spark] cloud-fan commented on issue #27601: [SPARK-30847][SQL] Take productPrefix into account in MurmurHash3.productHash

2020-02-17 Thread GitBox

cloud-fan commented on issue #27601: [SPARK-30847][SQL] Take productPrefix into 
account in MurmurHash3.productHash
URL: https://github.com/apache/spark/pull/27601#issuecomment-587300412
 
 
   thanks, merging to master/3.0



[GitHub] [spark] cloud-fan closed pull request #27601: [SPARK-30847][SQL] Take productPrefix into account in MurmurHash3.productHash

2020-02-17 Thread GitBox

cloud-fan closed pull request #27601: [SPARK-30847][SQL] Take productPrefix 
into account in MurmurHash3.productHash
URL: https://github.com/apache/spark/pull/27601
 
 
   



[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the 
user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587298472
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the 
user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587298476
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118613/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide 
for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587298472
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide 
for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587298476
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118613/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for 
Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587298385
 
 
   **[Test build #118613 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118613/testReport)** for PR 27616 at commit [`19a381b`](https://github.com/apache/spark/commit/19a381b2d5af5128e821d233f5c997730a6d8c36).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

SparkQA removed a comment on issue #27616: [SPARK-30864] [SQL]add the user 
guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587294511
 
 
   **[Test build #118613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118613/testReport)** for PR 27616 at commit [`19a381b`](https://github.com/apache/spark/commit/19a381b2d5af5128e821d233f5c997730a6d8c36).



[GitHub] [spark] SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean 
config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587298211
 
 
   **[Test build #118614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118614/testReport)** for PR 27563 at commit [`83fda3c`](https://github.com/apache/spark/commit/83fda3c90dd7cea7db4353be881965c8aa9e12ac).



[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add 
the user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#discussion_r380471798
 
 

 ##
 File path: docs/sql-performance-tuning.md
 ##
 @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names 
and a partition number is
 SELECT /*+ REPARTITION(3, c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+
+## Adaptive Query Execution
+Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that 
make use of the runtime statistics to choose the most efficient query execution 
plan. AQE is disabled by default. Spark SQL can use the umbrella configuration 
of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are 
three mainly feature in AQE, including coalescing post partition number, 
optimizing local shuffle reader and optimizing skewed join.
+ ### Coalescing Post Shuffle Partition Num
+ This feature coalesces the post shuffle partitions based on the map output 
statistics when `spark.sql.adaptive.enabled` and 
`spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled` configuration 
properties are both enabled. There are four following sub-configurations in 
this optimization rule. 
+ 
+   Property NameDefaultMeaning
+   
+ 
spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled
+ true
+ 
+   When true and spark.sql.adaptive.enabled is enabled, spark 
will reduce the post shuffle partitions number based on the map output 
statistics.
+ 
+   
+   
+ 
spark.sql.adaptive.shuffle.minNumPostShufflePartitions
+ 1
+ 
+   The advisory minimum number of post-shuffle partitions used when 
spark.sql.adaptive.enabled and 
spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled are 
both enabled. It is suggested to be almost 2~3x of the parallelism when doing 
benchmark.
+ 
+   
+   
+ 
spark.sql.adaptive.shuffle.maxNumPostShufflePartitions
+ Int.MaxValue
+ 
+   The advisory maximum number of post-shuffle partitions used in adaptive 
execution. This is used as the initial number of pre-shuffle partitions. By 
default it equals to spark.sql.shuffle.partitions.
+ 
+   
+   
+ 
spark.sql.adaptive.shuffle.targetPostShuffleInputSize
+ 67108864 (64 MB)
+ 
+   The target post-shuffle input size in bytes of a task when 
spark.sql.adaptive.enabled and 
spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled are 
both enabled.
+ 
+   
+ 
+ 
+ ### Optimize Local Shuffle Reader
+ This feature optimize the shuffle reader to local shuffle reader when 
converting the sort merge join to broadcast hash join in runtime and no 
additional shuffle introduced. It takes effect when 
`spark.sql.adaptive.enabled` and 
`spark.sql.adaptive.shuffle.localShuffleReader.enabled` configuration 
properties are both enabled.
 
 Review comment:
   ditto, users care more about the benefit



[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise 
boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587296563
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23367/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] 
Revise boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587296551
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] 
Revise boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587296563
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23367/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise 
boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#issuecomment-587296551
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27555: [SPARK-30802][ML] Use Summarizer instead of MultivariateOnlineSummarizer in Aggregator test suite

2020-02-17 Thread GitBox

zhengruifeng commented on a change in pull request #27555: [SPARK-30802][ML] 
Use Summarizer instead of MultivariateOnlineSummarizer in Aggregator test suite
URL: https://github.com/apache/spark/pull/27555#discussion_r380469360
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/ml/stat/MultiClassSummarizer.scala
 ##
 @@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.stat
+
+import scala.collection.mutable
+
+
+/**
+ * MultiClassSummarizer computes the number of distinct labels and 
corresponding counts,
+ * and validates the data to see if the labels used for k class multi-label 
classification
+ * are in the range of {0, 1, ..., k - 1} in an online fashion.
+ *
+ * Two MultilabelSummarizer can be merged together to have a statistical 
summary of the
+ * corresponding joint dataset.
+ */
+private[ml] class MultiClassSummarizer extends Serializable {
+  // The first element of value in distinctMap is the actually number of 
instances,
+  // and the second element of value is sum of the weights.
+  private val distinctMap = new mutable.HashMap[Int, (Long, Double)]
+  private var totalInvalidCnt: Long = 0L
+
+  /**
+   * Add a new label into this MultilabelSummarizer, and update the distinct 
map.
+   *
+   * @param label The label for this data point.
+   * @param weight The weight of this instances.
+   * @return This MultilabelSummarizer
+   */
+  def add(label: Double, weight: Double = 1.0): MultiClassSummarizer = {
+require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0")
+
+if (weight == 0.0) return this
+
+if (label - label.toInt != 0.0 || label < 0) {
+  totalInvalidCnt += 1
+  this
+}
+else {
+  val (counts: Long, weightSum: Double) = 
distinctMap.getOrElse(label.toInt, (0L, 0.0))
+  distinctMap.put(label.toInt, (counts + 1L, weightSum + weight))
+  this
+}
+  }
+
+  /**
+   * Merge another MultilabelSummarizer, and update the distinct map.
+   * (Note that it will merge the smaller distinct map into the larger one 
using in-place
+   * merging, so either `this` or `other` object will be modified and 
returned.)
+   *
+   * @param other The other MultilabelSummarizer to be merged.
+   * @return Merged MultilabelSummarizer object.
+   */
+  def merge(other: MultiClassSummarizer): MultiClassSummarizer = {
+val (largeMap, smallMap) = if (this.distinctMap.size > 
other.distinctMap.size) {
+  (this, other)
+} else {
+  (other, this)
+}
+smallMap.distinctMap.foreach {
+  case (key, value) =>
+val (counts: Long, weightSum: Double) = 
largeMap.distinctMap.getOrElse(key, (0L, 0.0))
+largeMap.distinctMap.put(key, (counts + value._1, weightSum + 
value._2))
+}
+largeMap.totalInvalidCnt += smallMap.totalInvalidCnt
+largeMap
+  }
+
+  /** @return The total invalid input counts. */
+  def countInvalid: Long = totalInvalidCnt
+
+  /** @return The number of distinct labels in the input dataset. */
+  def numClasses: Int = if (distinctMap.isEmpty) 0 else distinctMap.keySet.max 
+ 1
 
 Review comment:
   nit: `distinctMap.keysIterator.max + 1`
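
For context, a minimal sketch of the suggested form. The review comment does not state its motivation; a plausible reading, offered here only as an assumption, is that taking the maximum over the key iterator avoids going through an intermediate key set. `NumClassesSketch` is a hypothetical name, and the map type is copied from the diff above.

```scala
import scala.collection.mutable

// Hypothetical standalone version of numClasses, for illustration only.
object NumClassesSketch {
  // Labels are expected to lie in {0, 1, ..., k - 1}, so the class count is
  // the largest observed label plus one, or 0 when nothing was observed.
  def numClasses(distinctMap: mutable.HashMap[Int, (Long, Double)]): Int =
    if (distinctMap.isEmpty) 0 else distinctMap.keysIterator.max + 1
}
```

Both `keySet.max` and `keysIterator.max` return the same value, so the change would not alter behavior.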



[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add 
the user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#discussion_r380470558
 
 

 ##
 File path: docs/sql-performance-tuning.md
 ##
 @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names 
and a partition number is
 SELECT /*+ REPARTITION(3, c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+
+## Adaptive Query Execution
+Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that 
make use of the runtime statistics to choose the most efficient query execution 
plan. AQE is disabled by default. Spark SQL can use the umbrella configuration 
of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are 
three mainly feature in AQE, including coalescing post partition number, 
optimizing local shuffle reader and optimizing skewed join.
+ ### Coalescing Post Shuffle Partition Num
+ This feature coalesces the post shuffle partitions based on the map output 
statistics when `spark.sql.adaptive.enabled` and 
`spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled` configuration 
properties are both enabled. There are four following sub-configurations in 
this optimization rule. 
 
 Review comment:
   shall we introduce the benefits of this feature?
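
For readers following along, a minimal sketch of how the two switches named in the quoted paragraph might be passed to `spark-submit` as `--conf` flags. The configuration keys are copied from the diff as written in this PR and may be named differently in a released Spark version; the object `AqeConfSketch` and its helper are hypothetical and only assemble strings, they do not talk to Spark.

```scala
object AqeConfSketch {
  // Keys copied from the quoted documentation diff; the values are illustrative.
  val aqeSettings: Seq[(String, String)] = Seq(
    "spark.sql.adaptive.enabled" -> "true",
    "spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled" -> "true"
  )

  // Render each setting as a `--conf key=value` pair for spark-submit.
  def toSubmitArgs(settings: Seq[(String, String)]): Seq[String] =
    settings.flatMap { case (k, v) => Seq("--conf", s"$k=$v") }

  def main(args: Array[String]): Unit =
    println(toSubmitArgs(aqeSettings).mkString(" "))
}
```

Per the quoted text, both properties have to be enabled for the coalescing rule to apply, which is why the sketch sets them together.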



[GitHub] [spark] Ngone51 commented on a change in pull request #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy

2020-02-17 Thread GitBox

Ngone51 commented on a change in pull request #27563: [SPARK-30812][SQL][CORE] 
Revise boolean config name to comply with new config naming policy
URL: https://github.com/apache/spark/pull/27563#discussion_r380470606
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##
 @@ -1629,7 +1629,7 @@ object SQLConf {
   .createWithDefault(true)
 
   val PANDAS_ARROW_SAFE_TYPE_CONVERSION =
-buildConf("spark.sql.execution.pandas.arrowSafeTypeConversion")
+buildConf("spark.sql.execution.pandas.arrowSafeTypeConversion.enabled")
 
 Review comment:
   updated with `convertToArrowArraySafely`.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27555: [SPARK-30802][ML] Use Summarizer instead of MultivariateOnlineSummarizer in Aggregator test suite

2020-02-17 Thread GitBox

zhengruifeng commented on a change in pull request #27555: [SPARK-30802][ML] 
Use Summarizer instead of MultivariateOnlineSummarizer in Aggregator test suite
URL: https://github.com/apache/spark/pull/27555#discussion_r380468433
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/ml/stat/MultiClassSummarizer.scala
 ##
 @@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.stat
+
+import scala.collection.mutable
+
+
+/**
+ * MultiClassSummarizer computes the number of distinct labels and corresponding counts,
+ * and validates the data to see if the labels used for k class multi-label classification
+ * are in the range of {0, 1, ..., k - 1} in an online fashion.
+ *
+ * Two MultilabelSummarizer can be merged together to have a statistical summary of the
+ * corresponding joint dataset.
+ */
+private[ml] class MultiClassSummarizer extends Serializable {
+  // The first element of value in distinctMap is the actually number of instances,
+  // and the second element of value is sum of the weights.
+  private val distinctMap = new mutable.HashMap[Int, (Long, Double)]
+  private var totalInvalidCnt: Long = 0L
+
+  /**
+   * Add a new label into this MultilabelSummarizer, and update the distinct map.
+   *
+   * @param label The label for this data point.
+   * @param weight The weight of this instances.
+   * @return This MultilabelSummarizer
+   */
+  def add(label: Double, weight: Double = 1.0): MultiClassSummarizer = {
+require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0")
+
+if (weight == 0.0) return this
+
+if (label - label.toInt != 0.0 || label < 0) {
 
 Review comment:
   ```scala
   if (label - label.toInt != 0.0 || label < 0) {
     totalInvalidCnt += 1
   } else {
     val (counts: Long, weightSum: Double) =
       distinctMap.getOrElse(label.toInt, (0L, 0.0))
     distinctMap.put(label.toInt, (counts + 1L, weightSum + weight))
   }
   this
   ```
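
To see the suggested control flow in isolation, here is a simplified, self-contained sketch. It is not the class from the PR: `LabelCounterSketch` and its `histogram` accessor are hypothetical names, the merge logic is omitted, and the zero-weight early exit from the diff is written as a guard rather than a `return`.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for the summarizer in the diff.
class LabelCounterSketch {
  // label -> (number of instances, sum of weights)
  private val distinctMap = new mutable.HashMap[Int, (Long, Double)]
  private var totalInvalidCnt: Long = 0L

  def add(label: Double, weight: Double = 1.0): LabelCounterSketch = {
    require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0")
    // Zero-weight instances are ignored entirely.
    if (weight != 0.0) {
      if (label - label.toInt != 0.0 || label < 0) {
        // Non-integer or negative labels only bump the invalid counter.
        totalInvalidCnt += 1
      } else {
        val (counts, weightSum) = distinctMap.getOrElse(label.toInt, (0L, 0.0))
        distinctMap.put(label.toInt, (counts + 1L, weightSum + weight))
      }
    }
    this
  }

  def countInvalid: Long = totalInvalidCnt
  def numClasses: Int = if (distinctMap.isEmpty) 0 else distinctMap.keysIterator.max + 1
  def histogram: Map[Int, (Long, Double)] = distinctMap.toMap
}
```

Because `add` returns `this` on every path, calls can be chained, for example `new LabelCounterSketch().add(0.0).add(2.0, 0.5)`.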



[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add 
the user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#discussion_r380469954
 
 

 ##
 File path: docs/sql-performance-tuning.md
 ##
 @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names 
and a partition number is
 SELECT /*+ REPARTITION(3, c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+
+## Adaptive Query Execution
+Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that 
make use of the runtime statistics to choose the most efficient query execution 
plan. AQE is disabled by default. Spark SQL can use the umbrella configuration 
of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are 
three mainly feature in AQE, including coalescing post partition number, 
optimizing local shuffle reader and optimizing skewed join.
+ ### Coalescing Post Shuffle Partition Num
 
 Review comment:
   nit: `Num` should be spelled out as `Number`.



[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add 
the user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#discussion_r380469743
 
 

 ##
 File path: docs/sql-performance-tuning.md
 ##
 @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names 
and a partition number is
 SELECT /*+ REPARTITION(3, c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+
+## Adaptive Query Execution
+Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that 
make use of the runtime statistics to choose the most efficient query execution 
plan. AQE is disabled by default. Spark SQL can use the umbrella configuration 
of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are 
three mainly feature in AQE, including coalescing post partition number, 
optimizing local shuffle reader and optimizing skewed join.
 
 Review comment:
   post-shuffle partition number



[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add 
the user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#discussion_r380469743
 
 

 ##
 File path: docs/sql-performance-tuning.md
 ##
 @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names 
and a partition number is
 SELECT /*+ REPARTITION(3, c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+
+## Adaptive Query Execution
+Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that 
make use of the runtime statistics to choose the most efficient query execution 
plan. AQE is disabled by default. Spark SQL can use the umbrella configuration 
of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are 
three mainly feature in AQE, including coalescing post partition number, 
optimizing local shuffle reader and optimizing skewed join.
 
 Review comment:
   post-shuffle partitions number



[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add 
the user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#discussion_r380469633
 
 

 ##
 File path: docs/sql-performance-tuning.md
 ##
 @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names 
and a partition number is
 SELECT /*+ REPARTITION(3, c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+
+## Adaptive Query Execution
+Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that 
make use of the runtime statistics to choose the most efficient query execution 
plan. AQE is disabled by default. Spark SQL can use the umbrella configuration 
of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are 
three mainly feature in AQE, including coalescing post partition number, 
optimizing local shuffle reader and optimizing skewed join.
 
 Review comment:
   `As of Spark 3.0, there are three major features ...`



[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add 
the user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#discussion_r380469476
 
 

 ##
 File path: docs/sql-performance-tuning.md
 ##
 @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names 
and a partition number is
 SELECT /*+ REPARTITION(3, c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
 SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+
+## Adaptive Query Execution
+Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that 
make use of the runtime statistics to choose the most efficient query execution 
plan. AQE is disabled by default. Spark SQL can use the umbrella configuration 
of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are 
three mainly feature in AQE, including coalescing post partition number, 
optimizing local shuffle reader and optimizing skewed join.
 
 Review comment:
   nit: "make use" should be "makes use".



[GitHub] [spark] cloud-fan commented on a change in pull request #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments

2020-02-17 Thread GitBox

cloud-fan commented on a change in pull request #27495: [SPARK-28880][SQL] 
Support ANSI nested bracketed comments
URL: https://github.com/apache/spark/pull/27495#discussion_r380469230
 
 

 ##
 File path: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##
 @@ -1797,11 +1797,11 @@ SIMPLE_COMMENT
 ;
 
 BRACKETED_EMPTY_COMMENT
-: '/**/' -> channel(HIDDEN)
+: '/*' BRACKETED_EMPTY_COMMENT? '*/' -> channel(HIDDEN)
 ;
 
 BRACKETED_COMMENT
 
 Review comment:
   > we still need to distinguish hint and comment syntax
   
   We can distinguish them better if they are both parser rules.
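
As background for the nesting behaviour under discussion, a hand-written sketch of how nested bracketed comments can be matched with a depth counter. This is not the ANTLR grammar from the PR and does not handle hints; `NestedCommentSketch` and `commentEnd` are hypothetical names used only for illustration.

```scala
object NestedCommentSketch {
  // Returns the index just past the comment that opens at `start`, honouring
  // nesting, or None if no comment starts there or it is never closed.
  def commentEnd(sql: String, start: Int): Option[Int] = {
    if (!sql.startsWith("/*", start)) return None
    var depth = 0
    var i = start
    while (i < sql.length) {
      if (sql.startsWith("/*", i)) {
        // Every opener, including nested ones, increases the depth.
        depth += 1
        i += 2
      } else if (sql.startsWith("*/", i)) {
        depth -= 1
        i += 2
        // The comment ends only when the outermost opener is closed.
        if (depth == 0) return Some(i)
      } else {
        i += 1
      }
    }
    None
  }
}
```

A non-nesting matcher would stop at the first `*/`; the counter is what lets the inner comment close without ending the outer one.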



[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the 
user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587294916
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23366/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the 
user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587294911
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide 
for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587294911
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide 
for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587294916
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23366/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for 
Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587294511
 
 
   **[Test build #118613 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118613/testReport)**
 for PR 27616 at commit 
[`19a381b`](https://github.com/apache/spark/commit/19a381b2d5af5128e821d233f5c997730a6d8c36).



[GitHub] [spark] JkSelf commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

JkSelf commented on issue #27616: [SPARK-30864] [SQL]add the user guide for 
Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#issuecomment-587293946
 
 
   cc @cloud-fan 



[GitHub] [spark] JkSelf opened a new pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

2020-02-17 Thread GitBox

JkSelf opened a new pull request #27616: [SPARK-30864] [SQL]add the user guide 
for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616
 
 
   
   
   ### What changes were proposed in this pull request?
   This PR adds a user guide for AQE and documents the detailed configurations
of the three main features in AQE.
   
   
   
   ### Why are the changes needed?
   Add the detailed configurations.
   
   
   
   ### Does this PR introduce any user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Documentation-only change; no unit tests are needed.
   
   


