[GitHub] [spark] SparkQA commented on pull request #32533: [SPARK-35392][ML][PYTHON] Remove Flaky GMM Test in ml/clustering.py

2021-05-12 Thread GitBox


SparkQA commented on pull request #32533:
URL: https://github.com/apache/spark/pull/32533#issuecomment-840356642


   **[Test build #138498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138498/testReport)** for PR 32533 at commit [`0081246`](https://github.com/apache/spark/commit/008124671ed27cb4367c941a1f8b73cda76e13b0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32533: [SPARK-35392][ML][PYTHON] Remove Flaky GMM Test in ml/clustering.py

2021-05-12 Thread GitBox


SparkQA removed a comment on pull request #32533:
URL: https://github.com/apache/spark/pull/32533#issuecomment-840339950


   **[Test build #138498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138498/testReport)** for PR 32533 at commit [`0081246`](https://github.com/apache/spark/commit/008124671ed27cb4367c941a1f8b73cda76e13b0).





[GitHub] [spark] SparkQA removed a comment on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-12 Thread GitBox


SparkQA removed a comment on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840242617


   **[Test build #138482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138482/testReport)** for PR 32515 at commit [`a79d76e`](https://github.com/apache/spark/commit/a79d76eda4e4fe262a57a32b6aa16079aead7b34).





[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-12 Thread GitBox


SparkQA commented on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840355450


   **[Test build #138482 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138482/testReport)** for PR 32515 at commit [`a79d76e`](https://github.com/apache/spark/commit/a79d76eda4e4fe262a57a32b6aa16079aead7b34).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE

2021-05-12 Thread GitBox


SparkQA removed a comment on pull request #32292:
URL: https://github.com/apache/spark/pull/32292#issuecomment-840286781


   **[Test build #138490 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138490/testReport)** for PR 32292 at commit [`774bda1`](https://github.com/apache/spark/commit/774bda13487ab0823e20d0295c6e7108a5a62b83).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32292:
URL: https://github.com/apache/spark/pull/32292#issuecomment-840347165


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138490/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32292:
URL: https://github.com/apache/spark/pull/32292#issuecomment-840347165


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138490/
   





[GitHub] [spark] SparkQA commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE

2021-05-12 Thread GitBox


SparkQA commented on pull request #32292:
URL: https://github.com/apache/spark/pull/32292#issuecomment-840346863


   **[Test build #138490 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138490/testReport)** for PR 32292 at commit [`774bda1`](https://github.com/apache/spark/commit/774bda13487ab0823e20d0295c6e7108a5a62b83).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class TryEval(child: Expression) extends UnaryExpression with NullIntolerant`





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-840344422


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43017/
   





[GitHub] [spark] SparkQA commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-12 Thread GitBox


SparkQA commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-840344392


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43017/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-840344422


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43017/
   





[GitHub] [spark] cloud-fan commented on a change in pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join

2021-05-12 Thread GitBox


cloud-fan commented on a change in pull request #32528:
URL: https://github.com/apache/spark/pull/32528#discussion_r631598224



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
##
@@ -424,8 +424,18 @@ case class SortMergeJoinExec(
 // A list to hold all matched rows from buffered side.
 val clsName = classOf[ExternalAppendOnlyUnsafeRowArray].getName
 
+// Flag to only buffer first matched row, to avoid buffering unnecessary rows.
+val onlyBufferFirstMatchedRow = (joinType, condition) match {
+  case (LeftSemi, None) => true
+  case _ => false
+}
+val inMemoryThreshold =
+  if (onlyBufferFirstMatchedRow) {

Review comment:
   +1, `lazy val` can probably be `def` as the logic is super simple
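
To make the trade-off in this comment concrete, here is a minimal sketch of how `lazy val` and `def` differ for a cheap predicate like the one in the diff. The `JoinSketch` class, its local `JoinType` stand-ins, and the `evaluations` counter are hypothetical and exist only for illustration; they are not part of `SortMergeJoinExec`.

```scala
object LazyValVsDef {
  // Hypothetical stand-ins for Spark's join types.
  sealed trait JoinType
  case object LeftSemi extends JoinType
  case object Inner extends JoinType

  final class JoinSketch(joinType: JoinType, condition: Option[String]) {
    // Counts how many times the predicate body actually runs.
    var evaluations: Int = 0

    private def compute(): Boolean = {
      evaluations += 1
      (joinType, condition) match {
        case (LeftSemi, None) => true
        case _ => false
      }
    }

    // Evaluated once on first access, then cached in a hidden field
    // with thread-safe initialization.
    lazy val flagLazy: Boolean = compute()

    // Re-evaluated on every call; stores no extra state.
    def flagDef: Boolean = compute()
  }
}
```

When the body is a single pattern match, recomputing it through `def` is cheap and avoids the cached field and initialization guard that a `lazy val` carries, which appears to be the reasoning behind the suggestion.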







[GitHub] [spark] AmplabJenkins removed a comment on pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32410:
URL: https://github.com/apache/spark/pull/32410#issuecomment-840343604


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43016/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32410:
URL: https://github.com/apache/spark/pull/32410#issuecomment-840343604


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43016/
   





[GitHub] [spark] SparkQA commented on pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-05-12 Thread GitBox


SparkQA commented on pull request #32410:
URL: https://github.com/apache/spark/pull/32410#issuecomment-840343550









[GitHub] [spark] cloud-fan commented on a change in pull request #32501: [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation

2021-05-12 Thread GitBox


cloud-fan commented on a change in pull request #32501:
URL: https://github.com/apache/spark/pull/32501#discussion_r631597395



##
File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/util/CharVarcharCodegenUtils.java
##
@@ -26,7 +27,7 @@ private static UTF8String trimTrailingSpaces(
   UTF8String inputStr, int numChars, int limit) {
 int numTailSpacesToTrim = numChars - limit;
 UTF8String trimmed = inputStr.trimTrailingSpaces(numTailSpacesToTrim);
-if (trimmed.numChars() > limit) {
+if (trimmed.numChars() > limit && !SQLConf.get().charVarcharAsString()) {

Review comment:
   We don't need this now.







[GitHub] [spark] cloud-fan commented on pull request #32207: [SPARK-35106] Avoid failing rename in HadoopMapReduceCommitProtocol with dynamic partition overwrite

2021-05-12 Thread GitBox


cloud-fan commented on pull request #32207:
URL: https://github.com/apache/spark/pull/32207#issuecomment-840342929


   @YuzhouSun Can you help to take over this PR?





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-840341104


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43015/
   





[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-12 Thread GitBox


SparkQA commented on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-840341058









[GitHub] [spark] AmplabJenkins commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-840341104


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43015/
   





[GitHub] [spark] cloud-fan commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


cloud-fan commented on a change in pull request #32527:
URL: https://github.com/apache/spark/pull/32527#discussion_r631596017



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
   arguments: Seq[Expression],
   input: InternalRow,
   dataType: DataType): Any = {
-val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-if (needNullCheck && args.exists(_ == null)) {
+var i = 0
+val len = arguments.length
+while (i < len) {
+  evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]
+  i += 1
+}
+if (needNullCheck && evaluatedArgs.contains(null)) {
   // return null if one of arguments is null
   null
 } else {
   val ret = try {
-method.invoke(obj, args: _*)
+method.invoke(obj, evaluatedArgs: _*)
   } catch {

Review comment:
   You are right. Another idea: the `obj` values from `InternalRow` are always of the same class, so we can avoid this
   ```
   @transient lazy val method = {
     val cls = targetObject.dataType match {
       case ObjectType(cls) => cls
       case StringType => classOf[UTF8String]
       case _: DecimalType => classOf[Decimal]
       ...
     }
     findMethod(cls, encodedFunctionName, argClasses)
   }
   ```
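
As a rough, self-contained illustration of the idea above (resolving the reflective `Method` once from a statically known class instead of from each row's object), the following sketch uses plain `java.lang.reflect`. The `TypeTag` hierarchy and the `Invoker` class are hypothetical simplifications and do not reproduce Spark's `DataType`, `findMethod`, or `InvokeLike`.

```scala
import java.lang.reflect.Method

object MethodLookupSketch {
  // Hypothetical, simplified type tags standing in for a richer type hierarchy.
  sealed trait TypeTag
  final case class ObjectTag(cls: Class[_]) extends TypeTag
  case object StringTag extends TypeTag

  final class Invoker(targetType: TypeTag, name: String, argClasses: Seq[Class[_]]) {
    // Resolved once on first use from the declared type,
    // rather than looked up again for every input row.
    lazy val method: Method = {
      val cls: Class[_] = targetType match {
        case ObjectTag(c) => c
        case StringTag => classOf[String]
      }
      cls.getMethod(name, argClasses: _*)
    }

    // Reuses the cached Method for each call.
    def invoke(obj: AnyRef, args: AnyRef*): AnyRef = method.invoke(obj, args: _*)
  }
}
```

For example, `new MethodLookupSketch.Invoker(MethodLookupSketch.StringTag, "concat", Seq(classOf[String]))` resolves `String.concat` on its first `invoke` call and reuses it afterwards. The sketch assumes every object passed to `invoke` really is an instance of the declared class.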







[GitHub] [spark] SparkQA commented on pull request #32532: [SPARK-35384][SQL][FOLLOWUP] Move `HashMap.get` out of `InvokeLike.invoke`

2021-05-12 Thread GitBox


SparkQA commented on pull request #32532:
URL: https://github.com/apache/spark/pull/32532#issuecomment-840340027


   **[Test build #138499 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138499/testReport)** for PR 32532 at commit [`8a97c30`](https://github.com/apache/spark/commit/8a97c304f8656e337f98948db3454b2dfd802414).





[GitHub] [spark] SparkQA commented on pull request #32533: [SPARK-35392][ML][PYTHON] Remove Flaky GMM Test in ml/clustering.py

2021-05-12 Thread GitBox


SparkQA commented on pull request #32533:
URL: https://github.com/apache/spark/pull/32533#issuecomment-840339950


   **[Test build #138498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138498/testReport)** for PR 32533 at commit [`0081246`](https://github.com/apache/spark/commit/008124671ed27cb4367c941a1f8b73cda76e13b0).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-840339404


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43012/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32448: [SPARK-35290][SQL] Use StructType merging for unionByName with null filling

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32448:
URL: https://github.com/apache/spark/pull/32448#issuecomment-840339407


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138481/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840339408


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43013/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32527:
URL: https://github.com/apache/spark/pull/32527#issuecomment-840339402


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138480/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-840339405


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43014/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32448: [SPARK-35290][SQL] Use StructType merging for unionByName with null filling

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32448:
URL: https://github.com/apache/spark/pull/32448#issuecomment-840339407


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138481/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840339408


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43013/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32527:
URL: https://github.com/apache/spark/pull/32527#issuecomment-840339402


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138480/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-840339404


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43012/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-840339405


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43014/
   





[GitHub] [spark] SparkQA commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-12 Thread GitBox


SparkQA commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-840339374


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43017/
   





[GitHub] [spark] SparkQA commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-12 Thread GitBox


SparkQA commented on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-840337565









[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-12 Thread GitBox


SparkQA commented on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840334344


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43013/
   





[GitHub] [spark] zhengruifeng commented on pull request #32533: [SPARK-35392][ML][PYTHON] remove Flaky GMM Test in ml/clustering.py

2021-05-12 Thread GitBox


zhengruifeng commented on pull request #32533:
URL: https://github.com/apache/spark/pull/32533#issuecomment-84040


   ping @HyukjinKwon @srowen @viirya 





[GitHub] [spark] zhengruifeng opened a new pull request #32533: [SPARK-35392][ML][PYTHON] remove Flaky GMM Test in ml/clustering.py

2021-05-12 Thread GitBox


zhengruifeng opened a new pull request #32533:
URL: https://github.com/apache/spark/pull/32533


   ### What changes were proposed in this pull request?
   Remove the check of `summary.logLikelihood` in ml/clustering.py.
   
   
   ### Why are the changes needed?
   1. This GMM test is quite flaky; it tends to fail if any of the following changes:
   - the number of partitions;
   - the way the sum of weights is computed;
   - the underlying BLAS implementation.
   
   2. For now, just disable it; we need to use another test in the future.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Remaining test suites.
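
   One plausible reason a check like this is sensitive to partitioning is that floating-point addition is not associative, so the order in which partial sums are combined can change the result. The sketch below is a deliberately exaggerated illustration in plain Python, not the Spark test itself; the `weights` values and the `partitioned_sum` helper are hypothetical.

```python
import math

# Hypothetical weights, chosen to exaggerate the rounding effect.
weights = [0.1] * 10 + [1e16, -1e16]


def partitioned_sum(values, num_partitions):
    """Sum each contiguous partition first, then combine the partial sums."""
    size = math.ceil(len(values) / num_partitions)
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)


one_partition = partitioned_sum(weights, 1)
three_partitions = partitioned_sum(weights, 3)

# The same data gives different totals depending on how it is partitioned.
print(one_partition, three_partitions)

# A more robust check compares against an accurately computed sum
# with an explicit tolerance instead of expecting exact equality.
accurate_total = math.fsum(weights)
print(math.isclose(accurate_total, 1.0, rel_tol=1e-9))
```

   A replacement test could compare against a tolerance in this spirit, although which tolerance is appropriate for `summary.logLikelihood` is not something this sketch settles.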
   





[GitHub] [spark] WangGuangxin commented on pull request #31967: [SPARK-34819][SQL] MapType supports orderable semantics

2021-05-12 Thread GitBox


WangGuangxin commented on pull request #31967:
URL: https://github.com/apache/spark/pull/31967#issuecomment-840331912


   > @WangGuangxin If you cannot keep working on it, is it okay that I take this over?
   
   Sure. I'm tied up with something else, so you can take this over if you have time. Thanks.





[GitHub] [spark] sunchao opened a new pull request #32532: [SPARK-35384][SQL][FOLLOWUP] Move `HashMap.get` out of `InvokeLike.invoke`

2021-05-12 Thread GitBox


sunchao opened a new pull request #32532:
URL: https://github.com/apache/spark/pull/32532


   
   
   ### What changes were proposed in this pull request?
   
   
   Move hash map lookup operation out of `InvokeLike.invoke` since it doesn't 
depend on the input.
   
   ### Why are the changes needed?
   
   
   We shouldn't need to look up the hash map for every input row evaluated by 
`InvokeLike.invoke`, since the lookup doesn't depend on the input. This could 
improve performance a bit.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Existing tests.
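
   The idea of moving an input-independent lookup out of the per-row path can be sketched as follows. This is a minimal illustration in Python rather than the Scala code touched by the PR; `BOXED_TYPE_NAMES` and both functions are hypothetical names invented for the example.

```python
# Hypothetical lookup table standing in for the hash map mentioned above.
BOXED_TYPE_NAMES = {"int": "java.lang.Integer", "double": "java.lang.Double"}


def evaluate_rows_lookup_per_row(rows, type_name):
    """Performs the same dictionary lookup once for every row."""
    out = []
    for row in rows:
        boxed = BOXED_TYPE_NAMES.get(type_name)  # repeated work per row
        out.append((boxed, row))
    return out


def evaluate_rows_lookup_once(rows, type_name):
    """Hoists the lookup: its result does not depend on the row."""
    boxed = BOXED_TYPE_NAMES.get(type_name)
    return [(boxed, row) for row in rows]


rows = [1, 2, 3]
per_row = evaluate_rows_lookup_per_row(rows, "int")
hoisted = evaluate_rows_lookup_once(rows, "int")
print(per_row == hoisted)
```

   Both functions return the same result; the second simply does the lookup once per call instead of once per row.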





[GitHub] [spark] cfmcgrady closed pull request #32488: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet predicate

2021-05-12 Thread GitBox


cfmcgrady closed pull request #32488:
URL: https://github.com/apache/spark/pull/32488


   





[GitHub] [spark] HyukjinKwon closed pull request #32523: [SPARK-35382][PYTHON] Fix lambda variable name issues in nested DataFrame functions in Python APIs.

2021-05-12 Thread GitBox


HyukjinKwon closed pull request #32523:
URL: https://github.com/apache/spark/pull/32523


   





[GitHub] [spark] HyukjinKwon commented on pull request #32523: [SPARK-35382][PYTHON] Fix lambda variable name issues in nested DataFrame functions in Python APIs.

2021-05-12 Thread GitBox


HyukjinKwon commented on pull request #32523:
URL: https://github.com/apache/spark/pull/32523#issuecomment-840324993


   Merged to master and branch-3.1.





[GitHub] [spark] SparkQA removed a comment on pull request #32448: [SPARK-35290][SQL] Use StructType merging for unionByName with null filling

2021-05-12 Thread GitBox


SparkQA removed a comment on pull request #32448:
URL: https://github.com/apache/spark/pull/32448#issuecomment-840218983


   **[Test build #138481 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138481/testReport)**
 for PR 32448 at commit 
[`93b47d3`](https://github.com/apache/spark/commit/93b47d3f190369afdf5a2a5ae0ec0c6054b56c1b).





[GitHub] [spark] SparkQA commented on pull request #32448: [SPARK-35290][SQL] Use StructType merging for unionByName with null filling

2021-05-12 Thread GitBox


SparkQA commented on pull request #32448:
URL: https://github.com/apache/spark/pull/32448#issuecomment-840324232


   **[Test build #138481 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138481/testReport)**
 for PR 32448 at commit 
[`93b47d3`](https://github.com/apache/spark/commit/93b47d3f190369afdf5a2a5ae0ec0c6054b56c1b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


SparkQA removed a comment on pull request #32527:
URL: https://github.com/apache/spark/pull/32527#issuecomment-840217408


   **[Test build #138480 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138480/testReport)**
 for PR 32527 at commit 
[`2831f9c`](https://github.com/apache/spark/commit/2831f9c0b78aa21c6cc906370fb5069e166dbf39).





[GitHub] [spark] SparkQA commented on pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


SparkQA commented on pull request #32527:
URL: https://github.com/apache/spark/pull/32527#issuecomment-840322575


   **[Test build #138480 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138480/testReport)**
 for PR 32527 at commit 
[`2831f9c`](https://github.com/apache/spark/commit/2831f9c0b78aa21c6cc906370fb5069e166dbf39).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-12 Thread GitBox


SparkQA commented on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-840318050


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43012/
   





[GitHub] [spark] SparkQA commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-12 Thread GitBox


SparkQA commented on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-840315107


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43012/
   





[GitHub] [spark] HyukjinKwon edited a comment on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-12 Thread GitBox


HyukjinKwon edited a comment on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-840312271


   @itholic:
   
   1. Please check the options **one by one** and see if each exists and 
matches.
   2. Document general options in 
https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html if 
any are missing.
   3. If you're going to do 2. separately in another PR and JIRA, don't remove 
general options from the API documentation for now.





[GitHub] [spark] HyukjinKwon edited a comment on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-12 Thread GitBox


HyukjinKwon edited a comment on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-840312271


   @itholic:
   
   1. Please check the options **one by one** and see if each exists.
   2. Document general options in 
https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html if 
any are missing.
   3. If you're going to do 2. separately in another PR and JIRA, don't remove 
general options from the API documentation for now.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32516:
URL: https://github.com/apache/spark/pull/32516#issuecomment-840312669


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43008/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32516:
URL: https://github.com/apache/spark/pull/32516#issuecomment-840312669


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43008/
   





[GitHub] [spark] SparkQA commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes

2021-05-12 Thread GitBox


SparkQA commented on pull request #32516:
URL: https://github.com/apache/spark/pull/32516#issuecomment-840312637









[GitHub] [spark] HyukjinKwon commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-12 Thread GitBox


HyukjinKwon commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-840312618


   Same comment goes here too: 
https://github.com/apache/spark/pull/32204#issuecomment-840312271





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32531:
URL: https://github.com/apache/spark/pull/32531#issuecomment-840312131


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43011/
   





[GitHub] [spark] sunchao commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


sunchao commented on a change in pull request #32527:
URL: https://github.com/apache/spark/pull/32527#discussion_r631576884



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
       arguments: Seq[Expression],
       input: InternalRow,
       dataType: DataType): Any = {
-    val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-    if (needNullCheck && args.exists(_ == null)) {
+    var i = 0
+    val len = arguments.length
+    while (i < len) {
+      evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]
+      i += 1
+    }
+    if (needNullCheck && evaluatedArgs.contains(null)) {
       // return null if one of arguments is null
       null
     } else {
       val ret = try {
-        method.invoke(obj, args: _*)
+        method.invoke(obj, evaluatedArgs: _*)
       } catch {

Review comment:
   I'm not sure we can do a similar thing in `Invoke.eval`, though, 
since `obj` in `obj.getClass.getMethod(functionName, argClasses: _*)` is 
different for each call.
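
   The pattern in the diff above, filling a preallocated argument buffer with a `while` loop instead of building a new sequence for every row, can be sketched in Python as below. This is only an illustration under assumptions: `ArgumentEvaluator` is a hypothetical class, plain callables stand in for Spark expressions, and `evaluated_args` mirrors the `evaluatedArgs` name from the diff.

```python
class ArgumentEvaluator:
    """Evaluates argument callables into a buffer that is reused across rows."""

    def __init__(self, arguments, need_null_check=True):
        self.arguments = arguments
        self.need_null_check = need_null_check
        # Allocated once; overwritten on every call to invoke().
        self.evaluated_args = [None] * len(arguments)

    def invoke(self, method, row):
        i = 0
        n = len(self.arguments)
        while i < n:
            self.evaluated_args[i] = self.arguments[i](row)
            i += 1
        if self.need_null_check and any(a is None for a in self.evaluated_args):
            # Return None if any argument evaluated to None.
            return None
        return method(*self.evaluated_args)


evaluator = ArgumentEvaluator([lambda r: r["a"], lambda r: r["b"]])
total = evaluator.invoke(lambda a, b: a + b, {"a": 1, "b": 2})
missing = evaluator.invoke(lambda a, b: a + b, {"a": 1, "b": None})
print(total, missing)
```

   Because the buffer is shared between calls, a sketch like this is only safe when one call finishes before the next begins and the callee does not keep a reference to the buffer.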







[GitHub] [spark] HyukjinKwon commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-12 Thread GitBox


HyukjinKwon commented on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-840312271


   @itholic:
   
   1. Please check the options **one by one** and see if each exists.
   2. Document general options in 
https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html if 
any are missing.
   3. If you're going to do this separately in a separate JIRA, don't remove 
general options from the API documentation for now.





[GitHub] [spark] AmplabJenkins commented on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32531:
URL: https://github.com/apache/spark/pull/32531#issuecomment-840312131


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43011/
   





[GitHub] [spark] SparkQA commented on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file

2021-05-12 Thread GitBox


SparkQA commented on pull request #32531:
URL: https://github.com/apache/spark/pull/32531#issuecomment-840312101









[GitHub] [spark] HyukjinKwon commented on a change in pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-12 Thread GitBox


HyukjinKwon commented on a change in pull request #32204:
URL: https://github.com/apache/spark/pull/32204#discussion_r631576139



##
File path: python/pyspark/sql/streaming.py
##
@@ -504,105 +504,15 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
 path : str
 string represents path to the JSON dataset,
 or RDD of Strings storing JSON objects.
-schema : :class:`pyspark.sql.types.StructType` or str, optional

Review comment:
   I don't think this is a general option







[GitHub] [spark] HyukjinKwon commented on a change in pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-12 Thread GitBox


HyukjinKwon commented on a change in pull request #32204:
URL: https://github.com/apache/spark/pull/32204#discussion_r631575888



##
File path: python/pyspark/sql/readwriter.py
##
@@ -1196,39 +1097,13 @@ def json(self, path, mode=None, compression=None, dateFormat=None, timestampForm
 --
 path : str
 the path in any Hadoop supported file system
-mode : str, optional

Review comment:
   mode is a general option







[GitHub] [spark] AmplabJenkins removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-840292938


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138477/
   





[GitHub] [spark] SparkQA commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-12 Thread GitBox


SparkQA commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-840310729


   **[Test build #138497 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138497/testReport)**
 for PR 32161 at commit 
[`bb5cd45`](https://github.com/apache/spark/commit/bb5cd4529b07b05b21cdaf878b06b61ad717be79).





[GitHub] [spark] SparkQA commented on pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-05-12 Thread GitBox


SparkQA commented on pull request #32410:
URL: https://github.com/apache/spark/pull/32410#issuecomment-840310594


   **[Test build #138496 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138496/testReport)**
 for PR 32410 at commit 
[`4bca8ec`](https://github.com/apache/spark/commit/4bca8ecaec066ef19d04a12e134ba830320a2e0f).





[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-12 Thread GitBox


SparkQA commented on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-840310493


   **[Test build #138495 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138495/testReport)**
 for PR 32494 at commit 
[`1573522`](https://github.com/apache/spark/commit/1573522541ceaf1e0b6e0eccb108b88f0fb1a4c6).





[GitHub] [spark] SparkQA commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-12 Thread GitBox


SparkQA commented on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-840310425


   **[Test build #138494 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138494/testReport)**
 for PR 32498 at commit 
[`b7a6cc7`](https://github.com/apache/spark/commit/b7a6cc71110fe8de45e8c74d487ebd23b7942f34).





[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-12 Thread GitBox


SparkQA commented on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840310366


   **[Test build #138493 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138493/testReport)**
 for PR 32515 at commit 
[`b8b54ea`](https://github.com/apache/spark/commit/b8b54ea9cb3bdbb8f50bdb260567dedd2af9fe1b).





[GitHub] [spark] HyukjinKwon commented on a change in pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-12 Thread GitBox


HyukjinKwon commented on a change in pull request #32161:
URL: https://github.com/apache/spark/pull/32161#discussion_r631575367



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##
@@ -812,46 +812,10 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   /**
    * Loads a Parquet file, returning the result as a `DataFrame`.
    *
-   * You can set the following Parquet-specific option(s) for reading Parquet files:
-   * 
-   * `mergeSchema` (default is the value specified in `spark.sql.parquet.mergeSchema`): sets
-   * whether we should merge schemas collected from all Parquet part-files. This will override
-   * `spark.sql.parquet.mergeSchema`.
-   * `pathGlobFilter`: an optional glob pattern to only include files with paths matching
-   * the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter.
-   * It does not change the behavior of partition discovery.
-   * `modifiedBefore` (batch only): an optional timestamp to only include files with
-   * modification times  occurring before the specified Time. The provided timestamp
-   * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
-   * `modifiedAfter` (batch only): an optional timestamp to only include files with
-   * modification times occurring after the specified Time. The provided timestamp
-   * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
-   * `recursiveFileLookup`: recursively scan a directory for files. Using this option
-   * disables partition discovery
-   * `datetimeRebaseMode` (default is the value specified in the SQL config
-   * `spark.sql.parquet.datetimeRebaseModeInRead`): the rebasing mode for the values
-   * of the `DATE`, `TIMESTAMP_MICROS`, `TIMESTAMP_MILLIS` logical types from the Julian to
-   * Proleptic Gregorian calendar:
-   *   
-   * `EXCEPTION` : Spark fails in reads of ancient dates/timestamps that are ambiguous
-   * between the two calendars
-   * `CORRECTED` : loading of dates/timestamps without rebasing
-   * `LEGACY` : perform rebasing of ancient dates/timestamps from the Julian to Proleptic
-   * Gregorian calendar
-   *   
-   * 
-   * `int96RebaseMode` (default is the value specified in the SQL config
-   * `spark.sql.parquet.int96RebaseModeInRead`): the rebasing mode for `INT96` timestamps
-   * from the Julian to Proleptic Gregorian calendar:
-   *   
-   * `EXCEPTION` : Spark fails in reads of ancient `INT96` timestamps that are ambiguous
-   * between the two calendars
-   * `CORRECTED` : loading of timestamps without rebasing
-   * `LEGACY` : perform rebasing of ancient `INT96` timestamps from the Julian to Proleptic
-   * Gregorian calendar
-   *   
-   * 
-   * 
+   * Parquet-specific option(s) for reading Parquet files can be found in
+   * <a href="https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option">
+   *   Data Source Option</a> in the version you use.

Review comment:
   Can you add the general options here too?







[GitHub] [spark] AmplabJenkins removed a comment on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32516:
URL: https://github.com/apache/spark/pull/32516#issuecomment-840309736


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138488/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32520:
URL: https://github.com/apache/spark/pull/32520#issuecomment-840309734


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138479/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32292:
URL: https://github.com/apache/spark/pull/32292#issuecomment-840309741


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43010/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-840309740


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138478/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-12 Thread GitBox


AmplabJenkins removed a comment on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840309738


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43009/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-840309740


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138478/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32292:
URL: https://github.com/apache/spark/pull/32292#issuecomment-840309741


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43010/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32516:
URL: https://github.com/apache/spark/pull/32516#issuecomment-840309736


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138488/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32520:
URL: https://github.com/apache/spark/pull/32520#issuecomment-840309734


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138479/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840309738


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43009/
   





[GitHub] [spark] shahidki31 commented on a change in pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-12 Thread GitBox


shahidki31 commented on a change in pull request #32494:
URL: https://github.com/apache/spark/pull/32494#discussion_r631574179



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/UnionEstimation.scala
##
@@ -111,6 +111,44 @@ object UnionEstimation {
       AttributeMap.empty[ColumnStat]
     }
 
+    val attrToComputeNullCount = union.children.map(_.output).transpose.zipWithIndex.filter {
+      case (attrs, _) => attrs.zipWithIndex.forall {
+        case (attr, childIndex) =>
+          val attrStats = union.children(childIndex).stats.attributeStats
+          attrStats.get(attr).isDefined && attrStats(attr).nullCount.isDefined
+      }
+    }
+
+    val newAttrStats = if (attrToComputeNullCount.nonEmpty) {
+      val outputAttrStats = new ArrayBuffer[(Attribute, ColumnStat)]()
+      attrToComputeNullCount.foreach {
+        case (attrs, outputIndex) =>
+          val colWithNullStatValues = attrs.zipWithIndex.foldLeft[Option[BigInt]](None) {
+            case (totalNullCount, (attr, childIndex)) =>
+              val colStat = union.children(childIndex).stats.attributeStats(attr)
+              if (totalNullCount.isDefined) {

Review comment:
   Done.
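
For readers following the hunk above, here is a minimal standalone sketch of the same idea: a column of the union gets a null count only when every child reports one for its column at that position, and the result is the sum. It does not use Catalyst; `ColumnStat`, `ChildStats`, and `unionNullCounts` are simplified, hypothetical stand-ins and not the names or signatures used in the PR.

```scala
object UnionNullCountSketch {
  // Hypothetical, simplified stand-in for Catalyst's ColumnStat.
  final case class ColumnStat(nullCount: Option[BigInt])

  // Hypothetical per-child statistics, keyed by column name.
  type ChildStats = Map[String, ColumnStat]

  /** Each child is (output column names in positional order, stats for those columns).
    * Returns output position -> summed null count, for positions where every child
    * has a defined null count. */
  def unionNullCounts(children: Seq[(Seq[String], ChildStats)]): Map[Int, BigInt] = {
    // Group the i-th column of every child together, as `transpose` does in the hunk.
    val columnsByPosition = children.map(_._1).transpose.zipWithIndex
    columnsByPosition.flatMap { case (attrs, outputIndex) =>
      val counts = attrs.zipWithIndex.map { case (attr, childIndex) =>
        children(childIndex)._2.get(attr).flatMap(_.nullCount)
      }
      // Skip the column entirely if any child lacks a null count.
      if (counts.forall(_.isDefined)) Some(outputIndex -> counts.flatten.sum) else None
    }.toMap
  }
}
```

With two children whose first columns report 2 and 3 nulls, and whose second columns include one missing null count, only position 0 is returned.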







[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-12 Thread GitBox


SparkQA commented on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840308059


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43009/
   





[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-12 Thread GitBox


SparkQA commented on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840305304


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43009/
   





[GitHub] [spark] HyukjinKwon commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader

2021-05-12 Thread GitBox


HyukjinKwon commented on pull request #32515:
URL: https://github.com/apache/spark/pull/32515#issuecomment-840303599


   Looks okay to me too





[GitHub] [spark] SparkQA commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE

2021-05-12 Thread GitBox


SparkQA commented on pull request #32292:
URL: https://github.com/apache/spark/pull/32292#issuecomment-840303409









[GitHub] [spark] shahidki31 commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-12 Thread GitBox


shahidki31 commented on a change in pull request #32498:
URL: https://github.com/apache/spark/pull/32498#discussion_r631566208



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala
##
@@ -283,14 +326,17 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
   private def checkStats(
       plan: LogicalPlan,
       expectedStatsCboOn: Statistics,
-      expectedStatsCboOff: Statistics): Unit = {
-    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      expectedStatsCboOff: Statistics,
+      extraConfigs: Map[String, String] = Map.empty): Unit = {
+

Review comment:
   Yes, removed the extra line







[GitHub] [spark] sunchao commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


sunchao commented on a change in pull request #32527:
URL: https://github.com/apache/spark/pull/32527#discussion_r631565642



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
       arguments: Seq[Expression],
       input: InternalRow,
       dataType: DataType): Any = {
-    val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-    if (needNullCheck && args.exists(_ == null)) {
+    var i = 0
+    val len = arguments.length
+    while (i < len) {
+      evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]
+      i += 1
+    }
+    if (needNullCheck && evaluatedArgs.contains(null)) {
       // return null if one of arguments is null
       null
     } else {
       val ret = try {
-        method.invoke(obj, args: _*)
+        method.invoke(obj, evaluatedArgs: _*)
       } catch {

Review comment:
   Yeah, let me try it. In the profiling after this PR, `HashMap.get` takes 
7.82% of the entire `invoke` call, so it seems worthwhile to do this.
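
As an illustration of the hunk quoted above, the following standalone sketch shows the pattern of evaluating arguments with a `while` loop into a preallocated array instead of building a new collection with `map` on every call. It is only a sketch: `Expr` and `ArgEvaluator` are hypothetical stand-ins, and the assumption that the buffer is allocated once per evaluator is not confirmed by the quoted diff, which does not show where `evaluatedArgs` is declared.

```scala
object InvokeArgsSketch {
  // Hypothetical stand-in for a Catalyst expression: maps an input row to a value.
  type Expr = Seq[Any] => Any

  final class ArgEvaluator(arguments: IndexedSeq[Expr], needNullCheck: Boolean) {
    // Assumption: the buffer is allocated once and reused for every row.
    private val evaluatedArgs = new Array[Object](arguments.length)

    /** Returns None when the null check trips, otherwise a copy of the evaluated arguments. */
    def evalArgs(input: Seq[Any]): Option[List[Object]] = {
      var i = 0
      val len = arguments.length
      while (i < len) {
        evaluatedArgs(i) = arguments(i)(input).asInstanceOf[Object]
        i += 1
      }
      if (needNullCheck && evaluatedArgs.contains(null)) None
      else Some(evaluatedArgs.toList) // copy, so callers never alias the reused buffer
    }
  }
}
```

Because the buffer is shared across calls, an evaluator like this is not safe to use from several threads at once; the sketch copies the result for that reason, whereas the quoted diff passes the array straight to `method.invoke`.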







[GitHub] [spark] SparkQA removed a comment on pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests

2021-05-12 Thread GitBox


SparkQA removed a comment on pull request #32520:
URL: https://github.com/apache/spark/pull/32520#issuecomment-840197479


   **[Test build #138479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138479/testReport)** for PR 32520 at commit [`299abb5`](https://github.com/apache/spark/commit/299abb537bf715506d77079b65a4704a04a2829f).





[GitHub] [spark] SparkQA commented on pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests

2021-05-12 Thread GitBox


SparkQA commented on pull request #32520:
URL: https://github.com/apache/spark/pull/32520#issuecomment-840300886


   **[Test build #138479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138479/testReport)** for PR 32520 at commit [`299abb5`](https://github.com/apache/spark/commit/299abb537bf715506d77079b65a4704a04a2829f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] shahidki31 commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-12 Thread GitBox


shahidki31 commented on a change in pull request #32498:
URL: https://github.com/apache/spark/pull/32498#discussion_r631565143



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala
##
@@ -283,14 +326,17 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
   private def checkStats(
       plan: LogicalPlan,
       expectedStatsCboOn: Statistics,
-      expectedStatsCboOff: Statistics): Unit = {
-    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      expectedStatsCboOff: Statistics,
+      extraConfigs: Map[String, String] = Map.empty): Unit = {
+

Review comment:
   I am not sure I understand. Do we need to put the histogram configs 
directly inside this method? By default, histograms are disabled and the 
default number of bins is 254.







[GitHub] [spark] shahidki31 commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-12 Thread GitBox


shahidki31 commented on a change in pull request #32498:
URL: https://github.com/apache/spark/pull/32498#discussion_r631564790



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala
##
@@ -77,12 +92,21 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
       max = Some(4),
       nullCount = Some(0),
       maxLen = Some(LongType.defaultSize),
-      avgLen = Some(LongType.defaultSize))
-    checkStats(range, expectedStatsCboOn = rangeStats, expectedStatsCboOff = rangeStats)
+      avgLen = Some(LongType.defaultSize),
+      histogram = histogram)
+    val extraConfig = Map(SQLConf.HISTOGRAM_ENABLED.key -> "true",
+      SQLConf.HISTOGRAM_NUM_BINS.key -> "3")
+    checkStats(range, expectedStatsCboOn = rangeStats,
+      expectedStatsCboOff = rangeStats, extraConfig)
   }
 
   test("range with negative step") {
     val range = Range(-10, -20, -2, None)
+    val histogramBins = new Array[HistogramBin](3)
+    histogramBins(0) = HistogramBin(-18.0, -16.0, 2)
+    histogramBins(1) = HistogramBin(-16.0, -12.0, 2)
+    histogramBins(2) = HistogramBin(-12.0, -10.0, 1)

Review comment:
   Added an assert to check that `range.numElements` and `ndv` are the same.

##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala
##
@@ -97,12 +121,24 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
       max = Some(-10),
       nullCount = Some(0),
       maxLen = Some(LongType.defaultSize),
-      avgLen = Some(LongType.defaultSize))
-    checkStats(range, expectedStatsCboOn = rangeStats, expectedStatsCboOff = rangeStats)
+      avgLen = Some(LongType.defaultSize),
+      histogram = histogram)
+    val extraConfig = Map(SQLConf.HISTOGRAM_ENABLED.key -> "true",
+      SQLConf.HISTOGRAM_NUM_BINS.key -> "3")
+    checkStats(range, expectedStatsCboOn = rangeStats,
+      expectedStatsCboOff = rangeStats, extraConfig)
   }
 
   test("range with negative step where end minus start not divisible by step") {
+
     val range = Range(-10, -20, -3, None)
+
+    val histogramBins = new Array[HistogramBin](3)
+    histogramBins(0) = HistogramBin(-19.0, -16.0, 2)
+    histogramBins(1) = HistogramBin(-16.0, -13.0, 1)
+    histogramBins(2) = HistogramBin(-13.0, -10.0, 1)

Review comment:
   Updated







[GitHub] [spark] shahidki31 commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-12 Thread GitBox


shahidki31 commented on a change in pull request #32498:
URL: https://github.com/apache/spark/pull/32498#discussion_r631564612



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##
@@ -789,6 +797,38 @@ case class Range(
     }
   }
 
+  private def computeHistogramStatistics() = {
+    val numBins = conf.histogramNumBins
+    val height = numElements.toDouble / numBins
+    val percentileArray = (0 to numBins).map(i => i * height).toArray
+
+    val binArray = new Array[HistogramBin](numBins)
+    var lowerIndex = percentileArray.head
+    var lowerBinValue = getRangeValue(0)
+    percentileArray.tail.zipWithIndex.foreach { case (upperIndex, binId) =>
+      // Integer index for upper and lower values in the bin.
+      val upperIndexPos = math.ceil(upperIndex).toInt - 1
+      val lowerIndexPos = math.ceil(lowerIndex).toInt - 1
+
+      val upperBinValue = getRangeValue(math.max(upperIndexPos, 0))
+      val ndv = math.max(upperIndexPos - lowerIndexPos, 1)
+      binArray(binId) = HistogramBin(lowerBinValue, upperBinValue, ndv)
+
+      lowerBinValue = upperBinValue
+      lowerIndex = upperIndex
+    }
+    Histogram(height, binArray)
+  }
+
+  // Utility method to compute histogram
+  private def getRangeValue(index: Int): Long = {

Review comment:
   Done
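
The binning logic in `computeHistogramStatistics` can be tried outside Spark with the sketch below. It copies the loop from the hunk above, but everything around it is a hypothetical stand-in: `LongRange`, the simplified `HistogramBin` and `Histogram` case classes, and in particular the body of `getRangeValue`, which the quoted diff does not show. The sketch assumes `getRangeValue(i)` returns the i-th smallest element of the range, which is consistent with the expected bins in the `BasicStatsEstimationSuite` tests quoted in this thread.

```scala
object RangeHistogramSketch {
  // Simplified, hypothetical stand-ins for Catalyst's HistogramBin and Histogram.
  final case class HistogramBin(lo: Double, hi: Double, ndv: Long)
  final case class Histogram(height: Double, bins: Array[HistogramBin])

  final case class LongRange(start: Long, end: Long, step: Long) {
    require(step != 0, "step must be non-zero")

    // Number of elements in [start, end) walked with the given step.
    val numElements: Long = {
      val span = (end - start) * math.signum(step)
      val absStep = math.abs(step)
      if (span <= 0) 0L else (span + absStep - 1) / absStep
    }

    // Assumption: index 0 is the smallest value, regardless of the sign of step.
    private def getRangeValue(index: Int): Long =
      if (step > 0) start + index * step
      else start + (numElements - 1 - index) * step

    def computeHistogram(numBins: Int): Histogram = {
      require(numElements > 0 && numBins > 0, "sketch handles non-empty ranges only")
      val height = numElements.toDouble / numBins
      val percentiles = (0 to numBins).map(i => i * height)
      val bins = new Array[HistogramBin](numBins)
      var lowerIndex = percentiles.head
      var lowerBinValue = getRangeValue(0)
      percentiles.tail.zipWithIndex.foreach { case (upperIndex, binId) =>
        // Integer positions of the upper and lower bin boundaries.
        val upperIndexPos = math.ceil(upperIndex).toInt - 1
        val lowerIndexPos = math.ceil(lowerIndex).toInt - 1
        val upperBinValue = getRangeValue(math.max(upperIndexPos, 0))
        val ndv = math.max(upperIndexPos - lowerIndexPos, 1)
        bins(binId) = HistogramBin(lowerBinValue.toDouble, upperBinValue.toDouble, ndv.toLong)
        lowerBinValue = upperBinValue
        lowerIndex = upperIndex
      }
      Histogram(height, bins)
    }
  }
}
```

For `LongRange(-10, -20, -2)` with 3 bins, the five elements give a bin height of 5/3 and boundary positions 1, 3, and 4, so the bins cover (-18, -16), (-16, -12), and (-12, -10).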

##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala
##
@@ -97,12 +121,24 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
       max = Some(-10),
       nullCount = Some(0),
       maxLen = Some(LongType.defaultSize),
-      avgLen = Some(LongType.defaultSize))
-    checkStats(range, expectedStatsCboOn = rangeStats, expectedStatsCboOff = rangeStats)
+      avgLen = Some(LongType.defaultSize),
+      histogram = histogram)
+    val extraConfig = Map(SQLConf.HISTOGRAM_ENABLED.key -> "true",
+      SQLConf.HISTOGRAM_NUM_BINS.key -> "3")
+    checkStats(range, expectedStatsCboOn = rangeStats,
+      expectedStatsCboOff = rangeStats, extraConfig)
   }
 
   test("range with negative step where end minus start not divisible by step") {
+

Review comment:
   Done







[GitHub] [spark] shahidki31 commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-12 Thread GitBox


shahidki31 commented on a change in pull request #32498:
URL: https://github.com/apache/spark/pull/32498#discussion_r631564557



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##
@@ -789,6 +797,38 @@ case class Range(
     }
   }
 
+  private def computeHistogramStatistics() = {
+    val numBins = conf.histogramNumBins
+    val height = numElements.toDouble / numBins
+    val percentileArray = (0 to numBins).map(i => i * height).toArray
+
+    val binArray = new Array[HistogramBin](numBins)
+    var lowerIndex = percentileArray.head
+    var lowerBinValue = getRangeValue(0)

Review comment:
   Yes, updated.







[GitHub] [spark] SparkQA removed a comment on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes

2021-05-12 Thread GitBox


SparkQA removed a comment on pull request #32516:
URL: https://github.com/apache/spark/pull/32516#issuecomment-840286547


   **[Test build #138488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138488/testReport)** for PR 32516 at commit [`702629c`](https://github.com/apache/spark/commit/702629ccead13baba006eab8a6340b49722bf60a).





[GitHub] [spark] SparkQA commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes

2021-05-12 Thread GitBox


SparkQA commented on pull request #32516:
URL: https://github.com/apache/spark/pull/32516#issuecomment-840298542


   **[Test build #138488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138488/testReport)** for PR 32516 at commit [`702629c`](https://github.com/apache/spark/commit/702629ccead13baba006eab8a6340b49722bf60a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] cloud-fan commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


cloud-fan commented on a change in pull request #32527:
URL: https://github.com/apache/spark/pull/32527#discussion_r631561074



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
       arguments: Seq[Expression],
       input: InternalRow,
       dataType: DataType): Any = {
-    val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-    if (needNullCheck && args.exists(_ == null)) {
+    var i = 0
+    val len = arguments.length
+    while (i < len) {
+      evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]
+      i += 1
+    }
+    if (needNullCheck && evaluatedArgs.contains(null)) {
       // return null if one of arguments is null
       null
     } else {
       val ret = try {
-        method.invoke(obj, args: _*)
+        method.invoke(obj, evaluatedArgs: _*)
       } catch {

Review comment:
   We can do a similar thing in `Invoke.eval`.







[GitHub] [spark] cloud-fan commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


cloud-fan commented on a change in pull request #32527:
URL: https://github.com/apache/spark/pull/32527#discussion_r631560800



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
       arguments: Seq[Expression],
       input: InternalRow,
       dataType: DataType): Any = {
-    val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-    if (needNullCheck && args.exists(_ == null)) {
+    var i = 0
+    val len = arguments.length
+    while (i < len) {
+      evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]
+      i += 1
+    }
+    if (needNullCheck && evaluatedArgs.contains(null)) {
       // return null if one of arguments is null
       null
     } else {
       val ret = try {
-        method.invoke(obj, args: _*)
+        method.invoke(obj, evaluatedArgs: _*)
       } catch {

Review comment:
   Can we also improve the last piece?
   ```
   val boxedClass = ScalaReflection.typeBoxedJavaMapping.get(dataType)
   if (boxedClass.isDefined) {
     boxedClass.get.cast(ret)
   } else {
     ret
   }
   ```
   We can create a function for it:
   ```
   private lazy val boxing: Any => Any =
     ScalaReflection.typeBoxedJavaMapping
       .get(dataType)
       .map(cls => (v: Any) => cls.cast(v))
       .getOrElse(identity)
   ```
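
   A minimal, Spark-free sketch of that idea, for illustration only: the lookup is resolved once per instance and the resulting function is reused on every call. The `typeBoxedJavaMapping` below is a hypothetical map keyed by a type name, standing in for `ScalaReflection.typeBoxedJavaMapping`, and `Caster` is an assumed wrapper class.

   ```scala
   object BoxingSketch {
     // Hypothetical stand-in for ScalaReflection.typeBoxedJavaMapping.
     val typeBoxedJavaMapping: Map[String, Class[_]] = Map(
       "int" -> classOf[java.lang.Integer],
       "long" -> classOf[java.lang.Long])

     final class Caster(dataType: String) {
       // Resolved lazily, once per instance, instead of on every call.
       private lazy val boxingFn: Any => Any =
         typeBoxedJavaMapping
           .get(dataType)
           .map[Any => Any](cls => v => cls.cast(v))
           .getOrElse((v: Any) => v)

       def box(ret: Any): Any = boxingFn(ret)
     }
   }
   ```

   Types without a mapping fall through to the identity function, so the per-call path is a single function application either way.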







[GitHub] [spark] SparkQA removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-12 Thread GitBox


SparkQA removed a comment on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-840190295


   **[Test build #138478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138478/testReport)** for PR 32494 at commit [`c929124`](https://github.com/apache/spark/commit/c929124f5ce2045da43314941d513b57ce9d553a).





[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-12 Thread GitBox


SparkQA commented on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-840293326


   **[Test build #138478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138478/testReport)** for PR 32494 at commit [`c929124`](https://github.com/apache/spark/commit/c929124f5ce2045da43314941d513b57ce9d553a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-12 Thread GitBox


AmplabJenkins commented on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-840292938


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138477/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-12 Thread GitBox


SparkQA removed a comment on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-840190243


   **[Test build #138477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138477/testReport)** for PR 32498 at commit [`0bb49b3`](https://github.com/apache/spark/commit/0bb49b3a15b4bf2c59916cce91d5aba285812079).




