[GitHub] [spark] daugraph commented on pull request #34046: [SPARK-36804][YARN] Using the verbose parameter in yarn mode would cause application submission failure

2021-09-22 Thread GitBox


daugraph commented on pull request #34046:
URL: https://github.com/apache/spark/pull/34046#issuecomment-925517526


   ### Source code repository: 
   ```bash
   https://github.com/apache/spark.git -r 4ea54e8672757c0dbe3dd57c81763afdffcbcc1b
   ```
   ### Submit script/config:
   ```bash
   export SPARK_PRINT_LAUNCH_COMMAND="1"
   export SPARK_PREPEND_CLASSES="1"
   export HADOOP_CONF_DIR=/path/to/hadoop/conf
   export SPARK_SUBMIT_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
   
   spark-submit \
   --master yarn \
   --deploy-mode cluster \
   --verbose \
   --conf spark.kerberos.keytab=/path/to/keytab/file \
   --conf spark.kerberos.principal=user_principal \
   --conf spark.yarn.queue=root.user_queue \
   --conf spark.yarn.maxAppAttempts=1 \
   --class com.example.Main \
   target/examples-1.0-SNAPSHOT.jar
   ```
   ### Output
   ```bash
   NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
   Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_271.jdk/Contents/Home/bin/java -cp /Users/lijianmeng/github/spark/conf/:/Users/lijianmeng/github/spark/common/kvstore/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/common/network-common/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/common/network-shuffle/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/common/network-yarn/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/common/sketch/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/common/tags/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/common/unsafe/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/core/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/examples/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/graphx/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/launcher/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/mllib/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/repl/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/resource-managers/mesos/target/scala-2.12/classes:/Users/lijianmeng/github/spark/resource-managers/yarn/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/sql/catalyst/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/sql/core/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/sql/hive/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/sql/hive-thriftserver/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/streaming/target/scala-2.12/classes/:/Users/lijianmeng/github/spark/core/target/jars/*:/Users/lijianmeng/github/spark/mllib/target/jars/*:/Users/lijianmeng/github/spark/assembly/target/scala-2.12/jars/*:/path/to/hadoop/conf/ -Djava.security.krb5.conf=/etc/krb5.conf org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode cluster --conf spark.kerberos.keytab=/path/to/keytab/file --conf spark.yarn.maxAppAttempts=1 --conf spark.kerberos.principal=user_principal --conf spark.yarn.queue=root.user_queue --class com.example.Main --verbose target/examples-1.0-SNAPSHOT.jar

   Using properties file: null
   Parsed arguments:
     master                  yarn
     deployMode              cluster
     executorMemory          null
     executorCores           null
     totalExecutorCores      null
     propertiesFile          null
     driverMemory            null
     driverCores             null
     driverExtraClassPath    null
     driverExtraLibraryPath  null
     driverExtraJavaOptions  null
     supervise               false
     queue                   root.user_queue
     numExecutors            null
     files                   null
     pyFiles                 null
     archives                null
     mainClass               com.example.Main
     primaryResource         file:/Users/lijianmeng/bigdata/examples/target/examples-1.0-SNAPSHOT.jar
     name                    com.example.Main
     childArgs               []
     jars                    null
     packages                null
     packagesExclusions      null
     repositories            null
     verbose                 true

   Spark properties used, including those specified through --conf and those from the properties file null:
     (spark.yarn.queue,root.user_queue)
     (spark.yarn.maxAppAttempts,1)
     (spark.kerberos.principal,user_principal)
     (spark.kerberos.keytab,/path/to/keytab/file)

   21/09/23 13:17:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   Main class:
   org.apache.spark.deploy.yarn.YarnClusterApplication
   Arguments:
   --jar
   file:/Users/lijianmeng/bigdata/examples/target/examples-1.0-SNAPSHOT.jar
   --class
   com.example.Main
   --verbose
   Spark config:
   (spark.kerberos.keytab,/path/to/keytab/file)
   (spark.yarn.queue,root.user_queue)
   (spark.app.name,com.example.Main)
   ```
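   The Arguments section above shows the problem this PR addresses: the launcher-only `--verbose` flag is forwarded to `org.apache.spark.deploy.yarn.YarnClusterApplication`, which does not recognize it. A minimal sketch of the kind of fix involved (the function name and flag set are assumptions for illustration, not Spark's actual code):

   ```python
   # Sketch only, not Spark's implementation: drop launcher-only flags before
   # they reach the YARN-side application, which fails on unknown arguments.

   LAUNCHER_ONLY_FLAGS = {"--verbose", "-v"}  # assumed flag set, for illustration

   def build_child_args(submit_args):
       """Return submit_args with launcher-only flags filtered out."""
       return [arg for arg in submit_args if arg not in LAUNCHER_ONLY_FLAGS]

   child_args = build_child_args(
       ["--jar", "file:/path/to/examples.jar", "--class", "com.example.Main", "--verbose"])
   print(child_args)  # ['--jar', 'file:/path/to/examples.jar', '--class', 'com.example.Main']
   ```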
   

[GitHub] [spark] viirya commented on a change in pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-22 Thread GitBox


viirya commented on a change in pull request #34038:
URL: https://github.com/apache/spark/pull/34038#discussion_r714479944



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##
@@ -401,15 +401,30 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
                 |the ${ordinalNumber(ti + 1)} table has ${child.output.length} columns
               """.stripMargin.replace("\n", " ").trim())
           }
+          val isUnion = operator.isInstanceOf[Union]
+          val dataTypesAreCompatibleFn = if (isUnion) {
+            // `TypeCoercion` takes care of type coercion already. If any columns or nested
+            // columns are not compatible, we detect it here and throw analysis exception.
+            val typeChecker = (dt1: DataType, dt2: DataType) => {
+              !TypeCoercion.findWiderTypeForTwo(dt1.asNullable, dt2.asNullable).isEmpty

Review comment:
   It is not always possible to cast the types between the children of a union. For incompatible types, we need to detect them and throw an analysis error here. Do I misunderstand it?
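   The compatibility check under discussion can be illustrated with a toy sketch: for each pair of column types, look for a common wider type, and reject the Union when none exists. The widening table below is invented for illustration and is not Spark's `TypeCoercion` logic:

   ```python
   # Toy widening table, invented for illustration; Spark's real rules live in
   # TypeCoercion.findWiderTypeForTwo and cover many more types.
   WIDER = {
       ("int", "long"): "long", ("long", "int"): "long",
       ("int", "double"): "double", ("double", "int"): "double",
       ("long", "double"): "double", ("double", "long"): "double",
   }

   def find_wider_type_for_two(dt1, dt2):
       """Return the common wider type, or None when the pair is incompatible."""
       if dt1 == dt2:
           return dt1
       return WIDER.get((dt1, dt2))

   def data_types_are_compatible(dt1, dt2):
       # Mirrors the check in the diff: compatible iff a wider type exists.
       return find_wider_type_for_two(dt1, dt2) is not None

   print(data_types_are_compatible("int", "double"))   # True: widens to double
   print(data_types_are_compatible("long", "string"))  # False: no common wider type
   ```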




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #33627: [SPARK-36405][SQL][TESTS] Check that SQLSTATEs are valid

2021-09-22 Thread GitBox


HyukjinKwon closed pull request #33627:
URL: https://github.com/apache/spark/pull/33627


   





[GitHub] [spark] HyukjinKwon commented on pull request #33627: [SPARK-36405][SQL][TESTS] Check that SQLSTATEs are valid

2021-09-22 Thread GitBox


HyukjinKwon commented on pull request #33627:
URL: https://github.com/apache/spark/pull/33627#issuecomment-925515533


   Merged to master.





[GitHub] [spark] HyukjinKwon closed pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.

2021-09-22 Thread GitBox


HyukjinKwon closed pull request #33844:
URL: https://github.com/apache/spark/pull/33844


   





[GitHub] [spark] HyukjinKwon commented on pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.

2021-09-22 Thread GitBox


HyukjinKwon commented on pull request #33844:
URL: https://github.com/apache/spark/pull/33844#issuecomment-925515119


   Merged to master.





[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925513635


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48041/
   





[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925512991


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48042/
   





[GitHub] [spark] HyukjinKwon commented on a change in pull request #34072: [SPARK-36680][CATALYST] Supports Dynamic Table Options for Spark SQL

2021-09-22 Thread GitBox


HyukjinKwon commented on a change in pull request #34072:
URL: https://github.com/apache/spark/pull/34072#discussion_r714476788



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
##
@@ -1084,6 +1087,18 @@ class PlanParserSuite extends AnalysisTest {
   table("testcat", "db", "tab").select(star()).hint("BROADCAST", $"tab"))
   }
 
+  test("option hint") {

Review comment:
   let's add a JIRA prefix 
   
   ```suggestion
 test("SPARK-36680: option hint") {
   ```







[GitHub] [spark] HyukjinKwon commented on a change in pull request #34072: [SPARK-36680][CATALYST] Supports Dynamic Table Options for Spark SQL

2021-09-22 Thread GitBox


HyukjinKwon commented on a change in pull request #34072:
URL: https://github.com/apache/spark/pull/34072#discussion_r714462840



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -1244,15 +1245,21 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logging
    * }}}
    */
   override def visitTable(ctx: TableContext): LogicalPlan = withOrigin(ctx) {
-    UnresolvedRelation(visitMultipartIdentifier(ctx.multipartIdentifier))
+    val tableId = visitMultipartIdentifier(ctx.multipartIdentifier)
+    val options = Option(ctx.optionHint).map(hint =>
+      visitPropertyKeyValues(hint.options)).getOrElse(Map.empty)
+    UnresolvedRelation(tableId, new CaseInsensitiveStringMap(options.asJava))
   }
 
   /**
    * Create an aliased table reference. This is typically used in FROM clauses.
    */
   override def visitTableName(ctx: TableNameContext): LogicalPlan = withOrigin(ctx) {
     val tableId = visitMultipartIdentifier(ctx.multipartIdentifier)
-    val table = mayApplyAliasPlan(ctx.tableAlias, UnresolvedRelation(tableId))
+    val options = Option(ctx.optionHint).map(hint =>
+      visitPropertyKeyValues(hint.options)).getOrElse(Map.empty)
+    val table = mayApplyAliasPlan(ctx.tableAlias,
+      UnresolvedRelation(tableId, new CaseInsensitiveStringMap(options.asJava)))

Review comment:
   I don't think this works for tables already defined with options, because Spark respects the table properties defined in the table. This `options` is only for DSv2 for now. Can you add an e2e test and see if it works? e.g., create a table view and set the option.







[GitHub] [spark] HyukjinKwon commented on a change in pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2021-09-22 Thread GitBox


HyukjinKwon commented on a change in pull request #29535:
URL: https://github.com/apache/spark/pull/29535#discussion_r714476408



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
##
@@ -40,9 +41,12 @@ class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, function: String)
  * Holds the name of a relation that has yet to be looked up in a catalog.
  *
  * @param multipartIdentifier table name
+ * @param options options to scan this relation. Only applicable to v2 table scan.

Review comment:
   Okay, I just noticed https://github.com/apache/spark/commit/5e825482d70e13a8cb16f1fbdac8139710482d17 added the merging behaviour for V1. Maybe we should fix the comments here.







[GitHub] [spark] HyukjinKwon commented on a change in pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2021-09-22 Thread GitBox


HyukjinKwon commented on a change in pull request #29535:
URL: https://github.com/apache/spark/pull/29535#discussion_r714476408



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
##
@@ -40,9 +41,12 @@ class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, function: String)
  * Holds the name of a relation that has yet to be looked up in a catalog.
  *
  * @param multipartIdentifier table name
+ * @param options options to scan this relation. Only applicable to v2 table scan.

Review comment:
   Okay, I just noticed https://github.com/apache/spark/commit/5e825482d70e13a8cb16f1fbdac8139710482d17 added the merging behaviour. Maybe we should fix the comments here.







[GitHub] [spark] cloud-fan commented on a change in pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-22 Thread GitBox


cloud-fan commented on a change in pull request #34038:
URL: https://github.com/apache/spark/pull/34038#discussion_r714476343



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##
@@ -401,15 +401,30 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
                 |the ${ordinalNumber(ti + 1)} table has ${child.output.length} columns
               """.stripMargin.replace("\n", " ").trim())
           }
+          val isUnion = operator.isInstanceOf[Union]
+          val dataTypesAreCompatibleFn = if (isUnion) {
+            // `TypeCoercion` takes care of type coercion already. If any columns or nested
+            // columns are not compatible, we detect it here and throw analysis exception.
+            val typeChecker = (dt1: DataType, dt2: DataType) => {
+              !TypeCoercion.findWiderTypeForTwo(dt1.asNullable, dt2.asNullable).isEmpty

Review comment:
   I know it's from the old code, but is it necessary? The analyzer can add implicit casts to make the types the same.







[GitHub] [spark] AmplabJenkins removed a comment on pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.

2021-09-22 Thread GitBox


AmplabJenkins removed a comment on pull request #33844:
URL: https://github.com/apache/spark/pull/33844#issuecomment-925512221


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48037/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.

2021-09-22 Thread GitBox


AmplabJenkins commented on pull request #33844:
URL: https://github.com/apache/spark/pull/33844#issuecomment-925512221


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48037/
   





[GitHub] [spark] SparkQA commented on pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.

2021-09-22 Thread GitBox


SparkQA commented on pull request #33844:
URL: https://github.com/apache/spark/pull/33844#issuecomment-925512194


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48037/
   





[GitHub] [spark] SparkQA commented on pull request #33627: [SPARK-36405] Check that SQLSTATEs are valid

2021-09-22 Thread GitBox


SparkQA commented on pull request #33627:
URL: https://github.com/apache/spark/pull/33627#issuecomment-925510613


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48039/
   





[GitHub] [spark] SparkQA commented on pull request #34036: [SPARK-36795][SQL] Explain Formatted has Duplicate Node IDs

2021-09-22 Thread GitBox


SparkQA commented on pull request #34036:
URL: https://github.com/apache/spark/pull/34036#issuecomment-925510108


   **[Test build #143534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143534/testReport)** for PR 34036 at commit [`c33b533`](https://github.com/apache/spark/commit/c33b5332262f132b3bdbd565b03436736f3e7a2f).





[GitHub] [spark] SparkQA commented on pull request #34073: [SPARK-36760][SQL][FOLLOWUP] Add interface SupportsPushDownV2Filters

2021-09-22 Thread GitBox


SparkQA commented on pull request #34073:
URL: https://github.com/apache/spark/pull/34073#issuecomment-925510025


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48040/
   





[GitHub] [spark] cloud-fan commented on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2021-09-22 Thread GitBox


cloud-fan commented on pull request #29535:
URL: https://github.com/apache/spark/pull/29535#issuecomment-925509816


   > this creates a myth that setting options will overwrite table properties.
   
   This is expected. Per-scan options have higher priority than table properties.
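   The stated priority rule can be sketched as a simple merge, where per-scan options overwrite table properties on conflict (the function and property names below are illustrative, not Spark's API):

   ```python
   # Sketch of the stated priority (not Spark's implementation): merge table
   # properties with per-scan options, letting the per-scan options win.

   def merge_scan_options(table_properties, scan_options):
       merged = dict(table_properties)  # start from the table's own properties
       merged.update(scan_options)      # per-scan options overwrite on conflict
       return merged

   props = {"path": "/warehouse/t", "mergeSchema": "false"}  # illustrative values
   opts = {"mergeSchema": "true"}
   print(merge_scan_options(props, opts))
   # {'path': '/warehouse/t', 'mergeSchema': 'true'}
   ```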





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


AmplabJenkins removed a comment on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925508882









[GitHub] [spark] SparkQA commented on pull request #34046: [SPARK-36804][YARN] Using the verbose parameter in yarn mode would cause application submission failure

2021-09-22 Thread GitBox


SparkQA commented on pull request #34046:
URL: https://github.com/apache/spark/pull/34046#issuecomment-925509177


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48038/
   





[GitHub] [spark] ChenMichael commented on a change in pull request #34036: [SPARK-36795][SQL] Explain Formatted has Duplicate Node IDs

2021-09-22 Thread GitBox


ChenMichael commented on a change in pull request #34036:
URL: https://github.com/apache/spark/pull/34036#discussion_r714473203



##
File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
##
@@ -704,6 +704,31 @@ class ExplainSuiteAE extends ExplainSuiteHelper with EnableAdaptiveExecutionSuite
         "Bucketed: false (bucket column(s) not read)")
     }
   }
+
+  test("SPARK-36795: Node IDs should not be duplicated when InMemoryRelation Present") {
+    withTempView("t1", "t2") {
+      Seq(1).toDF("k").write.saveAsTable("t1")
+      Seq(1).toDF("key").write.saveAsTable("t2")
+      spark.sql("SELECT * FROM t1").persist()
+      val query = "SELECT * FROM (SELECT * FROM t1) join t2 " +
+        "ON k = t2.key"
+      val df = sql(query).toDF()
+
+      df.collect()
+      checkKeywordsExistsInExplain(df, FormattedMode,
+        """   * BroadcastHashJoin Inner BuildLeft (12)
+          |   :- BroadcastQueryStage (8)
+          |   :  +- BroadcastExchange (7)
+          |   :     +- * Filter (6)
+          |   :        +- * ColumnarToRow (5)
Review comment:
   OK, I changed the test to a regex that extracts the node IDs and asserts they are different.
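   The approach described above can be sketched as follows; the plan text is a made-up fragment modelled on the snippet in the diff, not real `EXPLAIN FORMATTED` output:

   ```python
   import re

   # Made-up EXPLAIN FORMATTED fragment, modelled on the diff above.
   plan = """\
   * BroadcastHashJoin Inner BuildLeft (12)
   :- BroadcastQueryStage (8)
   :  +- BroadcastExchange (7)
   :     +- * Filter (6)
   :        +- * ColumnarToRow (5)
   """

   # Extract every "(N)" node ID and assert none is duplicated.
   node_ids = re.findall(r"\((\d+)\)", plan)
   assert len(node_ids) == len(set(node_ids)), "duplicate node IDs in plan"
   print(sorted(int(i) for i in node_ids))  # [5, 6, 7, 8, 12]
   ```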







[GitHub] [spark] AmplabJenkins commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


AmplabJenkins commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925508882









[GitHub] [spark] cloud-fan commented on pull request #33903: [SPARK-36656][SQL][TEST] CollapseProject should not collapse correlated scalar subqueries

2021-09-22 Thread GitBox


cloud-fan commented on pull request #33903:
URL: https://github.com/apache/spark/pull/33903#issuecomment-925508631


   @allisonwang-db can you fix the conflicts?





[GitHub] [spark] cloud-fan commented on pull request #33990: [SPARK-36747][SQL] Do not collapse Project with Aggregate when correlated subqueries are present in the project list

2021-09-22 Thread GitBox


cloud-fan commented on pull request #33990:
URL: https://github.com/apache/spark/pull/33990#issuecomment-925508054


   @allisonwang-db can you open a backport PR for 3.2?





[GitHub] [spark] cloud-fan edited a comment on pull request #33990: [SPARK-36747][SQL] Do not collapse Project with Aggregate when correlated subqueries are present in the project list

2021-09-22 Thread GitBox


cloud-fan edited a comment on pull request #33990:
URL: https://github.com/apache/spark/pull/33990#issuecomment-925507907


   thanks, merging to master!





[GitHub] [spark] cloud-fan commented on pull request #33990: [SPARK-36747][SQL] Do not collapse Project with Aggregate when correlated subqueries are present in the project list

2021-09-22 Thread GitBox


cloud-fan commented on pull request #33990:
URL: https://github.com/apache/spark/pull/33990#issuecomment-925507907


   thanks, merging to master/3.2!





[GitHub] [spark] Ngone51 commented on a change in pull request #34043: [SPARK-36782][CORE] Avoid blocking dispatcher-BlockManagerMaster during UpdateBlockInfo

2021-09-22 Thread GitBox


Ngone51 commented on a change in pull request #34043:
URL: https://github.com/apache/spark/pull/34043#discussion_r714472130



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##
@@ -117,12 +117,15 @@ class BlockManagerMasterEndpoint(
 
 case _updateBlockInfo @
 UpdateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size) =>
-  val isSuccess = updateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size)
-  context.reply(isSuccess)
-  // SPARK-30594: we should not post `SparkListenerBlockUpdated` when updateBlockInfo
-  // returns false since the block info would be updated again later.
-  if (isSuccess) {
-listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))
+  val response = updateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size)
+
+  response.foreach { isSuccess =>
+// SPARK-30594: we should not post `SparkListenerBlockUpdated` when updateBlockInfo
+// returns false since the block info would be updated again later.
+if (isSuccess) {
+  listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))
+}
+context.reply(isSuccess)

Review comment:
   Sure!







[GitHub] [spark] mridulm commented on a change in pull request #34043: [SPARK-36782][CORE] Avoid blocking dispatcher-BlockManagerMaster during UpdateBlockInfo

2021-09-22 Thread GitBox


mridulm commented on a change in pull request #34043:
URL: https://github.com/apache/spark/pull/34043#discussion_r714471870



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##
@@ -117,12 +117,15 @@ class BlockManagerMasterEndpoint(
 
 case _updateBlockInfo @
 UpdateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size) =>
-  val isSuccess = updateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size)
-  context.reply(isSuccess)
-  // SPARK-30594: we should not post `SparkListenerBlockUpdated` when updateBlockInfo
-  // returns false since the block info would be updated again later.
-  if (isSuccess) {
-listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))
+  val response = updateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size)
+
+  response.foreach { isSuccess =>
+// SPARK-30594: we should not post `SparkListenerBlockUpdated` when updateBlockInfo
+// returns false since the block info would be updated again later.
+if (isSuccess) {
+  listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))
+}
+context.reply(isSuccess)

Review comment:
   Given @gengliangwang has merged it, can you create a follow-up PR? We can merge it pretty quickly and possibly make that into the current 3.2 RC as well :)







[GitHub] [spark] gengliangwang closed pull request #34043: [SPARK-36782][CORE] Avoid blocking dispatcher-BlockManagerMaster during UpdateBlockInfo

2021-09-22 Thread GitBox


gengliangwang closed pull request #34043:
URL: https://github.com/apache/spark/pull/34043


   





[GitHub] [spark] gengliangwang commented on pull request #34043: [SPARK-36782][CORE] Avoid blocking dispatcher-BlockManagerMaster during UpdateBlockInfo

2021-09-22 Thread GitBox


gengliangwang commented on pull request #34043:
URL: https://github.com/apache/spark/pull/34043#issuecomment-925506975


   Merging to master/3.2





[GitHub] [spark] gengliangwang commented on pull request #34043: [SPARK-36782][CORE] Avoid blocking dispatcher-BlockManagerMaster during UpdateBlockInfo

2021-09-22 Thread GitBox


gengliangwang commented on pull request #34043:
URL: https://github.com/apache/spark/pull/34043#issuecomment-925506883


   @mridulm @Ngone51 I really want to start 3.2.0 RC4 today.
   So I am going to merge this one and ask @Ngone51 to create a follow-up PR so 
that we can start the new RC soon.





[GitHub] [spark] cloud-fan closed pull request #33990: [SPARK-36747][SQL] Do not collapse Project with Aggregate when correlated subqueries are present in the project list

2021-09-22 Thread GitBox


cloud-fan closed pull request #33990:
URL: https://github.com/apache/spark/pull/33990


   





[GitHub] [spark] HyukjinKwon commented on a change in pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2021-09-22 Thread GitBox


HyukjinKwon commented on a change in pull request #29535:
URL: https://github.com/apache/spark/pull/29535#discussion_r714469347



##
File path: sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSuite.scala
##
@@ -186,4 +187,21 @@ class DataSourceV2DataFrameSuite
   assert(e3.getMessage.contains(s"Cannot use interval type in the table schema."))
 }
   }
+
+  test("options to scan v2 table should be passed to DataSourceV2Relation") {
+val t1 = "testcat.ns1.ns2.tbl"
+withTable(t1) {
+  val df1 = Seq((1L, "a"), (2L, "b"), (3L, "c")).toDF("id", "data")
+  df1.write.saveAsTable(t1)
+
+  val optionName = "fakeOption"
+  val df2 = spark.read
+.option(optionName, false)
+.table(t1)

Review comment:
   so, to be doubly sure, what happens if some options are already set in this table? e.g.)
   
   
   ```scala
   sql("CREATE TABLE tbl(a int) USING jdbc OPTIONS(a='b')")
   spark.option("a", "c").table("tbl")
   ```







[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925504422


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48036/
   





[GitHub] [spark] HyukjinKwon commented on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2021-09-22 Thread GitBox


HyukjinKwon commented on pull request #29535:
URL: https://github.com/apache/spark/pull/29535#issuecomment-925504281


   this creates the misconception that setting `options` will overwrite table properties. See also https://github.com/apache/spark/pull/34072





[GitHub] [spark] HyukjinKwon commented on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2021-09-22 Thread GitBox


HyukjinKwon commented on pull request #29535:
URL: https://github.com/apache/spark/pull/29535#issuecomment-925504126


   So `UnresolvedRelation` is shared for both cases but conditionally uses the `UnresolvedRelation.options` only for scans? That's very confusing.





[GitHub] [spark] cloud-fan commented on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2021-09-22 Thread GitBox


cloud-fan commented on pull request #29535:
URL: https://github.com/apache/spark/pull/29535#issuecomment-925503778


   it's table properties vs scan options





[GitHub] [spark] mridulm commented on a change in pull request #34043: [SPARK-36782][CORE] Avoid blocking dispatcher-BlockManagerMaster during UpdateBlockInfo

2021-09-22 Thread GitBox


mridulm commented on a change in pull request #34043:
URL: https://github.com/apache/spark/pull/34043#discussion_r714464726



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##
@@ -117,12 +117,15 @@ class BlockManagerMasterEndpoint(
 
 case _updateBlockInfo @
 UpdateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size) =>
-  val isSuccess = updateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size)
-  context.reply(isSuccess)
-  // SPARK-30594: we should not post `SparkListenerBlockUpdated` when updateBlockInfo
-  // returns false since the block info would be updated again later.
-  if (isSuccess) {
-listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))
+  val response = updateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size)
+
+  response.foreach { isSuccess =>
+// SPARK-30594: we should not post `SparkListenerBlockUpdated` when updateBlockInfo
+// returns false since the block info would be updated again later.
+if (isSuccess) {
+  listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))
+}
+context.reply(isSuccess)

Review comment:
   Did not realize this - thanks for pointing it out !
   So if I understood it right, the proposal is:
   
   ```
 def handleResult(success: Boolean): Unit = {
   if (success) {
 // post
   }
   context.reply(success)
 }
   
 if (blockId.isShuffle) {
   updateShuffleBlockInfo( ... ).foreach( handleResult(_))
 } else {
   handleResult(updateBlockInfo( ... ))
 }
   ```
   ?
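   For readers following along, the shape of this proposal can be sketched outside Spark with a plain `CompletableFuture` (the names `updateSync`, `updateAsync`, and the event-posting comment are illustrative stand-ins, not Spark APIs). The point is that the reply happens inside the shared result handler, so on the async branch the calling (dispatcher) thread never blocks waiting for the update to finish:

   ```java
   import java.util.concurrent.CompletableFuture;
   import java.util.function.Consumer;

   class ReplyFromCallback {
       // Hypothetical stand-ins for updateBlockInfo / updateShuffleBlockInfo.
       static boolean updateSync() { return true; }
       static CompletableFuture<Boolean> updateAsync() {
           return CompletableFuture.supplyAsync(() -> true);
       }

       // Shared handler: post the listener event only on success, then reply.
       static void handle(boolean isShuffle, Consumer<Boolean> reply) {
           Consumer<Boolean> handleResult = success -> {
               if (success) {
                   // post a SparkListenerBlockUpdated-style event here
               }
               reply.accept(success);
           };
           if (isShuffle) {
               // Async path: handleResult runs when the future completes,
               // so this thread returns immediately without blocking.
               updateAsync().thenAccept(handleResult);
           } else {
               // Sync path: same handler, invoked inline.
               handleResult.accept(updateSync());
           }
       }
   }
   ```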







[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925502753


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48035/
   





[GitHub] [spark] Ngone51 commented on a change in pull request #34043: [SPARK-36782][CORE] Avoid blocking dispatcher-BlockManagerMaster during UpdateBlockInfo

2021-09-22 Thread GitBox


Ngone51 commented on a change in pull request #34043:
URL: https://github.com/apache/spark/pull/34043#discussion_r714465842



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##
@@ -117,12 +117,15 @@ class BlockManagerMasterEndpoint(
 
 case _updateBlockInfo @
 UpdateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size) =>
-  val isSuccess = updateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size)
-  context.reply(isSuccess)
-  // SPARK-30594: we should not post `SparkListenerBlockUpdated` when updateBlockInfo
-  // returns false since the block info would be updated again later.
-  if (isSuccess) {
-listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))
+  val response = updateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size)
+
+  response.foreach { isSuccess =>
+// SPARK-30594: we should not post `SparkListenerBlockUpdated` when updateBlockInfo
+// returns false since the block info would be updated again later.
+if (isSuccess) {
+  listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))
+}
+context.reply(isSuccess)

Review comment:
   Yes!







[GitHub] [spark] HyukjinKwon commented on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2021-09-22 Thread GitBox


HyukjinKwon commented on pull request #29535:
URL: https://github.com/apache/spark/pull/29535#issuecomment-925500304


   Wait, I'm confused here. We already defined a table with options. How does it work with the newly set options? Are they merged?





[GitHub] [spark] HyukjinKwon commented on a change in pull request #34072: [SPARK-36680][CATALYST] Supports Dynamic Table Options for Spark SQL

2021-09-22 Thread GitBox


HyukjinKwon commented on a change in pull request #34072:
URL: https://github.com/apache/spark/pull/34072#discussion_r714462840



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -1244,15 +1245,21 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logging
* }}}
*/
   override def visitTable(ctx: TableContext): LogicalPlan = withOrigin(ctx) {
-UnresolvedRelation(visitMultipartIdentifier(ctx.multipartIdentifier))
+val tableId = visitMultipartIdentifier(ctx.multipartIdentifier)
+val options = Option(ctx.optionHint).map(hint =>
+  visitPropertyKeyValues(hint.options)).getOrElse(Map.empty)
+UnresolvedRelation(tableId, new CaseInsensitiveStringMap(options.asJava))
   }
 
   /**
* Create an aliased table reference. This is typically used in FROM clauses.
*/
   override def visitTableName(ctx: TableNameContext): LogicalPlan = withOrigin(ctx) {
 val tableId = visitMultipartIdentifier(ctx.multipartIdentifier)
-val table = mayApplyAliasPlan(ctx.tableAlias, UnresolvedRelation(tableId))
+val options = Option(ctx.optionHint).map(hint =>
+  visitPropertyKeyValues(hint.options)).getOrElse(Map.empty)
+val table = mayApplyAliasPlan(ctx.tableAlias,
+  UnresolvedRelation(tableId, new CaseInsensitiveStringMap(options.asJava)))

Review comment:
   I don't think this works for tables already defined with options, because Spark respects the table properties defined in the table. This `options` is only for DSv2 for now. Can you add an e2e test and see if it works? e.g.) create a table view and set the option.







[GitHub] [spark] AmplabJenkins removed a comment on pull request #34046: [SPARK-36804][YARN] Using the verbose parameter in yarn mode would cause application submission failure

2021-09-22 Thread GitBox


AmplabJenkins removed a comment on pull request #34046:
URL: https://github.com/apache/spark/pull/34046#issuecomment-925497981


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143530/
   





[GitHub] [spark] SparkQA removed a comment on pull request #34046: [SPARK-36804][YARN] Using the verbose parameter in yarn mode would cause application submission failure

2021-09-22 Thread GitBox


SparkQA removed a comment on pull request #34046:
URL: https://github.com/apache/spark/pull/34046#issuecomment-925493213


   **[Test build #143530 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143530/testReport)**
 for PR 34046 at commit 
[`80b24bd`](https://github.com/apache/spark/commit/80b24bdb8a4dd7cf2b46563d4708f9abdff0e540).





[GitHub] [spark] AmplabJenkins commented on pull request #34046: [SPARK-36804][YARN] Using the verbose parameter in yarn mode would cause application submission failure

2021-09-22 Thread GitBox


AmplabJenkins commented on pull request #34046:
URL: https://github.com/apache/spark/pull/34046#issuecomment-925497981


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143530/
   





[GitHub] [spark] SparkQA commented on pull request #34046: [SPARK-36804][YARN] Using the verbose parameter in yarn mode would cause application submission failure

2021-09-22 Thread GitBox


SparkQA commented on pull request #34046:
URL: https://github.com/apache/spark/pull/34046#issuecomment-925497956


   **[Test build #143530 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143530/testReport)**
 for PR 34046 at commit 
[`80b24bd`](https://github.com/apache/spark/commit/80b24bdb8a4dd7cf2b46563d4708f9abdff0e540).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925497597


   **[Test build #143533 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143533/testReport)**
 for PR 34033 at commit 
[`293daea`](https://github.com/apache/spark/commit/293daea9674bb06606dbdd188b6730797de2f617).





[GitHub] [spark] huaxingao commented on pull request #34030: [SPARK-36790][SQL] Update user-facing catalog to adapt CatalogPlugin

2021-09-22 Thread GitBox


huaxingao commented on pull request #34030:
URL: https://github.com/apache/spark/pull/34030#issuecomment-925495648


   > Another question is, do we need to add more function overloads with an extra catalog parameter?
   
   Agreed, we should not add more function overloads.





[GitHub] [spark] AngersZhuuuu commented on a change in pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


AngersZhuuuu commented on a change in pull request #34033:
URL: https://github.com/apache/spark/pull/34033#discussion_r714460173



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
##
@@ -562,6 +567,8 @@ case class InSet(child: Expression, hset: Set[Any]) extends UnaryExpression with
   protected override def nullSafeEval(value: Any): Any = {
 if (set.contains(value)) {
   true
+} else if (isNaN(value)) {
+  set.exists(isNaN(_))

Review comment:
   How about current?
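   For context, a minimal JVM-level sketch of why the explicit `isNaN` fallback in the diff above is needed when membership is checked with primitive equality (the method names here are hypothetical illustrations, not Spark's). Under IEEE 754, `NaN == NaN` is false for primitive doubles, so a membership test written with `==` can never match NaN; an `isNaN` branch recovers the SQL semantics where NaN is considered equal to itself:

   ```java
   class NaNMembership {
       // Membership via primitive ==, the way generated comparison code might do it.
       // This can never match NaN, because NaN == NaN is false.
       static boolean containsPrimitive(double v, double[] hset) {
           for (double x : hset) {
               if (x == v) return true;
           }
           return false;
       }

       // Same check with an explicit NaN fallback, mirroring the isNaN branch
       // in the patch: if the probe is NaN, ask whether the set holds any NaN.
       static boolean containsWithNaN(double v, double[] hset) {
           if (containsPrimitive(v, hset)) return true;
           if (Double.isNaN(v)) {
               for (double x : hset) {
                   if (Double.isNaN(x)) return true;
               }
           }
           return false;
       }
   }
   ```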







[GitHub] [spark] SparkQA commented on pull request #34073: [SPARK-36760][SQL][FOLLOWUP] Add interface SupportsPushDownV2Filters

2021-09-22 Thread GitBox


SparkQA commented on pull request #34073:
URL: https://github.com/apache/spark/pull/34073#issuecomment-925495331


   **[Test build #143532 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143532/testReport)**
 for PR 34073 at commit 
[`3a0052f`](https://github.com/apache/spark/commit/3a0052f2830ae1f31b92f0e2847937a359145477).





[GitHub] [spark] SparkQA commented on pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.

2021-09-22 Thread GitBox


SparkQA commented on pull request #33844:
URL: https://github.com/apache/spark/pull/33844#issuecomment-925494603


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48037/
   





[GitHub] [spark] SparkQA commented on pull request #33627: [SPARK-36405] Check that SQLSTATEs are valid

2021-09-22 Thread GitBox


SparkQA commented on pull request #33627:
URL: https://github.com/apache/spark/pull/33627#issuecomment-925493660


   **[Test build #143531 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143531/testReport)**
 for PR 33627 at commit 
[`1877bc4`](https://github.com/apache/spark/commit/1877bc48de3087134edbed6d3e45f20d4be3ba7d).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34046: [SPARK-36804][YARN] Using the verbose parameter in yarn mode would cause application submission failure

2021-09-22 Thread GitBox


AmplabJenkins removed a comment on pull request #34046:
URL: https://github.com/apache/spark/pull/34046#issuecomment-922637790


   Can one of the admins verify this patch?





[GitHub] [spark] SparkQA commented on pull request #34046: [SPARK-36804][YARN] Using the verbose parameter in yarn mode would cause application submission failure

2021-09-22 Thread GitBox


SparkQA commented on pull request #34046:
URL: https://github.com/apache/spark/pull/34046#issuecomment-925493213


   **[Test build #143530 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143530/testReport)**
 for PR 34046 at commit 
[`80b24bd`](https://github.com/apache/spark/commit/80b24bdb8a4dd7cf2b46563d4708f9abdff0e540).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-22 Thread GitBox


AmplabJenkins removed a comment on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-925490991


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48034/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.

2021-09-22 Thread GitBox


AmplabJenkins removed a comment on pull request #33844:
URL: https://github.com/apache/spark/pull/33844#issuecomment-925490993


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143529/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34069: [SPARK-36823][SQL] Support broadcast nested loop join hint for equi-join

2021-09-22 Thread GitBox


AmplabJenkins removed a comment on pull request #34069:
URL: https://github.com/apache/spark/pull/34069#issuecomment-925490994


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48032/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34058: [SPARK-36711][PYTHON] Support multi-index in new syntax

2021-09-22 Thread GitBox


AmplabJenkins removed a comment on pull request #34058:
URL: https://github.com/apache/spark/pull/34058#issuecomment-925490992


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48033/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.

2021-09-22 Thread GitBox


AmplabJenkins commented on pull request #33844:
URL: https://github.com/apache/spark/pull/33844#issuecomment-925490993


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143529/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34058: [SPARK-36711][PYTHON] Support multi-index in new syntax

2021-09-22 Thread GitBox


AmplabJenkins commented on pull request #34058:
URL: https://github.com/apache/spark/pull/34058#issuecomment-925490992


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48033/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34069: [SPARK-36823][SQL] Support broadcast nested loop join hint for equi-join

2021-09-22 Thread GitBox


AmplabJenkins commented on pull request #34069:
URL: https://github.com/apache/spark/pull/34069#issuecomment-925490994


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48032/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-22 Thread GitBox


AmplabJenkins commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-925490991


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48034/
   





[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925489997


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48036/
   





[GitHub] [spark] SparkQA commented on pull request #34058: [SPARK-36711][PYTHON] Support multi-index in new syntax

2021-09-22 Thread GitBox


SparkQA commented on pull request #34058:
URL: https://github.com/apache/spark/pull/34058#issuecomment-925489894


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48033/
   





[GitHub] [spark] SparkQA removed a comment on pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.

2021-09-22 Thread GitBox


SparkQA removed a comment on pull request #33844:
URL: https://github.com/apache/spark/pull/33844#issuecomment-925480993


   **[Test build #143529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143529/testReport)** for PR 33844 at commit [`90e7ae9`](https://github.com/apache/spark/commit/90e7ae9510345f8be6aa08d2e28eacf024cb1264).





[GitHub] [spark] SparkQA commented on pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.

2021-09-22 Thread GitBox


SparkQA commented on pull request #33844:
URL: https://github.com/apache/spark/pull/33844#issuecomment-925488482


   **[Test build #143529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143529/testReport)** for PR 33844 at commit [`90e7ae9`](https://github.com/apache/spark/commit/90e7ae9510345f8be6aa08d2e28eacf024cb1264).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #34069: [SPARK-36823][SQL] Support broadcast nested loop join hint for equi-join

2021-09-22 Thread GitBox


SparkQA commented on pull request #34069:
URL: https://github.com/apache/spark/pull/34069#issuecomment-925488226


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48032/
   





[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925488150


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48035/
   





[GitHub] [spark] HyukjinKwon commented on pull request #34046: [SPARK-36804][YARN] Using the verbose parameter in yarn mode would cause application submission failure

2021-09-22 Thread GitBox


HyukjinKwon commented on pull request #34046:
URL: https://github.com/apache/spark/pull/34046#issuecomment-925487408


   ok to test





[GitHub] [spark] HyukjinKwon closed pull request #34031: [SPARK-36791][DOCS] Fix spelling mistakes in running-on-yarn.md file where JHS_POST should be JHS_HOST

2021-09-22 Thread GitBox


HyukjinKwon closed pull request #34031:
URL: https://github.com/apache/spark/pull/34031


   





[GitHub] [spark] HyukjinKwon commented on pull request #34031: [SPARK-36791][DOCS] Fix spelling mistakes in running-on-yarn.md file where JHS_POST should be JHS_HOST

2021-09-22 Thread GitBox


HyukjinKwon commented on pull request #34031:
URL: https://github.com/apache/spark/pull/34031#issuecomment-925486202


   Merged to master, branch-3.2, branch-3.1, and branch-3.0.





[GitHub] [spark] HyukjinKwon commented on a change in pull request #34058: [SPARK-36711][PYTHON] Support multi-index in new syntax

2021-09-22 Thread GitBox


HyukjinKwon commented on a change in pull request #34058:
URL: https://github.com/apache/spark/pull/34058#discussion_r714452391



##
File path: python/pyspark/pandas/typedef/typehints.py
##
@@ -673,98 +673,146 @@ def create_tuple_for_frame_type(params: Any) -> object:
 Typing data columns with an index:
 
 >>> ps.DataFrame[int, [int, int]]  # doctest: +ELLIPSIS
-typing.Tuple[...IndexNameType, int, int]
+typing.Tuple[...IndexNameType, ...NameType, ...NameType]
 >>> ps.DataFrame[pdf.index.dtype, pdf.dtypes]  # doctest: +ELLIPSIS
-typing.Tuple[...IndexNameType, numpy.int64]
+typing.Tuple[...IndexNameType, ...NameType]
 >>> ps.DataFrame[("index", int), [("id", int), ("A", int)]]  # 
doctest: +ELLIPSIS
 typing.Tuple[...IndexNameType, ...NameType, ...NameType]
 >>> ps.DataFrame[(pdf.index.name, pdf.index.dtype), zip(pdf.columns, 
pdf.dtypes)]
 ... # doctest: +ELLIPSIS
 typing.Tuple[...IndexNameType, ...NameType]
+
+Typing data columns with an Multi-index:
+>>> arrays = [[1, 1, 2], ['red', 'blue', 'red']]
+>>> idx = pd.MultiIndex.from_arrays(arrays, names=('number', 'color'))
+>>> pdf = pd.DataFrame({'a': range(3)}, index=idx)
+>>> ps.DataFrame[[int, int], [int, int]]  # doctest: +ELLIPSIS
+typing.Tuple[...IndexNameType, ...IndexNameType, ...NameType, 
...NameType]
+>>> ps.DataFrame[pdf.index.dtypes, pdf.dtypes]  # doctest: +ELLIPSIS
+typing.Tuple[...IndexNameType, ...NameType]
+>>> ps.DataFrame[[("index-1", int), ("index-2", int)], [("id", int), 
("A", int)]]
+... # doctest: +ELLIPSIS
+typing.Tuple[...IndexNameType, ...IndexNameType, ...NameType, 
...NameType]
+>>> ps.DataFrame[zip(pdf.index.names, pdf.index.dtypes), 
zip(pdf.columns, pdf.dtypes)]
+... # doctest: +ELLIPSIS
+typing.Tuple[...IndexNameType, ...NameType]
 """
 return Tuple[extract_types(params)]
 
 
 # TODO(SPARK-36708): numpy.typing (numpy 1.21+) support for nested types.
 def extract_types(params: Any) -> Tuple:
 origin = params
-if isinstance(params, zip):  # type: ignore
-# Example:
-#   DataFrame[zip(pdf.columns, pdf.dtypes)]
-params = tuple(slice(name, tpe) for name, tpe in params)  # type: 
ignore
 
-if isinstance(params, Iterable):
-params = tuple(params)
-else:
-params = (params,)
+params = _prepare_a_tuple(params)
 
-if all(
-isinstance(param, slice)
-and param.start is not None
-and param.step is None
-and param.stop is not None
-for param in params
-):
+if _is_valid_slices(params):
 # Example:
 #   DataFrame["id": int, "A": int]
-new_params = []
-for param in params:
-new_param = type("NameType", (NameTypeHolder,), {})  # type: 
Type[NameTypeHolder]
-new_param.name = param.start
-# When the given argument is a numpy's dtype instance.
-new_param.tpe = param.stop.type if isinstance(param.stop, 
np.dtype) else param.stop
-new_params.append(new_param)
-
+new_params = _convert_slices_to_holders(params, is_index=False)
 return tuple(new_params)
 elif len(params) == 2 and isinstance(params[1], (zip, list, pd.Series)):
 # Example:
 #   DataFrame[int, [int, int]]
 #   DataFrame[pdf.index.dtype, pdf.dtypes]
 #   DataFrame[("index", int), [("id", int), ("A", int)]]
 #   DataFrame[(pdf.index.name, pdf.index.dtype), zip(pdf.columns, 
pdf.dtypes)]
+#
+#   DataFrame[[int, int], [int, int]]
+#   DataFrame[pdf.index.dtypes, pdf.dtypes]
+#   DataFrame[[("index", int), ("index-2", int)], [("id", int), ("A", 
int)]]
+#   DataFrame[zip(pdf.index.names, pdf.index.dtypes), zip(pdf.columns, 
pdf.dtypes)]

Review comment:
   okie, sounds good to me.
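The slice-based syntax under review (e.g. `DataFrame["id": int, "A": int]`) works because Python turns extended subscription into `slice("id", int)` objects. The following is a minimal standalone sketch of the name/type-holder conversion quoted in the diff above — plain Python with illustrative names, not the actual `pyspark.pandas` implementation (the real code also unwraps numpy dtype instances):

```python
# Sketch: convert slice("name", type) params into per-column holder classes,
# mirroring the pattern in the quoted typehints.py diff (illustrative only).
def convert_slices(params):
    holders = []
    for param in params:
        holder = type("NameType", (), {})  # fresh class per column
        holder.name = param.start  # the slice start carries the column name
        holder.tpe = param.stop    # the slice stop carries the column type
        holders.append(holder)
    return tuple(holders)

# DataFrame["id": int, "A": float] would deliver exactly these slices:
holders = convert_slices((slice("id", int), slice("A", float)))
assert [(h.name, h.tpe) for h in holders] == [("id", int), ("A", float)]
```

Generating a fresh class per column is what lets each holder carry its own `name`/`tpe` class attributes without instances sharing state.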







[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-22 Thread GitBox


SparkQA commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-925485874


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48034/
   





[GitHub] [spark] HyukjinKwon commented on a change in pull request #34058: [SPARK-36711][PYTHON] Support multi-index in new syntax

2021-09-22 Thread GitBox


HyukjinKwon commented on a change in pull request #34058:
URL: https://github.com/apache/spark/pull/34058#discussion_r714452107



##
File path: python/pyspark/pandas/typedef/typehints.py
##
@@ -673,98 +673,146 @@ def create_tuple_for_frame_type(params: Any) -> object:
 Typing data columns with an index:
 
 >>> ps.DataFrame[int, [int, int]]  # doctest: +ELLIPSIS
-typing.Tuple[...IndexNameType, int, int]
+typing.Tuple[...IndexNameType, ...NameType, ...NameType]
 >>> ps.DataFrame[pdf.index.dtype, pdf.dtypes]  # doctest: +ELLIPSIS
-typing.Tuple[...IndexNameType, numpy.int64]
+typing.Tuple[...IndexNameType, ...NameType]
 >>> ps.DataFrame[("index", int), [("id", int), ("A", int)]]  # 
doctest: +ELLIPSIS
 typing.Tuple[...IndexNameType, ...NameType, ...NameType]
 >>> ps.DataFrame[(pdf.index.name, pdf.index.dtype), zip(pdf.columns, 
pdf.dtypes)]
 ... # doctest: +ELLIPSIS
 typing.Tuple[...IndexNameType, ...NameType]
+
+Typing data columns with an Multi-index:
+>>> arrays = [[1, 1, 2], ['red', 'blue', 'red']]
+>>> idx = pd.MultiIndex.from_arrays(arrays, names=('number', 'color'))
+>>> pdf = pd.DataFrame({'a': range(3)}, index=idx)
+>>> ps.DataFrame[[int, int], [int, int]]  # doctest: +ELLIPSIS
+typing.Tuple[...IndexNameType, ...IndexNameType, ...NameType, 
...NameType]
+>>> ps.DataFrame[pdf.index.dtypes, pdf.dtypes]  # doctest: +ELLIPSIS
+typing.Tuple[...IndexNameType, ...NameType]
+>>> ps.DataFrame[[("index-1", int), ("index-2", int)], [("id", int), 
("A", int)]]
+... # doctest: +ELLIPSIS
+typing.Tuple[...IndexNameType, ...IndexNameType, ...NameType, 
...NameType]
+>>> ps.DataFrame[zip(pdf.index.names, pdf.index.dtypes), 
zip(pdf.columns, pdf.dtypes)]
+... # doctest: +ELLIPSIS
+typing.Tuple[...IndexNameType, ...NameType]
 """
 return Tuple[extract_types(params)]
 
 
 # TODO(SPARK-36708): numpy.typing (numpy 1.21+) support for nested types.
 def extract_types(params: Any) -> Tuple:
 origin = params
-if isinstance(params, zip):  # type: ignore
-# Example:
-#   DataFrame[zip(pdf.columns, pdf.dtypes)]
-params = tuple(slice(name, tpe) for name, tpe in params)  # type: 
ignore
 
-if isinstance(params, Iterable):
-params = tuple(params)
-else:
-params = (params,)
+params = _prepare_a_tuple(params)
 
-if all(
-isinstance(param, slice)
-and param.start is not None
-and param.step is None
-and param.stop is not None
-for param in params
-):
+if _is_valid_slices(params):
 # Example:
 #   DataFrame["id": int, "A": int]
-new_params = []
-for param in params:
-new_param = type("NameType", (NameTypeHolder,), {})  # type: 
Type[NameTypeHolder]
-new_param.name = param.start
-# When the given argument is a numpy's dtype instance.
-new_param.tpe = param.stop.type if isinstance(param.stop, 
np.dtype) else param.stop
-new_params.append(new_param)
-
+new_params = _convert_slices_to_holders(params, is_index=False)
 return tuple(new_params)
 elif len(params) == 2 and isinstance(params[1], (zip, list, pd.Series)):
 # Example:
 #   DataFrame[int, [int, int]]
 #   DataFrame[pdf.index.dtype, pdf.dtypes]
 #   DataFrame[("index", int), [("id", int), ("A", int)]]
 #   DataFrame[(pdf.index.name, pdf.index.dtype), zip(pdf.columns, 
pdf.dtypes)]
+#
+#   DataFrame[[int, int], [int, int]]
+#   DataFrame[pdf.index.dtypes, pdf.dtypes]

Review comment:
   ohh okay, it's for dtype*s*

##
File path: python/pyspark/pandas/typedef/typehints.py
##
@@ -673,98 +673,146 @@ def create_tuple_for_frame_type(params: Any) -> object:
 Typing data columns with an index:
 
 >>> ps.DataFrame[int, [int, int]]  # doctest: +ELLIPSIS
-typing.Tuple[...IndexNameType, int, int]
+typing.Tuple[...IndexNameType, ...NameType, ...NameType]
 >>> ps.DataFrame[pdf.index.dtype, pdf.dtypes]  # doctest: +ELLIPSIS
-typing.Tuple[...IndexNameType, numpy.int64]
+typing.Tuple[...IndexNameType, ...NameType]
 >>> ps.DataFrame[("index", int), [("id", int), ("A", int)]]  # 
doctest: +ELLIPSIS
 typing.Tuple[...IndexNameType, ...NameType, ...NameType]
 >>> ps.DataFrame[(pdf.index.name, pdf.index.dtype), zip(pdf.columns, 
pdf.dtypes)]
 ... # doctest: +ELLIPSIS
 typing.Tuple[...IndexNameType, ...NameType]
+
+Typing data columns with an Multi-index:
+>>> arrays = [[1, 1, 2], ['red', 'blue', 'red']]
+>>> idx = pd.MultiIndex.from_arrays(arrays, names=('number', 'color'))
+>>> pdf = pd.DataFrame({'a': range(3)}, index=idx)
+>>> ps.DataFrame[[int, int], 

[GitHub] [spark] HyukjinKwon commented on a change in pull request #34058: [SPARK-36711][PYTHON] Support multi-index in new syntax

2021-09-22 Thread GitBox


HyukjinKwon commented on a change in pull request #34058:
URL: https://github.com/apache/spark/pull/34058#discussion_r714451936



##
File path: python/pyspark/pandas/typedef/typehints.py
##
@@ -673,98 +673,146 @@ def create_tuple_for_frame_type(params: Any) -> object:
 Typing data columns with an index:
 
 >>> ps.DataFrame[int, [int, int]]  # doctest: +ELLIPSIS
-typing.Tuple[...IndexNameType, int, int]
+typing.Tuple[...IndexNameType, ...NameType, ...NameType]
 >>> ps.DataFrame[pdf.index.dtype, pdf.dtypes]  # doctest: +ELLIPSIS
-typing.Tuple[...IndexNameType, numpy.int64]
+typing.Tuple[...IndexNameType, ...NameType]
 >>> ps.DataFrame[("index", int), [("id", int), ("A", int)]]  # 
doctest: +ELLIPSIS
 typing.Tuple[...IndexNameType, ...NameType, ...NameType]
 >>> ps.DataFrame[(pdf.index.name, pdf.index.dtype), zip(pdf.columns, 
pdf.dtypes)]
 ... # doctest: +ELLIPSIS
 typing.Tuple[...IndexNameType, ...NameType]
+
+Typing data columns with an Multi-index:
+>>> arrays = [[1, 1, 2], ['red', 'blue', 'red']]
+>>> idx = pd.MultiIndex.from_arrays(arrays, names=('number', 'color'))
+>>> pdf = pd.DataFrame({'a': range(3)}, index=idx)
+>>> ps.DataFrame[[int, int], [int, int]]  # doctest: +ELLIPSIS
+typing.Tuple[...IndexNameType, ...IndexNameType, ...NameType, 
...NameType]
+>>> ps.DataFrame[pdf.index.dtypes, pdf.dtypes]  # doctest: +ELLIPSIS
+typing.Tuple[...IndexNameType, ...NameType]
+>>> ps.DataFrame[[("index-1", int), ("index-2", int)], [("id", int), 
("A", int)]]
+... # doctest: +ELLIPSIS
+typing.Tuple[...IndexNameType, ...IndexNameType, ...NameType, 
...NameType]
+>>> ps.DataFrame[zip(pdf.index.names, pdf.index.dtypes), 
zip(pdf.columns, pdf.dtypes)]
+... # doctest: +ELLIPSIS
+typing.Tuple[...IndexNameType, ...NameType]
 """
 return Tuple[extract_types(params)]
 
 
 # TODO(SPARK-36708): numpy.typing (numpy 1.21+) support for nested types.
 def extract_types(params: Any) -> Tuple:
 origin = params
-if isinstance(params, zip):  # type: ignore
-# Example:
-#   DataFrame[zip(pdf.columns, pdf.dtypes)]
-params = tuple(slice(name, tpe) for name, tpe in params)  # type: 
ignore
 
-if isinstance(params, Iterable):
-params = tuple(params)
-else:
-params = (params,)
+params = _prepare_a_tuple(params)
 
-if all(
-isinstance(param, slice)
-and param.start is not None
-and param.step is None
-and param.stop is not None
-for param in params
-):
+if _is_valid_slices(params):
 # Example:
 #   DataFrame["id": int, "A": int]
-new_params = []
-for param in params:
-new_param = type("NameType", (NameTypeHolder,), {})  # type: 
Type[NameTypeHolder]
-new_param.name = param.start
-# When the given argument is a numpy's dtype instance.
-new_param.tpe = param.stop.type if isinstance(param.stop, 
np.dtype) else param.stop
-new_params.append(new_param)
-
+new_params = _convert_slices_to_holders(params, is_index=False)
 return tuple(new_params)
 elif len(params) == 2 and isinstance(params[1], (zip, list, pd.Series)):
 # Example:
 #   DataFrame[int, [int, int]]
 #   DataFrame[pdf.index.dtype, pdf.dtypes]
 #   DataFrame[("index", int), [("id", int), ("A", int)]]
 #   DataFrame[(pdf.index.name, pdf.index.dtype), zip(pdf.columns, 
pdf.dtypes)]
+#
+#   DataFrame[[int, int], [int, int]]
+#   DataFrame[pdf.index.dtypes, pdf.dtypes]
+#   DataFrame[[("index", int), ("index-2", int)], [("id", int), ("A", 
int)]]
+#   DataFrame[zip(pdf.index.names, pdf.index.dtypes), zip(pdf.columns, 
pdf.dtypes)]

Review comment:
   I meant:
   
   ```
   ps.DataFrame[(pdf.index.names, pdf.index.dtypes), zip(pdf.columns, 
pdf.dtypes)]
   ```
   
   :-).







[GitHub] [spark] HyukjinKwon commented on pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.

2021-09-22 Thread GitBox


HyukjinKwon commented on pull request #33844:
URL: https://github.com/apache/spark/pull/33844#issuecomment-925485282


   @itholic mind updating the PR description too?





[GitHub] [spark] HyukjinKwon commented on a change in pull request #33989: [SPARK-36676][SQL][BUILD] Create shaded Hive module and upgrade Guava version to 30.1.1-jre

2021-09-22 Thread GitBox


HyukjinKwon commented on a change in pull request #33989:
URL: https://github.com/apache/spark/pull/33989#discussion_r714448159



##
File path: assembly/pom.xml
##
@@ -165,6 +169,13 @@
 
   hive
   
+
+
+  org.apache.spark
+  spark-hive-shaded_${scala.binary.version}
+  ${project.version}
+  ${hive.deps.scope}

Review comment:
   @sunchao sorry if I missed something, but why do we need to redeclare it here?







[GitHub] [spark] SparkQA commented on pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.

2021-09-22 Thread GitBox


SparkQA commented on pull request #33844:
URL: https://github.com/apache/spark/pull/33844#issuecomment-925480993


   **[Test build #143529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143529/testReport)** for PR 33844 at commit [`90e7ae9`](https://github.com/apache/spark/commit/90e7ae9510345f8be6aa08d2e28eacf024cb1264).





[GitHub] [spark] daugraph commented on a change in pull request #34046: [SPARK-36804][YARN] Using the verbose parameter in yarn mode would cause application submission failure

2021-09-22 Thread GitBox


daugraph commented on a change in pull request #34046:
URL: https://github.com/apache/spark/pull/34046#discussion_r714446941



##
File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala
##
@@ -66,6 +74,15 @@ private[spark] class ClientArguments(args: Array[String]) {
   throw new IllegalArgumentException("Cannot have primary-py-file and 
primary-r-file" +
 " at the same time")
 }
+
+if (verbose) {
+  logInfo("Client arguments for YARN application:")

Review comment:
   We can also avoid throwing the exception by removing the --verbose option before passing arguments to org.apache.spark.deploy.yarn.ClientArguments.scala.
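As a hedged illustration of that alternative — filtering the flag out of the argument list before handing it to a stricter downstream parser — here is a plain-Python sketch (illustrative names only, not the actual Scala submission path):

```python
# Hypothetical sketch: drop --verbose before delegating the argument list
# to a parser that does not recognize it (names are illustrative only).
def strip_verbose(args):
    return [a for a in args if a != "--verbose"]

forwarded = strip_verbose(["--master", "yarn", "--verbose", "--deploy-mode", "cluster"])
assert forwarded == ["--master", "yarn", "--deploy-mode", "cluster"]
```

The trade-off discussed in the thread is between filtering the option out at the boundary, as above, and teaching the downstream parser to accept (or log under) the flag.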







[GitHub] [spark] cloud-fan commented on a change in pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


cloud-fan commented on a change in pull request #34033:
URL: https://github.com/apache/spark/pull/34033#discussion_r714446914



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
##
@@ -562,6 +567,8 @@ case class InSet(child: Expression, hset: Set[Any]) extends 
UnaryExpression with
   protected override def nullSafeEval(value: Any): Any = {
 if (set.contains(value)) {
   true
+} else if (isNaN(value)) {
+  set.exists(isNaN(_))

Review comment:
   can we have a `hasNaN` variable to avoid repeated computation?
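The suggestion can be sketched as follows. This is a simplified illustration, not the actual `InSet` implementation; the names `Membership` and `isNaN` are assumptions, and the real code distinguishes Catalyst data types:

```scala
object InSetNaNSketch {
  // Simplified NaN test; the real InSet handles Double and Float per data type.
  def isNaN(value: Any): Boolean = value match {
    case d: Double => d.isNaN
    case f: Float  => f.isNaN
    case _         => false
  }

  final class Membership(set: Set[Any]) {
    // Computed once per expression instance instead of re-scanning the
    // whole set every time a NaN value is evaluated.
    lazy val hasNaN: Boolean = set.exists(isNaN)

    def contains(value: Any): Boolean =
      set.contains(value) || (isNaN(value) && hasNaN)
  }
}
```

Making `hasNaN` a `lazy val` keeps the scan off the hot path entirely when no NaN input is ever seen.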







[GitHub] [spark] sigmod commented on a change in pull request #34053: [SPARK-36813][SQL][PYTHON] Propose an infrastructure of as-of join and imlement ps.merge_asof

2021-09-22 Thread GitBox


sigmod commented on a change in pull request #34053:
URL: https://github.com/apache/spark/pull/34053#discussion_r714446686



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAsOfJoinSuite.scala
##
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types._
+
+class DataFrameAsOfJoinSuite extends QueryTest
+  with SharedSparkSession
+  with AdaptiveSparkPlanHelper {
+
+  def prepareForAsOfJoin(): (DataFrame, DataFrame) = {
+val schema1 = StructType(
+  StructField("a", IntegerType, false) ::
+StructField("b", StringType, false) ::
+StructField("left_val", StringType, false) :: Nil)
+val rowSeq1: List[Row] = List(Row(1, "x", "a"), Row(5, "y", "b"), Row(10, 
"z", "c"))
+val df1 = spark.createDataFrame(rowSeq1.asJava, schema1)
+
+val schema2 = StructType(
+  StructField("a", IntegerType) ::
+StructField("b", StringType) ::
+StructField("right_val", IntegerType) :: Nil)
+val rowSeq2: List[Row] = List(Row(1, "v", 1), Row(2, "w", 2), Row(3, "x", 
3),
+  Row(6, "y", 6), Row(7, "z", 7))
+val df2 = spark.createDataFrame(rowSeq2.asJava, schema2)
+
+(df1, df2)
+  }
+
+  test("as-of join - simple") {
+val (df1, df2) = prepareForAsOfJoin()
+checkAnswer(
+  df1.joinAsOf(
+df2, df1.col("a"), df2.col("a"), usingColumns = Seq.empty,
+joinType = "left", tolerance = null, allowExactMatches = true, 
direction = "backward"),
+  Seq(
+Row(1, "x", "a", 1, "v", 1),
+Row(5, "y", "b", 3, "x", 3),
+Row(10, "z", "c", 7, "z", 7)
+  )
+)
+  }
+
+  test("as-of join - usingColumns") {
+val (df1, df2) = prepareForAsOfJoin()
+checkAnswer(
+  df1.joinAsOf(df2, df1.col("a"), df2.col("a"), usingColumns = Seq("b"),
+joinType = "left", tolerance = null, allowExactMatches = true, 
direction = "backward"),
+  Seq(
+Row(1, "x", "a", null, null, null),
+Row(5, "y", "b", null, null, null),
+Row(10, "z", "c", 7, "z", 7)
+  )
+)
+  }
+
+  test("as-of join - usingColumns, inner") {
+val (df1, df2) = prepareForAsOfJoin()
+checkAnswer(
+  df1.joinAsOf(df2, df1.col("a"), df2.col("a"), usingColumns = Seq("b"),
+joinType = "inner", tolerance = null, allowExactMatches = true, 
direction = "backward"),
+  Seq(
+Row(10, "z", "c", 7, "z", 7)
+  )
+)
+  }
+
+  test("as-of join - tolerance = 1") {
+val (df1, df2) = prepareForAsOfJoin()
+checkAnswer(
+  df1.joinAsOf(df2, df1.col("a"), df2.col("a"), usingColumns = Seq.empty,
+joinType = "left", tolerance = lit(1), allowExactMatches = true, 
direction = "backward"),
+  Seq(
+Row(1, "x", "a", 1, "v", 1),
+Row(5, "y", "b", null, null, null),
+Row(10, "z", "c", null, null, null)
+  )
+)
+  }
+
+  test("as-of join - allowExactMatches = false") {
+val (df1, df2) = prepareForAsOfJoin()
+checkAnswer(
+  df1.joinAsOf(df2, df1.col("a"), df2.col("a"), usingColumns = Seq.empty,
+joinType = "left", tolerance = null, allowExactMatches = false, 
direction = "backward"),
+  Seq(
+Row(1, "x", "a", null, null, null),

Review comment:
   In the examples in the comments, should the non-matches' numeric columns be NaN?
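For context, the "backward" matching rule exercised by these tests can be modeled roughly as follows. This is a simplified sketch over integer keys; `None` plays the role of the SQL `null` produced for non-matches, whereas pandas' `merge_asof` would show `NaN`:

```scala
object AsOfSketch {
  // For each left key, pick the greatest right key at or before it
  // ("backward" direction); with allowExactMatches = false, strictly before.
  def backwardMatch(
      leftKeys: Seq[Int],
      rightKeys: Seq[Int],
      allowExactMatches: Boolean): Seq[Option[Int]] =
    leftKeys.map { l =>
      val candidates =
        rightKeys.filter(r => if (allowExactMatches) r <= l else r < l)
      if (candidates.isEmpty) None else Some(candidates.max)
    }
}
```

With the test data above (left keys 1, 5, 10 and right keys 1, 2, 3, 6, 7), disallowing exact matches turns the first row into a non-match, which is why the expected answer carries nulls there.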







[GitHub] [spark] sigmod commented on a change in pull request #34053: [SPARK-36813][SQL][PYTHON] Propose an infrastructure of as-of join and imlement ps.merge_asof

2021-09-22 Thread GitBox


sigmod commented on a change in pull request #34053:
URL: https://github.com/apache/spark/pull/34053#discussion_r714443723



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##
@@ -2122,6 +2125,68 @@ object RewriteIntersectAll extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Replaces logical [[AsOfJoin]] operator using a combination of Join and 
Aggregate operator.
+ *
+ * Input Pseudo-Query:
+ * {{{
+ *SELECT * FROM left ASOF JOIN right ON (condition, as_of on(left.t, 
right.t), tolerance)
+ * }}}
+ *
+ * Rewritten Query:
+ * {{{
+ *   SELECT left.*, __right__.*
+ *   FROM (
+ *SELECT
+ * left.*,
+ * (
+ *  SELECT MIN_BY(STRUCT(right.*), left.t - right.t)
+ *  FROM right
+ *  WHERE condition AND left.t >= right.t AND right.t >= 
left.t - tolerance
+ * ) as __right__
+ *FROM left
+ *)
+ * }}}
+ */
+object RewriteAsOfJoin extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
+_.containsPattern(AS_OF_JOIN), ruleId) {
+case AsOfJoin(left, right, asOfCondition, condition, orderExpression, 
joinType) =>
+  val conditionWithOuterReference =
+condition.map(And(_, 
asOfCondition)).getOrElse(asOfCondition).transformUp {
+  case a: AttributeReference if left.outputSet.contains(a) =>
+OuterReference(a)
+  }
+  val filtered = Filter(conditionWithOuterReference, right)
+
+  val orderExpressionWithOuterReference = orderExpression.transformUp {
+  case a: AttributeReference if left.outputSet.contains(a) =>
+OuterReference(a)
+}
+  val rightStruct = CreateStruct(right.output)
+  val nearestRight = MinBy(rightStruct, orderExpressionWithOuterReference)
+.toAggregateExpression()
+  val aggExpr = Alias(nearestRight, "__nearest_right__")()
+  val aggregate = Aggregate(Seq.empty, Seq(aggExpr), filtered)
+
+  val scalarSubquery = Project(

Review comment:
   Nit: projectWithScalarSubquery ?

##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAsOfJoinSuite.scala
##
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types._
+
+class DataFrameAsOfJoinSuite extends QueryTest
+  with SharedSparkSession
+  with AdaptiveSparkPlanHelper {
+
+  def prepareForAsOfJoin(): (DataFrame, DataFrame) = {
+val schema1 = StructType(
+  StructField("a", IntegerType, false) ::
+StructField("b", StringType, false) ::
+StructField("left_val", StringType, false) :: Nil)
+val rowSeq1: List[Row] = List(Row(1, "x", "a"), Row(5, "y", "b"), Row(10, 
"z", "c"))
+val df1 = spark.createDataFrame(rowSeq1.asJava, schema1)
+
+val schema2 = StructType(
+  StructField("a", IntegerType) ::
+StructField("b", StringType) ::
+StructField("right_val", IntegerType) :: Nil)
+val rowSeq2: List[Row] = List(Row(1, "v", 1), Row(2, "w", 2), Row(3, "x", 
3),
+  Row(6, "y", 6), Row(7, "z", 7))
+val df2 = spark.createDataFrame(rowSeq2.asJava, schema2)
+
+(df1, df2)
+  }
+
+  test("as-of join - simple") {
+val (df1, df2) = prepareForAsOfJoin()
+checkAnswer(
+  df1.joinAsOf(
+df2, df1.col("a"), df2.col("a"), usingColumns = Seq.empty,
+joinType = "left", tolerance = null, allowExactMatches = true, 
direction = "backward"),
+  Seq(
+Row(1, "x", "a", 1, "v", 1),
+Row(5, "y", "b", 3, "x", 3),
+Row(10, "z", "c", 7, "z", 7)
+  )
+)
+  }
+
+  test("as-of join - usingColumns") {
+val (df1, df2) = prepareForAsOfJoin()
+checkAnswer(
+  df1.joinAsOf(df2, df1.col("a"), df2.col("a"), usingColumns = Seq("b"),
+joinType = "left", tolerance = null, allowExactMatches = true, 
direction = "backward"),
+  Seq(
+Row(1, "x", "a", 

[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925479253


   **[Test build #143528 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143528/testReport)**
 for PR 34033 at commit 
[`174ac71`](https://github.com/apache/spark/commit/174ac717066e5fce2dcb5c0cd50c8d9149fe5580).





[GitHub] [spark] daugraph commented on a change in pull request #34046: [SPARK-36804][YARN] Using the verbose parameter in yarn mode would cause application submission failure

2021-09-22 Thread GitBox


daugraph commented on a change in pull request #34046:
URL: https://github.com/apache/spark/pull/34046#discussion_r714445692



##
File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala
##
@@ -66,6 +74,15 @@ private[spark] class ClientArguments(args: Array[String]) {
   throw new IllegalArgumentException("Cannot have primary-py-file and 
primary-r-file" +
 " at the same time")
 }
+
+if (verbose) {
+  logInfo("Client arguments for YARN application:")

Review comment:
   Thanks for your review. You are right, but these are two different 
--verbose options: SparkSubmit passes its --verbose option through to 
org.apache.spark.deploy.yarn.ClientArguments.scala, which cannot handle 
--verbose yet. I will add a more detailed description with screenshots later.







[GitHub] [spark] HyukjinKwon commented on pull request #34051: [SPARK-36809][SQL] Remove broadcast for InSubqueryExec used in DPP

2021-09-22 Thread GitBox


HyukjinKwon commented on pull request #34051:
URL: https://github.com/apache/spark/pull/34051#issuecomment-925478747


   Otherwise, the change makes sense to me, too.





[GitHub] [spark] cloud-fan commented on a change in pull request #34073: [SPARK-36760][SQL][FOLLOWUP] Add interface SupportsPushDownV2Filters

2021-09-22 Thread GitBox


cloud-fan commented on a change in pull request #34073:
URL: https://github.com/apache/spark/pull/34073#discussion_r714445044



##
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownV2Filters.java
##
@@ -22,23 +22,26 @@
 
 /**
  * A mix-in interface for {@link ScanBuilder}. Data sources can implement this 
interface to
- * push down filters to the data source and reduce the size of the data to be 
read.

Review comment:
   Let's only change the classdoc
   ```
   push down V2 {@link Filter}s to ...
   
   Note that, this interface is preferred over {@link SupportsPushDownFilters}, 
which uses V1 Filter and is less
   efficient due to the internal -> external data conversion.
   ```







[GitHub] [spark] AmplabJenkins removed a comment on pull request #34073: [SPARK-36760][SQL][FOLLOWUP] Add interface SupportsPushDownV2Filters

2021-09-22 Thread GitBox


AmplabJenkins removed a comment on pull request #34073:
URL: https://github.com/apache/spark/pull/34073#issuecomment-925477621


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143523/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34073: [SPARK-36760][SQL][FOLLOWUP] Add interface SupportsPushDownV2Filters

2021-09-22 Thread GitBox


AmplabJenkins commented on pull request #34073:
URL: https://github.com/apache/spark/pull/34073#issuecomment-925477621


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143523/
   





[GitHub] [spark] AngersZhuuuu commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


AngersZhuuuu commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925477420


   ping @cloud-fan 





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


AmplabJenkins removed a comment on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925477202


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143527/
   





[GitHub] [spark] SparkQA removed a comment on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


SparkQA removed a comment on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925475024


   **[Test build #143527 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143527/testReport)**
 for PR 34033 at commit 
[`87df7b0`](https://github.com/apache/spark/commit/87df7b0af3b6e3e7d8b55d9c30891bca4202a862).





[GitHub] [spark] AmplabJenkins commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


AmplabJenkins commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925477202


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143527/
   





[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925477170


   **[Test build #143527 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143527/testReport)**
 for PR 34033 at commit 
[`87df7b0`](https://github.com/apache/spark/commit/87df7b0af3b6e3e7d8b55d9c30891bca4202a862).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #34073: [SPARK-36760][SQL][FOLLOWUP] Add interface SupportsPushDownV2Filters

2021-09-22 Thread GitBox


SparkQA removed a comment on pull request #34073:
URL: https://github.com/apache/spark/pull/34073#issuecomment-925377369


   **[Test build #143523 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143523/testReport)**
 for PR 34073 at commit 
[`1014995`](https://github.com/apache/spark/commit/1014995820aa9871ed9ac823775dda41d5024299).





[GitHub] [spark] SparkQA commented on pull request #34073: [SPARK-36760][SQL][FOLLOWUP] Add interface SupportsPushDownV2Filters

2021-09-22 Thread GitBox


SparkQA commented on pull request #34073:
URL: https://github.com/apache/spark/pull/34073#issuecomment-925476776


   **[Test build #143523 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143523/testReport)**
 for PR 34073 at commit 
[`1014995`](https://github.com/apache/spark/commit/1014995820aa9871ed9ac823775dda41d5024299).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] HyukjinKwon commented on a change in pull request #34051: [SPARK-36809][SQL] Remove broadcast for InSubqueryExec used in DPP

2021-09-22 Thread GitBox


HyukjinKwon commented on a change in pull request #34051:
URL: https://github.com/apache/spark/pull/34051#discussion_r714443615



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala
##
@@ -157,7 +161,8 @@ case class InSubqueryExec(
   child = child.canonicalized,
   plan = plan.canonicalized.asInstanceOf[BaseSubqueryExec],
   exprId = ExprId(0),
-  resultBroadcast = null)
+  resultBroadcast = null,
+  result = null)

Review comment:
   hm, IIRC when it copies, it won't copy `@transient private var result: 
Array[Any] = _` .. I think we won't have to move `result` into the constructor 
(?).
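The behavior under discussion can be demonstrated in isolation. This is a generic sketch, not `InSubqueryExec` itself; `CachedNode` and its members are illustrative names:

```scala
// copy() rebuilds only constructor parameters; a mutable field declared in
// the class body is re-initialized to its default in the copied instance.
case class CachedNode(id: Int) {
  @transient private var cache: String = _
  def set(s: String): Unit = cache = s
  def get: String = cache
}
```

So whether `result` must survive `copy()` is exactly what decides whether it belongs in the constructor parameter list.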







[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-22 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-925475024


   **[Test build #143527 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143527/testReport)**
 for PR 34033 at commit 
[`87df7b0`](https://github.com/apache/spark/commit/87df7b0af3b6e3e7d8b55d9c30891bca4202a862).





[GitHub] [spark] HyukjinKwon commented on a change in pull request #34051: [SPARK-36809][SQL] Remove broadcast for InSubqueryExec used in DPP

2021-09-22 Thread GitBox


HyukjinKwon commented on a change in pull request #34051:
URL: https://github.com/apache/spark/pull/34051#discussion_r714442193



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala
##
@@ -104,17 +104,18 @@ case class ScalarSubquery(
 }
 
 /**
- * The physical node of in-subquery. This is for Dynamic Partition Pruning 
only, as in-subquery
- * coming from the original query will always be converted to joins.
+ * The physical node of in-subquery. When this is used for Dynamic Partition 
Pruning, as the pruning
+ * happens at the driver side, we don't broadcast subquery result.
  */
 case class InSubqueryExec(
 child: Expression,
 plan: BaseSubqueryExec,
 exprId: ExprId,
-private var resultBroadcast: Broadcast[Array[Any]] = null)
+needBroadcast: Boolean = false,
+private var resultBroadcast: Broadcast[Array[Any]] = null,
+@transient private var result: Array[Any] = null)

Review comment:
   qq: why should we move this to constructor?






