[GitHub] [spark] AmplabJenkins removed a comment on pull request #28387: [SPARK-29339][R][FOLLOW-UP] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28387:
URL: https://github.com/apache/spark/pull/28387#issuecomment-620488750







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wang-zhun commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-04-28 Thread GitBox


wang-zhun commented on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-620488942


   @tgravescs could you help take a look at this PR?






[GitHub] [spark] AmplabJenkins commented on pull request #28389: [SPARK-31592]bufferPoolsBySize in HeapMemoryAllocator should be thread safe

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28389:
URL: https://github.com/apache/spark/pull/28389#issuecomment-620488475


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins commented on pull request #28387: [SPARK-29339][R][FOLLOW-UP] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28387:
URL: https://github.com/apache/spark/pull/28387#issuecomment-620488750










[GitHub] [spark] fanyunbojerry opened a new pull request #28389: [SPARK-31592]bufferPoolsBySize in HeapMemoryAllocator should be thread safe

2020-04-28 Thread GitBox


fanyunbojerry opened a new pull request #28389:
URL: https://github.com/apache/spark/pull/28389


   
   
   
   
   
   ### What changes were proposed in this pull request?
   Currently, `bufferPoolsBySize` in `HeapMemoryAllocator` is a `Map` whose value type is `LinkedList`.
   `LinkedList` is not thread-safe, so concurrent access may hit the error below:
   ```
   java.util.NoSuchElementException
   at java.util.LinkedList.removeFirst(LinkedList.java:270)
   at java.util.LinkedList.remove(LinkedList.java:685)
   at org.apache.spark.unsafe.memory.HeapMemoryAllocator.allocate(HeapMemoryAllocator.java:57)
   ```
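
One way to avoid this kind of failure, sketched here under assumptions and not taken from this patch: keep each per-size pool in a concurrent queue and take elements with `poll()`, which returns `null` on an empty queue instead of throwing. The class name `PooledAllocator` and its methods are hypothetical and only illustrate the pattern; the actual fix in `HeapMemoryAllocator` may take a different route, for example synchronization.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical stand-in for a size-keyed buffer pool; not Spark's HeapMemoryAllocator. */
final class PooledAllocator {
  // Each size maps to a concurrent queue, so threads never see a half-updated list.
  private final Map<Integer, Queue<WeakReference<long[]>>> poolsBySize =
      new ConcurrentHashMap<>();

  /** Returns a pooled array of the requested length if one is available, else a new one. */
  long[] allocate(int words) {
    Queue<WeakReference<long[]>> pool = poolsBySize.get(words);
    if (pool != null) {
      WeakReference<long[]> ref;
      // poll() returns null on an empty queue rather than throwing
      // NoSuchElementException, so no separate isEmpty() check is needed.
      while ((ref = pool.poll()) != null) {
        long[] array = ref.get();
        if (array != null) {
          return array;
        }
      }
    }
    return new long[words];
  }

  /** Hands an array back to the pool that matches its length. */
  void free(long[] array) {
    poolsBySize
        .computeIfAbsent(array.length, k -> new ConcurrentLinkedQueue<>())
        .add(new WeakReference<>(array));
  }
}
```

With this shape there is no emptiness check followed by a separate removal, so two threads cannot race between those two steps.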
   
   






[GitHub] [spark] SparkQA commented on pull request #28387: [SPARK-29339][R][FOLLOW-UP] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


SparkQA commented on pull request #28387:
URL: https://github.com/apache/spark/pull/28387#issuecomment-620488228


   **[Test build #121985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121985/testReport)** for PR 28387 at commit [`0a7ec3b`](https://github.com/apache/spark/commit/0a7ec3beba0b61e8aebd443e6cc357b1320816f2).






[GitHub] [spark] MichaelChirico commented on a change in pull request #28387: [SPARK-29339][R][FOLLOW-UP] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


MichaelChirico commented on a change in pull request #28387:
URL: https://github.com/apache/spark/pull/28387#discussion_r416463288



##
File path: R/pkg/R/types.R
##
@@ -88,11 +88,6 @@ specialtypeshandle <- function(type) {
 checkSchemaInArrow <- function(schema) {
   stopifnot(inherits(schema, "structType"))
 
-  requireNamespace1 <- requireNamespace

Review comment:
   In the meantime, given that 
`sparkR.conf("spark.sql.execution.arrow.sparkr.enabled")[[1]] == "true"` was 
popping up in a few places, I added a helper to `utils.R` to centralize 
maintenance of that snippet.








[GitHub] [spark] AmplabJenkins commented on pull request #28387: [SPARK-29339][R][FOLLOW-UP] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28387:
URL: https://github.com/apache/spark/pull/28387#issuecomment-620485227










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28387: [SPARK-29339][R][FOLLOW-UP] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28387:
URL: https://github.com/apache/spark/pull/28387#issuecomment-620485227










[GitHub] [spark] SparkQA commented on pull request #28387: [SPARK-29339][R][FOLLOW-UP] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


SparkQA commented on pull request #28387:
URL: https://github.com/apache/spark/pull/28387#issuecomment-620484508


   **[Test build #121984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121984/testReport)** for PR 28387 at commit [`043dafc`](https://github.com/apache/spark/commit/043dafcf78092f6d481c7bd09fc2e20e0a2d).






[GitHub] [spark] yaooqinn commented on pull request #28222: SPARK-31447 Fix issue in ExtractIntervalPart expression

2020-04-28 Thread GitBox


yaooqinn commented on pull request #28222:
URL: https://github.com/apache/spark/pull/28222#issuecomment-620484345


   Checked PostgreSQL (non-ANSI interval type) and Presto (ANSI); both of them return proper days.
   






[GitHub] [spark] MichaelChirico commented on a change in pull request #28387: [SPARK-29339][R][FOLLOW-UP] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


MichaelChirico commented on a change in pull request #28387:
URL: https://github.com/apache/spark/pull/28387#discussion_r416459279



##
File path: R/pkg/R/types.R
##
@@ -88,11 +88,6 @@ specialtypeshandle <- function(type) {
 checkSchemaInArrow <- function(schema) {
   stopifnot(inherits(schema, "structType"))
 
-  requireNamespace1 <- requireNamespace

Review comment:
   I checked all the usages of `checkSchemaInArrow`:
   
   ```
   grep -Fnr "checkSchemaInArrow" R
   R/types.R:88:checkSchemaInArrow <- function(schema) {
   R/SQLContext.R:277:checkSchemaInArrow(schema)
   R/group.R:235:  checkSchemaInArrow(schema)
   R/DataFrame.R:1211:checkSchemaInArrow(schema(x))
   R/DataFrame.R:1509:  checkSchemaInArrow(schema)
   ```
   
   These are all within branches that have checked
   
   ```
   arrowEnabled <- sparkR.conf("spark.sql.execution.arrow.sparkr.enabled")[[1]] == "true"
   ```
   
   Since `arrow` is not directly used and this conf check is passed already, I 
think the `requireNamespace` here is unnecessary.








[GitHub] [spark] MichaelChirico commented on a change in pull request #28387: [SPARK-29339][R][FOLLOW-UP] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


MichaelChirico commented on a change in pull request #28387:
URL: https://github.com/apache/spark/pull/28387#discussion_r416457657



##
File path: R/pkg/R/DataFrame.R
##
@@ -1226,8 +1226,7 @@ setMethod("collect",
   # empty data.frame with 0 columns and 0 rows
   data.frame()
 } else if (useArrow) {
-  requireNamespace1 <- requireNamespace
-  if (requireNamespace1("arrow", quietly = TRUE)) {
+  if (requireNamespace("arrow", quietly = TRUE)) {
 read_arrow <- get("read_arrow", envir = asNamespace("arrow"), inherits = FALSE)

Review comment:
   Yep, I wasn't reading carefully enough.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620481044










[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620481044










[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620480549


   **[Test build #121983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121983/testReport)** for PR 28386 at commit [`7f83232`](https://github.com/apache/spark/commit/7f83232b14529901df19ae23743380768a8daca5).






[GitHub] [spark] cloud-fan commented on a change in pull request #28375: [SPARK-30282][SQL][FOLLOWUP] SHOW TBLPROPERTIES should support views

2020-04-28 Thread GitBox


cloud-fan commented on a change in pull request #28375:
URL: https://github.com/apache/spark/pull/28375#discussion_r416448092



##
File path: docs/sql-migration-guide.md
##
@@ -59,7 +59,7 @@ license: |
  
   - In Spark 3.0, you can use `ADD FILE` to add file directories as well. Earlier you could add only single files using this command. To restore the behavior of earlier versions, set `spark.sql.legacy.addSingleFileInAddFile` to `true`.
 
-  - In Spark 3.0, `SHOW TBLPROPERTIES` throws `AnalysisException` if the table does not exist. In Spark version 2.4 and below, this scenario caused `NoSuchTableException`. Also, `SHOW TBLPROPERTIES` on a temporary view causes `AnalysisException`. In Spark version 2.4 and below, it returned an empty result.
+  - In Spark 3.0, `SHOW TBLPROPERTIES` throws `AnalysisException` if the table does not exist. In Spark version 2.4 and below, this scenario caused `NoSuchTableException`.

Review comment:
   I don't think a change in the error message or exception type is a breaking change. @dongjoon-hyun what do you think?








[GitHub] [spark] cloud-fan commented on pull request #28222: SPARK-31447 Fix issue in ExtractIntervalPart expression

2020-04-28 Thread GitBox


cloud-fan commented on pull request #28222:
URL: https://github.com/apache/spark/pull/28222#issuecomment-620473740


   cc @yaooqinn, can you take a look? This seems like a hard problem because we have a non-standard interval definition. It would be interesting to see what results other systems return, such as Presto, Hive, and Snowflake.






[GitHub] [spark] maropu commented on pull request #28385: [SPARK-31591][CORE] Fix null name prefix when create directory

2020-04-28 Thread GitBox


maropu commented on pull request #28385:
URL: https://github.com/apache/spark/pull/28385#issuecomment-620465500


   Why is the method called with null in the shuffle case? Is this an issue at the call site?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28385: [SPARK-31591][CORE] Fix null name prefix when create directory

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28385:
URL: https://github.com/apache/spark/pull/28385#issuecomment-620465474










[GitHub] [spark] AmplabJenkins commented on pull request #28385: [SPARK-31591][CORE] Fix null name prefix when create directory

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28385:
URL: https://github.com/apache/spark/pull/28385#issuecomment-620465474










[GitHub] [spark] SparkQA commented on pull request #28385: [SPARK-31591][CORE] Fix null name prefix when create directory

2020-04-28 Thread GitBox


SparkQA commented on pull request #28385:
URL: https://github.com/apache/spark/pull/28385#issuecomment-620464831


   **[Test build #121982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121982/testReport)** for PR 28385 at commit [`a42f005`](https://github.com/apache/spark/commit/a42f005b8942a4e7d05f8271bf990b46e916c790).






[GitHub] [spark] maropu commented on pull request #28385: [SPARK-31591][CORE] Fix null name prefix when create directory

2020-04-28 Thread GitBox


maropu commented on pull request #28385:
URL: https://github.com/apache/spark/pull/28385#issuecomment-620464218


   retest this please






[GitHub] [spark] HyukjinKwon commented on a change in pull request #28387: [SPARK-29339][R][FOLLOW-UP] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


HyukjinKwon commented on a change in pull request #28387:
URL: https://github.com/apache/spark/pull/28387#discussion_r416431223



##
File path: R/pkg/R/DataFrame.R
##
@@ -1226,8 +1226,7 @@ setMethod("collect",
   # empty data.frame with 0 columns and 0 rows
   data.frame()
 } else if (useArrow) {
-  requireNamespace1 <- requireNamespace
-  if (requireNamespace1("arrow", quietly = TRUE)) {
+  if (requireNamespace("arrow", quietly = TRUE)) {
 read_arrow <- get("read_arrow", envir = asNamespace("arrow"), inherits = FALSE)

Review comment:
   @MichaelChirico I believe this can be changed to `arrow::read_arrow`. This was also a workaround. See also https://github.com/apache/spark/pull/25993








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28387: [SPARK-29339][R][FOLLOW-UP] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


HyukjinKwon commented on a change in pull request #28387:
URL: https://github.com/apache/spark/pull/28387#discussion_r416429707



##
File path: R/pkg/R/DataFrame.R
##
@@ -1226,8 +1226,7 @@ setMethod("collect",
   # empty data.frame with 0 columns and 0 rows
   data.frame()
 } else if (useArrow) {
-  requireNamespace1 <- requireNamespace
-  if (requireNamespace1("arrow", quietly = TRUE)) {
+  if (requireNamespace("arrow", quietly = TRUE)) {

Review comment:
   Thanks, I wonder why I missed this.








[GitHub] [spark] igreenfield commented on pull request #26624: [SPARK-8981][core] Add MDC support in Executor

2020-04-28 Thread GitBox


igreenfield commented on pull request #26624:
URL: https://github.com/apache/spark/pull/26624#issuecomment-620460159


   The failed test does not seem to be related to the changes in the code.






[GitHub] [spark] HyukjinKwon commented on pull request #28367: [SPARK-31573][R] Apply fixed=TRUE as appropriate to regex usage in R

2020-04-28 Thread GitBox


HyukjinKwon commented on pull request #28367:
URL: https://github.com/apache/spark/pull/28367#issuecomment-620458462


   Merged to master and branch-3.0.






[GitHub] [spark] cloud-fan commented on a change in pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox


cloud-fan commented on a change in pull request #26141:
URL: https://github.com/apache/spark/pull/26141#discussion_r416423730



##
File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala
##
@@ -72,6 +72,48 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest {
 try f(client) finally transport.close()
   }
 
+  test("SPARK-29492: use add jar in sync mode") {

Review comment:
   Can you post the error message you get when running the test before this patch?








[GitHub] [spark] cloud-fan commented on a change in pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox


cloud-fan commented on a change in pull request #26141:
URL: https://github.com/apache/spark/pull/26141#discussion_r416423419



##
File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala
##
@@ -72,6 +72,48 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest {
 try f(client) finally transport.close()
   }
 
+  test("SPARK-29492: use add jar in sync mode") {

Review comment:
   let's put the new test at the end.








[GitHub] [spark] HyukjinKwon commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


HyukjinKwon commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620454937


   cc @felixcheung and @shivaram FYI






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620453870


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121980/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620453844


   **[Test build #121980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121980/testReport)** for PR 28386 at commit [`d0965d5`](https://github.com/apache/spark/commit/d0965d5de4288c5ad83337ca19577ebf10195dc3).
* This patch **fails R style tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620449564


   **[Test build #121980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121980/testReport)** for PR 28386 at commit [`d0965d5`](https://github.com/apache/spark/commit/d0965d5de4288c5ad83337ca19577ebf10195dc3).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620453862


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620453862










[GitHub] [spark] xuanyuanking commented on a change in pull request #28326: [SPARK-27340][SS] Alias on TimeWindow expression cause watermark metadata lost

2020-04-28 Thread GitBox


xuanyuanking commented on a change in pull request #28326:
URL: https://github.com/apache/spark/pull/28326#discussion_r416419173



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Column.scala
##
@@ -1040,17 +1034,11 @@ class Column(val expr: Expression) extends Logging {
*   df.select($"colA".name("colB"))
* }}}
*
-   * If the current column has metadata associated with it, this metadata will be propagated
-   * to the new column.  If this not desired, use `as` with explicitly empty metadata.

Review comment:
   These comments were added together with the changes we just reverted. But it's good to have clear comments, so I'll rephrase them and add them back in the follow-up.








[GitHub] [spark] cloud-fan commented on pull request #28294: [SPARK-31519][SQL] Cast in having aggregate expressions returns the wrong result

2020-04-28 Thread GitBox


cloud-fan commented on pull request #28294:
URL: https://github.com/apache/spark/pull/28294#issuecomment-620452258


   @xuanyuanking can you send a backport PR for 2.4?






[GitHub] [spark] cloud-fan commented on pull request #28294: [SPARK-31519][SQL] Cast in having aggregate expressions returns the wrong result

2020-04-28 Thread GitBox


cloud-fan commented on pull request #28294:
URL: https://github.com/apache/spark/pull/28294#issuecomment-620451860


   thanks, merging to master/3.0!






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28366: [WIP][SPARK-31365][SQL] Enable nested predicate pushdown per data sources

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28366:
URL: https://github.com/apache/spark/pull/28366#issuecomment-620450142










[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620450040










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620450040










[GitHub] [spark] AmplabJenkins commented on pull request #28366: [WIP][SPARK-31365][SQL] Enable nested predicate pushdown per data sources

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28366:
URL: https://github.com/apache/spark/pull/28366#issuecomment-620450142










[GitHub] [spark] SparkQA commented on pull request #28366: [WIP][SPARK-31365][SQL] Enable nested predicate pushdown per data sources

2020-04-28 Thread GitBox


SparkQA commented on pull request #28366:
URL: https://github.com/apache/spark/pull/28366#issuecomment-620449590


   **[Test build #121981 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121981/testReport)**
 for PR 28366 at commit 
[`e555a1c`](https://github.com/apache/spark/commit/e555a1c94d6ec7b1a338015a686af63eaec3c8a9).






[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620449564


   **[Test build #121980 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121980/testReport)**
 for PR 28386 at commit 
[`d0965d5`](https://github.com/apache/spark/commit/d0965d5de4288c5ad83337ca19577ebf10195dc3).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620447533


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121978/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620447524


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620442254


   **[Test build #121978 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121978/testReport)**
 for PR 28386 at commit 
[`abc9bd6`](https://github.com/apache/spark/commit/abc9bd6a1f02796be4940aac228c76f96cd9b49a).






[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620447524










[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620447503


   **[Test build #121978 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121978/testReport)**
 for PR 28386 at commit 
[`abc9bd6`](https://github.com/apache/spark/commit/abc9bd6a1f02796be4940aac228c76f96cd9b49a).
* This patch **fails R style tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27978: [SPARK-31127][ML] Implement abstract Selector

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #27978:
URL: https://github.com/apache/spark/pull/27978#issuecomment-620446584










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620442764










[GitHub] [spark] AmplabJenkins commented on pull request #27978: [SPARK-31127][ML] Implement abstract Selector

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #27978:
URL: https://github.com/apache/spark/pull/27978#issuecomment-620446584










[GitHub] [spark] SparkQA commented on pull request #27978: [SPARK-31127][ML] Implement abstract Selector

2020-04-28 Thread GitBox


SparkQA commented on pull request #27978:
URL: https://github.com/apache/spark/pull/27978#issuecomment-620445888


   **[Test build #121979 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121979/testReport)**
 for PR 27978 at commit 
[`5eabb62`](https://github.com/apache/spark/commit/5eabb625e452236eff344131345deb2e8aaa8e7b).






[GitHub] [spark] cloud-fan commented on a change in pull request #27627: [WIP][SPARK-28067][SQL] Fix incorrect results for decimal aggregate sum by returning null on decimal overflow

2020-04-28 Thread GitBox


cloud-fan commented on a change in pull request #27627:
URL: https://github.com/apache/spark/pull/27627#discussion_r416408564



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala
##
@@ -62,38 +62,113 @@ case class Sum(child: Expression) extends 
DeclarativeAggregate with ImplicitCast
 
   private lazy val sum = AttributeReference("sum", sumDataType)()
 
+  private lazy val isEmptyOrNulls = AttributeReference("isEmptyOrNulls", 
BooleanType, false)()
+
   private lazy val zero = Literal.default(sumDataType)
 
-  override lazy val aggBufferAttributes = sum :: Nil
+  override lazy val aggBufferAttributes = sum :: isEmptyOrNulls :: Nil
 
   override lazy val initialValues: Seq[Expression] = Seq(
-/* sum = */ Literal.create(null, sumDataType)
+/* sum = */  zero,
+/* isEmptyOrNulls = */ Literal.create(true, BooleanType)
   )
 
+  /**
+   * For decimal types and when child is nullable:
+   * isEmptyOrNulls flag is a boolean to represent if there are no rows or if 
all rows that
+   * have been seen are null.  This will be used to identify if the end result 
of sum in
+   * evaluateExpression should be null or not.
+   *
+   * Update of the isEmptyOrNulls flag:
+   * If this flag is false, then keep it as is.
+   * If this flag is true, then check if the incoming value is null and if it 
is null, keep it
+   * as true else update it to false.
+   * Once this flag is switched to false, it will remain false.
+   *
+   * The update of the sum is as follows:
+   * If sum is null, then we have a case of overflow, so keep sum as is.
+   * If sum is not null, and the incoming value is not null, then perform the 
addition along
+   * with the overflow checking. Note, that if overflow occurs, then sum will 
be null here.
+   * If the new incoming value is null, we will keep the sum in buffer as is 
and skip this
+   * incoming null
+   */
   override lazy val updateExpressions: Seq[Expression] = {
 if (child.nullable) {
-  Seq(
-/* sum = */
-coalesce(coalesce(sum, zero) + child.cast(sumDataType), sum)
-  )
+  resultType match {
+case d: DecimalType =>
+  Seq(
+/* sum */
+If(IsNull(sum), sum,
+  If(IsNotNull(child.cast(sumDataType)),
+CheckOverflow(sum + child.cast(sumDataType), d, true), sum)),
+/* isEmptyOrNulls */
+If(isEmptyOrNulls, IsNull(child.cast(sumDataType)), isEmptyOrNulls)
+  )
+case _ =>
+  Seq(
+coalesce(sum + child.cast(sumDataType), sum),
+If(isEmptyOrNulls, IsNull(child.cast(sumDataType)), isEmptyOrNulls)
+  )
+  }
 } else {
-  Seq(
-/* sum = */
-coalesce(sum, zero) + child.cast(sumDataType)
-  )
+  resultType match {
+case d: DecimalType =>
+  Seq(
+/* sum */
+If(IsNull(sum), sum, CheckOverflow(sum + child.cast(sumDataType), 
d, true)),
+/* isEmptyOrNulls */
+false
+  )
+case _ => Seq(sum + child.cast(sumDataType), false)
+  }
 }
   }
 
+  /**
+   * For decimal type:
+   * update of the sum is as follows:
+   * Check if either portion of the left.sum or right.sum has overflowed
+   * If it has, then the sum value will remain null.
+   * If it did not have overflow, then add the sum.left and sum.right and 
check for overflow.
+   *
+   * isEmptyOrNulls:  Set to false if either one of the left or right is set 
to false. This
+   * means we have seen atleast a row that was not null.
+   * If the value from bufferLeft and bufferRight are both true, then this 
will be true.
+   */
   override lazy val mergeExpressions: Seq[Expression] = {
-Seq(
-  /* sum = */
-  coalesce(coalesce(sum.left, zero) + sum.right, sum.left)
-)
+resultType match {
+  case d: DecimalType =>
+Seq(
+  /* sum = */
+  If(And(IsNull(sum.left), EqualTo(isEmptyOrNulls.left, false)) ||
+And(IsNull(sum.right), EqualTo(isEmptyOrNulls.right, false)),
+  Literal.create(null, resultType),
+  CheckOverflow(sum.left + sum.right, d, true)),
+  /* isEmptyOrNulls = */
+  And(isEmptyOrNulls.left, isEmptyOrNulls.right)
+  )
+  case _ =>
+Seq(
+  coalesce(sum.left + sum.right, sum.left),
+  And(isEmptyOrNulls.left, isEmptyOrNulls.right)
+)
+}
   }
 
+  /**
+   * If the isEmptyOrNulls is true, then it means either there are no rows, or 
all the rows were
+   * null, so the result will be null.
+   * If the isEmptyOrNulls is false, then if sum is null that means an 
overflow has happened.
+   * So now, if ansi is enabled, then throw exception, if not then return null.
+   * If sum is not null, then return the sum.

Review comment:
   If we don't check overflow at 
https://github.com/apache/spark/pull/27627/files#r416407527 , we 

[GitHub] [spark] cloud-fan commented on a change in pull request #27627: [WIP][SPARK-28067][SQL] Fix incorrect results for decimal aggregate sum by returning null on decimal overflow

2020-04-28 Thread GitBox


cloud-fan commented on a change in pull request #27627:
URL: https://github.com/apache/spark/pull/27627#discussion_r416407527



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala
##
@@ -62,38 +62,113 @@ case class Sum(child: Expression) extends 
DeclarativeAggregate with ImplicitCast
 
   private lazy val sum = AttributeReference("sum", sumDataType)()
 
+  private lazy val isEmptyOrNulls = AttributeReference("isEmptyOrNulls", 
BooleanType, false)()
+
   private lazy val zero = Literal.default(sumDataType)
 
-  override lazy val aggBufferAttributes = sum :: Nil
+  override lazy val aggBufferAttributes = sum :: isEmptyOrNulls :: Nil
 
   override lazy val initialValues: Seq[Expression] = Seq(
-/* sum = */ Literal.create(null, sumDataType)
+/* sum = */  zero,
+/* isEmptyOrNulls = */ Literal.create(true, BooleanType)
   )
 
+  /**
+   * For decimal types and when child is nullable:
+   * isEmptyOrNulls flag is a boolean to represent if there are no rows or if 
all rows that
+   * have been seen are null.  This will be used to identify if the end result 
of sum in
+   * evaluateExpression should be null or not.
+   *
+   * Update of the isEmptyOrNulls flag:
+   * If this flag is false, then keep it as is.
+   * If this flag is true, then check if the incoming value is null and if it 
is null, keep it
+   * as true else update it to false.
+   * Once this flag is switched to false, it will remain false.
+   *
+   * The update of the sum is as follows:
+   * If sum is null, then we have a case of overflow, so keep sum as is.
+   * If sum is not null, and the incoming value is not null, then perform the 
addition along
+   * with the overflow checking. Note, that if overflow occurs, then sum will 
be null here.
+   * If the new incoming value is null, we will keep the sum in buffer as is 
and skip this
+   * incoming null
+   */
   override lazy val updateExpressions: Seq[Expression] = {
 if (child.nullable) {
-  Seq(
-/* sum = */
-coalesce(coalesce(sum, zero) + child.cast(sumDataType), sum)
-  )
+  resultType match {
+case d: DecimalType =>
+  Seq(
+/* sum */
+If(IsNull(sum), sum,
+  If(IsNotNull(child.cast(sumDataType)),
+CheckOverflow(sum + child.cast(sumDataType), d, true), sum)),
+/* isEmptyOrNulls */
+If(isEmptyOrNulls, IsNull(child.cast(sumDataType)), isEmptyOrNulls)
+  )
+case _ =>
+  Seq(
+coalesce(sum + child.cast(sumDataType), sum),
+If(isEmptyOrNulls, IsNull(child.cast(sumDataType)), isEmptyOrNulls)
+  )
+  }
 } else {
-  Seq(
-/* sum = */
-coalesce(sum, zero) + child.cast(sumDataType)
-  )
+  resultType match {
+case d: DecimalType =>
+  Seq(
+/* sum */
+If(IsNull(sum), sum, CheckOverflow(sum + child.cast(sumDataType), 
d, true)),
+/* isEmptyOrNulls */
+false
+  )
+case _ => Seq(sum + child.cast(sumDataType), false)
+  }
 }
   }
 
+  /**
+   * For decimal type:
+   * update of the sum is as follows:
+   * Check if either portion of the left.sum or right.sum has overflowed
+   * If it has, then the sum value will remain null.
+   * If it did not have overflow, then add the sum.left and sum.right and 
check for overflow.

Review comment:
   We don't need to check overflow here. We can do it at the end.
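
A minimal sketch of what deferring the check could look like, written as a plain-Python model rather than Catalyst expressions. It assumes the partial sums can be held at a wider precision than the result type, which this thread does not confirm; `MAX_ABS`, `merge` and `evaluate` are hypothetical names used only for illustration.

```python
from decimal import Decimal

# Hypothetical bound standing in for the result type's precision,
# e.g. a decimal with 5 digits and scale 0 holds at most 99999 in magnitude.
MAX_ABS = Decimal("99999")


def merge(left_sum, right_sum):
    """Merge two partial sums without any overflow check."""
    return left_sum + right_sum


def evaluate(total):
    """Single overflow check at the very end; None models SQL NULL."""
    return total if abs(total) <= MAX_ABS else None


partials = [Decimal("60000"), Decimal("50000"), Decimal("-30000")]
total = Decimal("0")
for p in partials:
    total = merge(total, p)

# The intermediate value 110000 exceeds the bound, but the final total fits.
result = evaluate(total)
```

In this model, checking only at the end accepts a run whose intermediate total briefly exceeds the bound, whereas a check inside every merge would have turned it into null. Whether that difference is acceptable is a design decision for the PR.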








[GitHub] [spark] dongjoon-hyun commented on pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))

2020-04-28 Thread GitBox


dongjoon-hyun commented on pull request #27803:
URL: https://github.com/apache/spark/pull/27803#issuecomment-620444208


   Thank you, @maropu .






[GitHub] [spark] cloud-fan commented on a change in pull request #27627: [WIP][SPARK-28067][SQL] Fix incorrect results for decimal aggregate sum by returning null on decimal overflow

2020-04-28 Thread GitBox


cloud-fan commented on a change in pull request #27627:
URL: https://github.com/apache/spark/pull/27627#discussion_r416407135



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala
##
@@ -62,38 +62,113 @@ case class Sum(child: Expression) extends 
DeclarativeAggregate with ImplicitCast
 
   private lazy val sum = AttributeReference("sum", sumDataType)()
 
+  private lazy val isEmptyOrNulls = AttributeReference("isEmptyOrNulls", 
BooleanType, false)()
+
   private lazy val zero = Literal.default(sumDataType)
 
-  override lazy val aggBufferAttributes = sum :: Nil
+  override lazy val aggBufferAttributes = sum :: isEmptyOrNulls :: Nil
 
   override lazy val initialValues: Seq[Expression] = Seq(
-/* sum = */ Literal.create(null, sumDataType)
+/* sum = */  zero,
+/* isEmptyOrNulls = */ Literal.create(true, BooleanType)
   )
 
+  /**
+   * For decimal types and when child is nullable:
+   * isEmptyOrNulls flag is a boolean to represent if there are no rows or if 
all rows that
+   * have been seen are null.  This will be used to identify if the end result 
of sum in
+   * evaluateExpression should be null or not.
+   *
+   * Update of the isEmptyOrNulls flag:
+   * If this flag is false, then keep it as is.
+   * If this flag is true, then check if the incoming value is null and if it 
is null, keep it
+   * as true else update it to false.
+   * Once this flag is switched to false, it will remain false.
+   *
+   * The update of the sum is as follows:
+   * If sum is null, then we have a case of overflow, so keep sum as is.
+   * If sum is not null, and the incoming value is not null, then perform the 
addition along
+   * with the overflow checking. Note, that if overflow occurs, then sum will 
be null here.
+   * If the new incoming value is null, we will keep the sum in buffer as is 
and skip this
+   * incoming null
+   */
   override lazy val updateExpressions: Seq[Expression] = {
 if (child.nullable) {
-  Seq(
-/* sum = */
-coalesce(coalesce(sum, zero) + child.cast(sumDataType), sum)
-  )
+  resultType match {
+case d: DecimalType =>
+  Seq(
+/* sum */
+If(IsNull(sum), sum,
+  If(IsNotNull(child.cast(sumDataType)),
+CheckOverflow(sum + child.cast(sumDataType), d, true), sum)),
+/* isEmptyOrNulls */
+If(isEmptyOrNulls, IsNull(child.cast(sumDataType)), isEmptyOrNulls)
+  )
+case _ =>
+  Seq(
+coalesce(sum + child.cast(sumDataType), sum),
+If(isEmptyOrNulls, IsNull(child.cast(sumDataType)), isEmptyOrNulls)
+  )
+  }
 } else {
-  Seq(
-/* sum = */
-coalesce(sum, zero) + child.cast(sumDataType)
-  )
+  resultType match {
+case d: DecimalType =>
+  Seq(
+/* sum */
+If(IsNull(sum), sum, CheckOverflow(sum + child.cast(sumDataType), 
d, true)),
+/* isEmptyOrNulls */
+false
+  )
+case _ => Seq(sum + child.cast(sumDataType), false)
+  }
 }
   }
 
+  /**
+   * For decimal type:
+   * update of the sum is as follows:
+   * Check if either portion of the left.sum or right.sum has overflowed

Review comment:
   We should explain how we detect overflow: the `sum` is null while `isEmpty` 
is false.
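
The rule can be illustrated with a small plain-Python model of the two-field buffer described in the doc comments of the diff above. This is a sketch only, not the Catalyst code: `None` stands in for SQL NULL, `MAX_ABS` is a hypothetical bound for the result type, and all function names are made up for the example.

```python
from decimal import Decimal

MAX_ABS = Decimal("99999")       # hypothetical bound for the result type
INITIAL = (Decimal("0"), True)   # (sum, is_empty), as in initialValues


def checked_add(a, b):
    """Add two decimals; return None (SQL NULL) when the result overflows."""
    total = a + b
    return total if abs(total) <= MAX_ABS else None


def update(buf, value):
    """Fold one input row into the buffer; value is None for a null row."""
    total, is_empty = buf
    if value is None:
        return buf               # nulls are skipped, flag stays as is
    if total is None:
        return (None, False)     # already overflowed, sum stays null
    return (checked_add(total, value), False)


def has_overflowed(buf):
    """Overflow is the one state where sum is null and the flag is false."""
    total, is_empty = buf
    return total is None and not is_empty


def run(values):
    buf = INITIAL
    for v in values:
        buf = update(buf, v)
    return buf


all_nulls = run([None, None])
overflowed = run([Decimal("90000"), None, Decimal("20000"), Decimal("1")])
normal = run([Decimal("1"), Decimal("2")])
```

Here `all_nulls` keeps the flag set, so a null result is not mistaken for overflow, while `overflowed` ends with a null sum and a cleared flag.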








[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620442764










[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620442254


   **[Test build #121978 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121978/testReport)**
 for PR 28386 at commit 
[`abc9bd6`](https://github.com/apache/spark/commit/abc9bd6a1f02796be4940aac228c76f96cd9b49a).






[GitHub] [spark] sathyaprakashg commented on pull request #28222: SPARK-31447 Fix issue in ExtractIntervalPart expression

2020-04-28 Thread GitBox


sathyaprakashg commented on pull request #28222:
URL: https://github.com/apache/spark/pull/28222#issuecomment-620441968


   > @sathyaprakashg Please, take a look at the PRs
   > #26337
   > #27262
   
   Thanks @MaxGekk for the prompt reply. The CalendarInterval change is not 
required to fix the issue, so I can revert it. 
   
   How does my proposed change to ExtractIntervalPart look? If it looks good, 
I will update my PR to include only the ExtractIntervalPart change.
   
   We need to change ExtractIntervalPart so that the query below returns 14 
instead of 0. Please refer to _Why are the changes needed?_ for more information.
   
   SELECT EXTRACT(DAY FROM (cast('2020-01-15 00:00:00' as timestamp) - 
cast('2020-01-01 00:00:00' as timestamp)))
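
As a quick sanity check of the expected value, the same two timestamps can be subtracted in plain Python. This only confirms the arithmetic (14 whole days between the timestamps); it does not exercise Spark or ExtractIntervalPart, and the variable names are illustrative.

```python
from datetime import datetime

start = datetime(2020, 1, 1, 0, 0, 0)
end = datetime(2020, 1, 15, 0, 0, 0)

elapsed = end - start        # a timedelta covering the whole gap
whole_days = elapsed.days    # number of whole days in the gap
```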






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620439117


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121975/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-620439260










[GitHub] [spark] SparkQA removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620438619


   **[Test build #121975 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121975/testReport)**
 for PR 28386 at commit 
[`155543e`](https://github.com/apache/spark/commit/155543ee0a1ae91469240fef3f9edb67d0d5a998).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620439110










[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620439095


   **[Test build #121975 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121975/testReport)**
 for PR 28386 at commit 
[`155543e`](https://github.com/apache/spark/commit/155543ee0a1ae91469240fef3f9edb67d0d5a998).
* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-620439260










[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620439205










[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620439110










[GitHub] [spark] SparkQA commented on pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors

2020-04-28 Thread GitBox


SparkQA commented on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-620438687


   **[Test build #121976 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121976/testReport)**
 for PR 28349 at commit 
[`a97a8fc`](https://github.com/apache/spark/commit/a97a8fc0058e73e180cd69ce9f9df5a11c6bc03c).






[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620438619


   **[Test build #121975 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121975/testReport)**
 for PR 28386 at commit 
[`155543e`](https://github.com/apache/spark/commit/155543ee0a1ae91469240fef3f9edb67d0d5a998).






[GitHub] [spark] SparkQA commented on pull request #27978: [SPARK-31127][ML] Implement abstract Selector

2020-04-28 Thread GitBox


SparkQA commented on pull request #27978:
URL: https://github.com/apache/spark/pull/27978#issuecomment-620438664


   **[Test build #121977 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121977/testReport)**
 for PR 27978 at commit 
[`e5a19c1`](https://github.com/apache/spark/commit/e5a19c19f970868ddc7a95d4d23fe5fa910b33f1).






[GitHub] [spark] cloud-fan commented on a change in pull request #27627: [WIP][SPARK-28067][SQL] Fix incorrect results for decimal aggregate sum by returning null on decimal overflow

2020-04-28 Thread GitBox


cloud-fan commented on a change in pull request #27627:
URL: https://github.com/apache/spark/pull/27627#discussion_r416399567



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala
##
@@ -62,38 +62,113 @@ case class Sum(child: Expression) extends 
DeclarativeAggregate with ImplicitCast
 
   private lazy val sum = AttributeReference("sum", sumDataType)()
 
+  private lazy val isEmptyOrNulls = AttributeReference("isEmptyOrNulls", 
BooleanType, false)()
+
   private lazy val zero = Literal.default(sumDataType)
 
-  override lazy val aggBufferAttributes = sum :: Nil
+  override lazy val aggBufferAttributes = sum :: isEmptyOrNulls :: Nil

Review comment:
   We only need to add it to the buffer attributes for the decimal type.
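
The suggestion above can be sketched in plain Scala, outside of Spark. This is only an illustration under stated assumptions: `SketchType`, its case objects, and `BufferAttrsSketch` are hypothetical names, and attribute names are modelled as strings rather than as Catalyst `AttributeReference` values.

```scala
// Hypothetical stand-ins for the aggregate's result type; not Spark classes.
sealed trait SketchType
case object SketchDecimal extends SketchType
case object SketchLong extends SketchType
case object SketchDouble extends SketchType

object BufferAttrsSketch {
  // Only the decimal case carries the extra isEmptyOrNulls slot;
  // every other type keeps the original single-slot buffer.
  def aggBufferAttributes(resultType: SketchType): List[String] =
    resultType match {
      case SketchDecimal => "sum" :: "isEmptyOrNulls" :: Nil
      case _             => "sum" :: Nil
    }
}
```

With this shape, non-decimal sums would keep their existing buffer layout, which is presumably the point of the review comment.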








[GitHub] [spark] cloud-fan commented on a change in pull request #27627: [WIP][SPARK-28067][SQL] Fix incorrect results for decimal aggregate sum by returning null on decimal overflow

2020-04-28 Thread GitBox


cloud-fan commented on a change in pull request #27627:
URL: https://github.com/apache/spark/pull/27627#discussion_r416398914



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala
##
@@ -62,38 +62,113 @@ case class Sum(child: Expression) extends 
DeclarativeAggregate with ImplicitCast
 
   private lazy val sum = AttributeReference("sum", sumDataType)()
 
+  private lazy val isEmptyOrNulls = AttributeReference("isEmptyOrNulls", 
BooleanType, false)()
+
   private lazy val zero = Literal.default(sumDataType)
 
-  override lazy val aggBufferAttributes = sum :: Nil
+  override lazy val aggBufferAttributes = sum :: isEmptyOrNulls :: Nil
 
   override lazy val initialValues: Seq[Expression] = Seq(
-/* sum = */ Literal.create(null, sumDataType)
+/* sum = */  zero,
+/* isEmptyOrNulls = */ Literal.create(true, BooleanType)
   )
 
+  /**
+   * For decimal types and when child is nullable:
+   * isEmptyOrNulls flag is a boolean to represent if there are no rows or if 
all rows that
+   * have been seen are null.  This will be used to identify if the end result 
of sum in
+   * evaluateExpression should be null or not.
+   *
+   * Update of the isEmptyOrNulls flag:
+   * If this flag is false, then keep it as is.
+   * If this flag is true, then check if the incoming value is null and if it 
is null, keep it
+   * as true else update it to false.
+   * Once this flag is switched to false, it will remain false.
+   *
+   * The update of the sum is as follows:
+   * If sum is null, then we have a case of overflow, so keep sum as is.
+   * If sum is not null, and the incoming value is not null, then perform the 
addition along
+   * with the overflow checking. Note, that if overflow occurs, then sum will 
be null here.

Review comment:
   Is it really necessary? We can let it overflow, and it will become null 
when we write it out to shuffle files.
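
For readers following the thread, the update rules described in the doc comment under review can be modelled in plain Scala. This is a minimal sketch, not the PR's Catalyst expressions: `SumBuffer`, `DecimalSumSketch`, and the `MaxPrecision` limit are hypothetical, and `None` stands in for SQL null.

```scala
// Hypothetical plain-Scala model of the two-slot buffer; None models SQL null.
final case class SumBuffer(sum: Option[BigDecimal], isEmptyOrNulls: Boolean)

object DecimalSumSketch {
  // Assumed precision limit, chosen only to make overflow easy to trigger.
  val MaxPrecision = 5

  // Mirrors the diff's initial values: sum starts at zero, flag starts true.
  val initial: SumBuffer = SumBuffer(Some(BigDecimal(0)), isEmptyOrNulls = true)

  def update(buf: SumBuffer, input: Option[BigDecimal]): SumBuffer = {
    val newSum = (buf.sum, input) match {
      case (None, _) => None // sum is already null: overflow happened earlier
      case (s, None) => s    // null input leaves the sum unchanged
      case (Some(s), Some(v)) =>
        val r = s + v
        if (r.precision > MaxPrecision) None else Some(r) // overflow becomes null
    }
    // The flag stays true only while every value seen so far is null;
    // once it flips to false it remains false.
    SumBuffer(newSum, buf.isEmptyOrNulls && input.isEmpty)
  }

  // Null result when no non-null row was seen, otherwise the (possibly null) sum.
  def evaluate(buf: SumBuffer): Option[BigDecimal] =
    if (buf.isEmptyOrNulls) None else buf.sum
}
```

In this model the flag is what separates "no non-null input" from "overflow": both evaluate to null, but only the overflow case has `isEmptyOrNulls == false` with a null `sum`.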








[GitHub] [spark] viirya commented on pull request #28376: [SPARK-31582] [Yarn] Being able to not populate Hadoop classpath

2020-04-28 Thread GitBox


viirya commented on pull request #28376:
URL: https://github.com/apache/spark/pull/28376#issuecomment-620437448


   One question: for the case mentioned in the description, "One case we have 
is when a user uses an Apache Spark distribution with its-own embedded hadoop, 
and submits a job to Cloudera or Hortonworks Yarn clusters", since the embedded 
hadoop is incompatible with the cluster, is it generally okay to submit the app 
there? Even if you don't populate the classpath, will the RPC or protocol be a problem?
   
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27978: [SPARK-31127][ML] Implement abstract Selector

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #27978:
URL: https://github.com/apache/spark/pull/27978#issuecomment-620435924










[GitHub] [spark] AmplabJenkins removed a comment on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyE

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #26339:
URL: https://github.com/apache/spark/pull/26339#issuecomment-620435969










[GitHub] [spark] AmplabJenkins commented on pull request #27978: [SPARK-31127][ML] Implement abstract Selector

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #27978:
URL: https://github.com/apache/spark/pull/27978#issuecomment-620435924










[GitHub] [spark] AmplabJenkins commented on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExistsExc

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #26339:
URL: https://github.com/apache/spark/pull/26339#issuecomment-620435969










[GitHub] [spark] SparkQA commented on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExistsException

2020-04-28 Thread GitBox


SparkQA commented on pull request #26339:
URL: https://github.com/apache/spark/pull/26339#issuecomment-620435452


   **[Test build #121974 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121974/testReport)**
 for PR 26339 at commit 
[`7211e27`](https://github.com/apache/spark/commit/7211e27be13c6c99c18dcf036bedf09dd7340bd1).






[GitHub] [spark] MaxGekk commented on pull request #28222: SPARK-31447 Fix issue in ExtractIntervalPart expression

2020-04-28 Thread GitBox


MaxGekk commented on pull request #28222:
URL: https://github.com/apache/spark/pull/28222#issuecomment-620432117


   @sathyaprakashg Please take a look at these PRs:
   https://github.com/apache/spark/pull/26337
   https://github.com/apache/spark/pull/27262






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620429261


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121971/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28381: [SPARK-31586][SQL] Replace expression TimeSub(l, r) with TimeAdd(l -r)

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28381:
URL: https://github.com/apache/spark/pull/28381#issuecomment-620429356










[GitHub] [spark] AmplabJenkins commented on pull request #28387: [SPARK-26924][R] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28387:
URL: https://github.com/apache/spark/pull/28387#issuecomment-620429212










[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620429239


   **[Test build #121971 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121971/testReport)**
 for PR 28386 at commit 
[`e155c0d`](https://github.com/apache/spark/commit/e155c0d631f61b6d84d3f44040ae07d7ff55ec54).
* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620429256










[GitHub] [spark] AmplabJenkins commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-620429379










[GitHub] [spark] AmplabJenkins commented on pull request #28388: [SPARK-31553][SQL] Revert "[SPARK-29048] Improve performance on Column.isInCollection() with a large size collection"

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28388:
URL: https://github.com/apache/spark/pull/28388#issuecomment-620429270










[GitHub] [spark] AmplabJenkins commented on pull request #28381: [SPARK-31586][SQL] Replace expression TimeSub(l, r) with TimeAdd(l -r)

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28381:
URL: https://github.com/apache/spark/pull/28381#issuecomment-620429356










[GitHub] [spark] SparkQA removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620428768


   **[Test build #121971 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121971/testReport)**
 for PR 28386 at commit 
[`e155c0d`](https://github.com/apache/spark/commit/e155c0d631f61b6d84d3f44040ae07d7ff55ec54).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-620429379










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28387: [SPARK-26924][R] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28387:
URL: https://github.com/apache/spark/pull/28387#issuecomment-620429212










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620429256










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28388: [SPARK-31553][SQL] Revert "[SPARK-29048] Improve performance on Column.isInCollection() with a large size collection"

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28388:
URL: https://github.com/apache/spark/pull/28388#issuecomment-620429270










[GitHub] [spark] SparkQA commented on pull request #28388: [SPARK-31553][SQL] Revert "[SPARK-29048] Improve performance on Column.isInCollection() with a large size collection"

2020-04-28 Thread GitBox


[GitHub] [spark] MaxGekk commented on pull request #28388: [SPARK-31553][SQL] Revert "[SPARK-29048] Improve performance on Column.isInCollection() with a large size collection"

2020-04-28 Thread GitBox


MaxGekk commented on pull request #28388:
URL: https://github.com/apache/spark/pull/28388#issuecomment-620428658


   also cc @WeichenXu123 @dongjoon-hyun @maropu 






[GitHub] [spark] SparkQA commented on pull request #28387: [SPARK-26924][R] remove requireNamespace1 workaround for arrow

2020-04-28 Thread GitBox


SparkQA commented on pull request #28387:
URL: https://github.com/apache/spark/pull/28387#issuecomment-620428791


   **[Test build #121970 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121970/testReport)**
 for PR 28387 at commit 
[`5c255bb`](https://github.com/apache/spark/commit/5c255bbf8ca03cda7b60ae6c99712e803e56078d).






[GitHub] [spark] SparkQA commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-28 Thread GitBox


SparkQA commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-620428775


   **[Test build #121973 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121973/testReport)**
 for PR 28194 at commit 
[`9803e4a`](https://github.com/apache/spark/commit/9803e4a2f1ebfd10c7054bd0a16f2521528d81c7).






[GitHub] [spark] sathyaprakashg commented on pull request #28222: SPARK-31447 Fix issue in ExtractIntervalPart expression

2020-04-28 Thread GitBox


sathyaprakashg commented on pull request #28222:
URL: https://github.com/apache/spark/pull/28222#issuecomment-620428630


   @cloud-fan @MaxGekk @yaooqinn I am looking for help reviewing this PR, which 
was created 2 weeks ago. Since you were involved in a PR related to a similar 
change (https://issues.apache.org/jira/browse/SPARK-31469), I am tagging you 
to see if you can help review it.
   
   Since it is my first PR, please bear with me if I missed anything. I am 
happy to get guidance to improve it.






[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620428768


   **[Test build #121971 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121971/testReport)**
 for PR 28386 at commit 
[`e155c0d`](https://github.com/apache/spark/commit/e155c0d631f61b6d84d3f44040ae07d7ff55ec54).






[GitHub] [spark] SparkQA commented on pull request #28381: [SPARK-31586][SQL] Replace expression TimeSub(l, r) with TimeAdd(l -r)

2020-04-28 Thread GitBox


SparkQA commented on pull request #28381:
URL: https://github.com/apache/spark/pull/28381#issuecomment-620428796


   **[Test build #121972 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121972/testReport)**
 for PR 28381 at commit 
[`17b0438`](https://github.com/apache/spark/commit/17b0438fc95b4306662c122f255ab0e9e5337425).





