[GitHub] AmplabJenkins removed a comment on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23392: [SPARK-26450][SQL] Avoid 
rebuilding map of schema for every column in projection
URL: https://github.com/apache/spark/pull/23392#issuecomment-450802529
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins removed a comment on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23392: [SPARK-26450][SQL] Avoid 
rebuilding map of schema for every column in projection
URL: https://github.com/apache/spark/pull/23392#issuecomment-450802533
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100632/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding 
map of schema for every column in projection
URL: https://github.com/apache/spark/pull/23392#issuecomment-450802533
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100632/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding 
map of schema for every column in projection
URL: https://github.com/apache/spark/pull/23392#issuecomment-450802529
 
 
   Merged build finished. Test PASSed.





[GitHub] SparkQA removed a comment on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection

2019-01-01 Thread GitBox
SparkQA removed a comment on issue #23392: [SPARK-26450][SQL] Avoid rebuilding 
map of schema for every column in projection
URL: https://github.com/apache/spark/pull/23392#issuecomment-450784176
 
 
   **[Test build #100632 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100632/testReport)**
 for PR 23392 at commit 
[`a25b59c`](https://github.com/apache/spark/commit/a25b59ca756958370dd7ba14d6c1e33dec424ea8).





[GitHub] SparkQA commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection

2019-01-01 Thread GitBox
SparkQA commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of 
schema for every column in projection
URL: https://github.com/apache/spark/pull/23392#issuecomment-450802341
 
 
   **[Test build #100632 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100632/testReport)**
 for PR 23392 at commit 
[`a25b59c`](https://github.com/apache/spark/commit/a25b59ca756958370dd7ba14d6c1e33dec424ea8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support 
pivoting using array column for `pivot(column)` API
URL: https://github.com/apache/spark/pull/23349#issuecomment-450800928
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support 
pivoting using array column for `pivot(column)` API
URL: https://github.com/apache/spark/pull/23349#issuecomment-450800930
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100631/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting 
using array column for `pivot(column)` API
URL: https://github.com/apache/spark/pull/23349#issuecomment-450800928
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting 
using array column for `pivot(column)` API
URL: https://github.com/apache/spark/pull/23349#issuecomment-450800930
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100631/
   Test PASSed.





[GitHub] SparkQA removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API

2019-01-01 Thread GitBox
SparkQA removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting 
using array column for `pivot(column)` API
URL: https://github.com/apache/spark/pull/23349#issuecomment-450782084
 
 
   **[Test build #100631 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100631/testReport)**
 for PR 23349 at commit 
[`d91ade6`](https://github.com/apache/spark/commit/d91ade60e14dbb7327351de5c59f50ba7d66e26a).





[GitHub] SparkQA commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API

2019-01-01 Thread GitBox
SparkQA commented on issue #23349: [SPARK-26403][SQL] Support pivoting using 
array column for `pivot(column)` API
URL: https://github.com/apache/spark/pull/23349#issuecomment-450800738
 
 
   **[Test build #100631 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100631/testReport)**
 for PR 23349 at commit 
[`d91ade6`](https://github.com/apache/spark/commit/d91ade60e14dbb7327351de5c59f50ba7d66e26a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] gatorsmile commented on a change in pull request #21826: [SPARK-24872] Replace the symbol '||' of Or operator with 'or'

2019-01-01 Thread GitBox
gatorsmile commented on a change in pull request #21826: [SPARK-24872] Replace 
the symbol '||' of Or operator with 'or'
URL: https://github.com/apache/spark/pull/21826#discussion_r244672255
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
 ##
 @@ -442,7 +442,7 @@ case class Or(left: Expression, right: Expression) extends 
BinaryOperator with P
 
   override def inputType: AbstractDataType = BooleanType
 
-  override def symbol: String = "||"
+  override def symbol: String = "or"
 
 Review comment:
   So far, yes. 
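For context, `symbol` drives how Catalyst renders a binary operator as text. A minimal standalone sketch of that rendering, using a hypothetical stand-in rather than Catalyst's real `Expression` classes (the `(left symbol right)` form is assumed from `BinaryOperator`'s string representation):

```scala
// Hypothetical stand-in for Catalyst's Or expression, not the real class:
// `symbol` feeds the "(left symbol right)" textual rendering.
final case class Or(left: String, right: String) {
  def symbol: String = "or"  // previously "||"
  override def toString: String = s"($left $symbol $right)"
}

println(Or("a", "b"))  // prints "(a or b)"
```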





[GitHub] gatorsmile commented on a change in pull request #23391: [SPARK-26456][SQL] Cast date/timestamp to string by Date/TimestampFormatter

2019-01-01 Thread GitBox
gatorsmile commented on a change in pull request #23391: [SPARK-26456][SQL] 
Cast date/timestamp to string by Date/TimestampFormatter
URL: https://github.com/apache/spark/pull/23391#discussion_r244671504
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ##
 @@ -230,7 +235,7 @@ object PartitioningUtils {
 // Once we get the string, we try to parse it and find the partition 
column and value.
 val maybeColumn =
   parsePartitionColumn(currentPath.getName, typeInference, 
userSpecifiedDataTypes,
-validatePartitionColumns, timeZone)
+validatePartitionColumns, timeZone, dateFormatter, 
timestampFormatter)
 
 Review comment:
   When the partition/bucket column is of Date type, our parquet writer converts 
the date value to a string and records it as the directory name. Is it possible 
that Spark could return a wrong result? 
   
   For example: 
   - Join two partitioned tables on the partition key column whose data type is 
Date. 
   - One table is written by Spark 2.4 or earlier, and the other by Spark 3.0 or 
later. 
   - The partition columns contain values before October 1582.
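The calendar mismatch behind this question can be reproduced outside Spark: `java.time` uses the proleptic Gregorian calendar (as the new Date/TimestampFormatter does), while `java.util.GregorianCalendar` uses the hybrid Julian/Gregorian calendar that the legacy `SimpleDateFormat` path relies on. A self-contained sketch of the divergence around October 1582, assuming `GregorianCalendar`'s default cutover date:

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit
import java.util.{Calendar, GregorianCalendar}

// Proleptic Gregorian calendar (java.time): Oct 5-14, 1582 are ordinary
// dates, so the span below is 11 days.
val prolepticDays =
  ChronoUnit.DAYS.between(LocalDate.of(1582, 10, 4), LocalDate.of(1582, 10, 15))

// Hybrid Julian/Gregorian calendar (java.util): with the default cutover,
// Oct 15, 1582 immediately follows Oct 4, 1582 -- the span is a single day.
val oct4  = new GregorianCalendar(1582, Calendar.OCTOBER, 4).getTimeInMillis
val oct15 = new GregorianCalendar(1582, Calendar.OCTOBER, 15).getTimeInMillis
val hybridDays = (oct15 - oct4) / 86400000L

println(s"proleptic=$prolepticDays hybrid=$hybridDays")  // proleptic=11 hybrid=1
```

The same directory name can therefore denote two different physical dates depending on which calendar parsed it, which is exactly the cross-version join scenario described above.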
   
   







[GitHub] felixcheung commented on a change in pull request #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set

2019-01-01 Thread GitBox
felixcheung commented on a change in pull request #23424: 
[SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if 
Cleaner can't be set
URL: https://github.com/apache/spark/pull/23424#discussion_r244670802
 
 

 ##
 File path: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
 ##
 @@ -209,21 +209,25 @@ public static long reallocateMemory(long address, long 
oldSize, long newSize) {
   }
 
   /**
-   * Uses internal JDK APIs to allocate a DirectByteBuffer while ignoring the 
JVM's
-   * MaxDirectMemorySize limit (the default limit is too low and we do not 
want to require users
-   * to increase it).
+   * Allocate a DirectByteBuffer, potentially bypassing the JVM's 
MaxDirectMemorySize limit.
*/
   public static ByteBuffer allocateDirectBuffer(int size) {
 try {
+  if (CLEANER_CREATE_METHOD == null) {
+// Can't set a Cleaner (see comments on field), so need to allocate 
via normal Java APIs
+return ByteBuffer.allocateDirect(size);
 
 Review comment:
   try/catch OOM and log a message about setting MaxDirectMemorySize?
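A hedged sketch of what that suggestion could look like; the method name matches the diff above, but the log wording is illustrative, not the actual `Platform.java` change:

```scala
import java.nio.ByteBuffer

// Illustrative sketch of the suggested try/catch: on OOM, log a hint about
// -XX:MaxDirectMemorySize and rethrow. Not the actual Platform.java code.
def allocateDirectBuffer(size: Int): ByteBuffer =
  try {
    ByteBuffer.allocateDirect(size)
  } catch {
    case e: OutOfMemoryError =>
      System.err.println(
        s"Failed to allocate a direct buffer of $size bytes; " +
          "consider increasing -XX:MaxDirectMemorySize")
      throw e
  }

println(allocateDirectBuffer(16).capacity)  // prints 16
```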





[GitHub] wangyum commented on a change in pull request #22999: [SPARK-20319][SQL] Already quoted identifiers are getting wrapped with additional quotes

2019-01-01 Thread GitBox
wangyum commented on a change in pull request #22999: [SPARK-20319][SQL] 
Already quoted identifiers are getting wrapped with additional quotes
URL: https://github.com/apache/spark/pull/22999#discussion_r244670287
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala
 ##
 @@ -81,6 +81,10 @@ private case object OracleDialect extends JdbcDialect {
 case _ => None
   }
 
+  override def quoteIdentifier(colName: String): String = {
+    s""""${colName.stripPrefix("\"").stripSuffix("\"")}""""
 
 Review comment:
   If so, we need to verify both `getInsertStatement` and `createTable`?
   
https://github.com/apache/spark/blob/5f0ddd2d6e2fdebf549207bbc4b13ca709eee3c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L729-L746
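The behavior under discussion, as a standalone sketch (the archive garbled the interpolator in the diff; this reconstructs the apparent strip-then-requote intent):

```scala
// Idempotent quoting: strip any existing surrounding double quotes, then
// requote, so already-quoted identifiers are not wrapped a second time.
def quoteIdentifier(colName: String): String =
  "\"" + colName.stripPrefix("\"").stripSuffix("\"") + "\""

println(quoteIdentifier("ID"))      // prints "ID" with quotes
println(quoteIdentifier("\"ID\""))  // same output: no double wrapping
```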





[GitHub] AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the 
difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#issuecomment-450796525
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the 
difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#issuecomment-450796525
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the 
difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#issuecomment-450796528
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100630/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the 
difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#issuecomment-450796528
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100630/
   Test PASSed.





[GitHub] SparkQA removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
SparkQA removed a comment on issue #23388: [SPARK-26448][SQL] retain the 
difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#issuecomment-450778855
 
 
   **[Test build #100630 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100630/testReport)**
 for PR 23388 at commit 
[`c228ad9`](https://github.com/apache/spark/commit/c228ad97fcbed7e93940d120f177817f7ad55c27).





[GitHub] SparkQA commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
SparkQA commented on issue #23388: [SPARK-26448][SQL] retain the difference 
between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#issuecomment-450796348
 
 
   **[Test build #100630 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100630/testReport)**
 for PR 23388 at commit 
[`c228ad9`](https://github.com/apache/spark/commit/c228ad97fcbed7e93940d120f177817f7ad55c27).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] AmplabJenkins removed a comment on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23419: [SPARK-26507][CORE] Fix core 
tests for Java 11
URL: https://github.com/apache/spark/pull/23419#issuecomment-450795529
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23419: [SPARK-26507][CORE] Fix core tests for 
Java 11
URL: https://github.com/apache/spark/pull/23419#issuecomment-450795530
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100629/
   Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23419: [SPARK-26507][CORE] Fix core 
tests for Java 11
URL: https://github.com/apache/spark/pull/23419#issuecomment-450795530
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100629/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23419: [SPARK-26507][CORE] Fix core tests for 
Java 11
URL: https://github.com/apache/spark/pull/23419#issuecomment-450795529
 
 
   Merged build finished. Test PASSed.





[GitHub] SparkQA removed a comment on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11

2019-01-01 Thread GitBox
SparkQA removed a comment on issue #23419: [SPARK-26507][CORE] Fix core tests 
for Java 11
URL: https://github.com/apache/spark/pull/23419#issuecomment-450777104
 
 
   **[Test build #100629 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100629/testReport)**
 for PR 23419 at commit 
[`e4551bd`](https://github.com/apache/spark/commit/e4551bd63bba3578824f235f37cf8aded490805f).





[GitHub] SparkQA commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11

2019-01-01 Thread GitBox
SparkQA commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 
11
URL: https://github.com/apache/spark/pull/23419#issuecomment-450795355
 
 
   **[Test build #100629 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100629/testReport)**
 for PR 23419 at commit 
[`e4551bd`](https://github.com/apache/spark/commit/e4551bd63bba3578824f235f37cf8aded490805f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] HyukjinKwon commented on a change in pull request #23417: [SPARK-26374][TEST][SQL] Enable TimestampFormatter in HadoopFsRelationTest

2019-01-01 Thread GitBox
HyukjinKwon commented on a change in pull request #23417: 
[SPARK-26374][TEST][SQL] Enable TimestampFormatter in HadoopFsRelationTest
URL: https://github.com/apache/spark/pull/23417#discussion_r244668273
 
 

 ##
 File path: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala
 ##
 @@ -138,7 +137,8 @@ abstract class HadoopFsRelationTest extends QueryTest with 
SQLTestUtils with Tes
   logInfo(s"Testing $dataType data type$extraMessage")
 
   val extraOptions = Map[String, String](
-"parquet.enable.dictionary" -> 
parquetDictionaryEncodingEnabled.toString
+"parquet.enable.dictionary" -> 
parquetDictionaryEncodingEnabled.toString,
+"timestampFormat" -> "yyyy-MM-dd'T'HH:mm:ss.SSSX"
 
 Review comment:
   A similar question was raised at 
https://github.com/apache/spark/pull/23417#discussion_r244549254. It looks like 
this is going to be investigated separately soon.
   
   It's going to at least introduce some behaviour changes:
   
   ```scala
   scala> val fomatter = new 
org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter("yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
 java.util.TimeZone.getDefault(), java.util.Locale.US)
   fomatter: org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter = 
org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter@2df019b8
   
   scala> fomatter.format(fomatter.parse("0015-03-10T08:53:43.591+07:30"))
   res0: String = 0015-03-10T08:19:08.591+06:55
   
   scala> val fomatter = new 
org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter("yyyy-MM-dd'T'HH:mm:ss.SSSX",
 java.util.TimeZone.getDefault(), java.util.Locale.US)
   fomatter: org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter = 
org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter@763ff91
   
   scala> fomatter.format(fomatter.parse("0015-03-10T08:53:43.591+07:30"))
   res1: String = 0015-03-10T08:19:08.591+06:55:25
   ```
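The `X` vs `XXX` difference shown above is plain `java.time` pattern behavior: a single `X` prints the offset hour, appends minutes only when they are non-zero, and omits the colon, while `XXX` always prints `+HH:mm`. A minimal reproduction without Spark classes, using a fixed `+07:30` offset for illustration:

```scala
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

val ts = OffsetDateTime.parse("2019-03-10T08:53:43.591+07:30")

// One X: hour plus minutes (because minutes are non-zero), no colon.
val x   = DateTimeFormatter.ofPattern("X", Locale.US).format(ts)    // "+0730"
// Three X: always hour and minutes, with a colon.
val xxx = DateTimeFormatter.ofPattern("XXX", Locale.US).format(ts)  // "+07:30"
println(s"$x $xxx")
```

The `+06:55:25` offsets in the transcript above come from the default time zone's pre-standardization local-mean-time rules for the year 15, which can carry a seconds component.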
   





[GitHub] cloud-fan commented on a change in pull request #23417: [SPARK-26374][TEST][SQL] Enable TimestampFormatter in HadoopFsRelationTest

2019-01-01 Thread GitBox
cloud-fan commented on a change in pull request #23417: 
[SPARK-26374][TEST][SQL] Enable TimestampFormatter in HadoopFsRelationTest
URL: https://github.com/apache/spark/pull/23417#discussion_r244665820
 
 

 ##
 File path: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala
 ##
 @@ -138,7 +137,8 @@ abstract class HadoopFsRelationTest extends QueryTest with 
SQLTestUtils with Tes
   logInfo(s"Testing $dataType data type$extraMessage")
 
   val extraOptions = Map[String, String](
-"parquet.enable.dictionary" -> 
parquetDictionaryEncodingEnabled.toString
+"parquet.enable.dictionary" -> 
parquetDictionaryEncodingEnabled.toString,
+"timestampFormat" -> "-MM-dd'T'HH:mm:ss.SSSX"
 
 Review comment:
   With the new parser and the default timestamp format, can Spark no longer write 
and read back timestamp data from before 1582?
   
   What's the consequence if we make this the default format?
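The pre-1582 concern comes down to the calendar systems behind the two parsers: the legacy path is based on a hybrid Julian/Gregorian calendar, while `java.time` is proleptic Gregorian, so the same instant before the cutover maps to different calendar dates. A hypothetical stand-alone sketch of the discrepancy (not Spark code; class and variable names are mine):

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.TimeZone;

public class CutoverDemo {
  public static void main(String[] args) throws Exception {
    // Legacy parser: hybrid calendar, Julian rules before 1582-10-15.
    SimpleDateFormat legacy = new SimpleDateFormat("yyyy-MM-dd");
    legacy.setTimeZone(TimeZone.getTimeZone("UTC"));
    Date d = legacy.parse("1000-01-01");

    // java.time interprets the same instant in the proleptic Gregorian
    // calendar, so the rendered date shifts by several days.
    String modern = Instant.ofEpochMilli(d.getTime())
        .atZone(ZoneOffset.UTC).toLocalDate().toString();
    System.out.println(modern); // not "1000-01-01"
  }
}
```

Round-tripping such a value through one parser and the other therefore changes the printed date, which is why switching the default parser is a behaviour change for old timestamps.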





[GitHub] srowen commented on issue #23422: [SPARK-26514][CORE] Support running multi tasks per cpu core

2019-01-01 Thread GitBox
srowen commented on issue #23422: [SPARK-26514][CORE] Support running multi 
tasks per cpu core
URL: https://github.com/apache/spark/pull/23422#issuecomment-450786857
 
 
   I don't think we can do this. First, the new config name is pretty 
confusing; I understand you're reversing the order of cpu and tasks but it 
really is just going to confuse people. This doesn't resolve what happens if 
both are set. If anything, it's more reasonable to let spark.task.cpus take on 
fractional values.
   
   Or just let the resource manager over-commit cores for your machines. Let it 
say there are 96 cores on a 64 core machine, and let Spark use them as usual. 
This was possible on YARN, but I am actually not sure about other resource 
managers.
   
   What's the use case? This and the JIRA don't give any argument for it. An 
I/O-bound job that can nevertheless do more I/O if it's parallelized further? 
You can already increase the parallelism without this change; it'll cause 
you to use more executor slots than otherwise, but those won't matter unless 
the use case is also that there are other concurrent Spark jobs that could use 
the slots.





[GitHub] AmplabJenkins removed a comment on issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions.

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #22141: [SPARK-25154][SQL] Support NOT 
IN sub-queries inside nested OR conditions.
URL: https://github.com/apache/spark/pull/22141#issuecomment-450786776
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions.

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #22141: [SPARK-25154][SQL] Support NOT 
IN sub-queries inside nested OR conditions.
URL: https://github.com/apache/spark/pull/22141#issuecomment-450786780
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6553/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions.

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #22141: [SPARK-25154][SQL] Support NOT IN 
sub-queries inside nested OR conditions.
URL: https://github.com/apache/spark/pull/22141#issuecomment-450786780
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6553/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions.

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #22141: [SPARK-25154][SQL] Support NOT IN 
sub-queries inside nested OR conditions.
URL: https://github.com/apache/spark/pull/22141#issuecomment-450786776
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23425: [SPARK-26306][TEST][BUILD] 
More memory to de-flake SorterSuite
URL: https://github.com/apache/spark/pull/23425#issuecomment-450786442
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23425: [SPARK-26306][TEST][BUILD] 
More memory to de-flake SorterSuite
URL: https://github.com/apache/spark/pull/23425#issuecomment-450786443
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6552/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23425: [SPARK-26306][TEST][BUILD] More memory 
to de-flake SorterSuite
URL: https://github.com/apache/spark/pull/23425#issuecomment-450786442
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23425: [SPARK-26306][TEST][BUILD] More memory 
to de-flake SorterSuite
URL: https://github.com/apache/spark/pull/23425#issuecomment-450786443
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6552/
   Test PASSed.





[GitHub] srowen closed pull request #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com

2019-01-01 Thread GitBox
srowen closed pull request #23420: [SPARK-26508][Core][SQL] Address warning 
messages in Java reported at lgtm.com
URL: https://github.com/apache/spark/pull/23420
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/common/network-common/src/main/java/org/apache/spark/network/util/ByteUnit.java
 
b/common/network-common/src/main/java/org/apache/spark/network/util/ByteUnit.java
index 984575acaf511..6f7925c26094d 100644
--- 
a/common/network-common/src/main/java/org/apache/spark/network/util/ByteUnit.java
+++ 
b/common/network-common/src/main/java/org/apache/spark/network/util/ByteUnit.java
@@ -18,11 +18,11 @@
 
 public enum ByteUnit {
   BYTE(1),
-  KiB(1024L),
-  MiB((long) Math.pow(1024L, 2L)),
-  GiB((long) Math.pow(1024L, 3L)),
-  TiB((long) Math.pow(1024L, 4L)),
-  PiB((long) Math.pow(1024L, 5L));
+  KiB(1L << 10),
+  MiB(1L << 20),
+  GiB(1L << 30),
+  TiB(1L << 40),
+  PiB(1L << 50);
 
   ByteUnit(long multiplier) {
 this.multiplier = multiplier;
@@ -50,7 +50,7 @@ public long convertTo(long d, ByteUnit u) {
 }
   }
 
-  public double toBytes(long d) {
+  public long toBytes(long d) {
 if (d < 0) {
   throw new IllegalArgumentException("Negative size value. Size must be 
positive: " + d);
 }
diff --git 
a/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
 
b/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
index 43a6bc7dc3d06..201628b04fbef 100644
--- 
a/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
+++ 
b/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
@@ -309,8 +309,8 @@ public int chunkFetchHandlerThreads() {
 }
 int chunkFetchHandlerThreadsPercent =
   conf.getInt("spark.shuffle.server.chunkFetchHandlerThreadsPercent", 100);
-return (int)Math.ceil(
- (this.serverThreads() > 0 ? this.serverThreads() : 2 * 
NettyRuntime.availableProcessors()) *
- chunkFetchHandlerThreadsPercent/(double)100);
+int threads =
+  this.serverThreads() > 0 ? this.serverThreads() : 2 * 
NettyRuntime.availableProcessors();
+return (int) Math.ceil(threads * (chunkFetchHandlerThreadsPercent / 
100.0));
   }
 }
diff --git 
a/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java 
b/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
index 7df8aafb2b674..2ff98a69ee1f4 100644
--- a/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
+++ b/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
@@ -712,7 +712,7 @@ public boolean append(Object kbase, long koff, int klen, 
Object vbase, long voff
   final long recordOffset = offset;
   UnsafeAlignedOffset.putSize(base, offset, klen + vlen + uaoSize);
   UnsafeAlignedOffset.putSize(base, offset + uaoSize, klen);
-  offset += (2 * uaoSize);
+  offset += (2L * uaoSize);
   Platform.copyMemory(kbase, koff, base, offset, klen);
   offset += klen;
   Platform.copyMemory(vbase, voff, base, offset, vlen);
@@ -780,7 +780,7 @@ private void allocate(int capacity) {
 assert (capacity >= 0);
 capacity = Math.max((int) Math.min(MAX_CAPACITY, 
ByteArrayMethods.nextPowerOf2(capacity)), 64);
 assert (capacity <= MAX_CAPACITY);
-longArray = allocateArray(capacity * 2);
+longArray = allocateArray(capacity * 2L);
 longArray.zeroOut();
 
 this.growthThreshold = (int) (capacity * loadFactor);
diff --git a/examples/src/main/java/org/apache/spark/examples/JavaTC.java 
b/examples/src/main/java/org/apache/spark/examples/JavaTC.java
index c9ca9c9b3a412..7e8df69e7e8da 100644
--- a/examples/src/main/java/org/apache/spark/examples/JavaTC.java
+++ b/examples/src/main/java/org/apache/spark/examples/JavaTC.java
@@ -71,7 +71,7 @@ public static void main(String[] args) {
 
 JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
 
-Integer slices = (args.length > 0) ? Integer.parseInt(args[0]): 2;
+int slices = (args.length > 0) ? Integer.parseInt(args[0]): 2;
 JavaPairRDD<Integer, Integer> tc = jsc.parallelizePairs(generateGraph(), slices).cache();
 
 // Linear transitive closure: each round grows paths by one edge,
diff --git 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java
index 27052be87b82e..b8d2c9f6a6584 100644
--- a/examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java
+++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java
@@ -111,7 +111,7 @@ public static void main(String[] args) {
   .setMetricName("rmse")
   .setLabelCol("rating")
   .setPredictionCol("prediction");
-Double rmse = 
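The `ByteUnit` hunk near the top of this diff replaces `(long) Math.pow(1024L, n)` constants with bit shifts. Powers of 1024 up to 1024^5 are exactly representable as doubles, so the constants are unchanged; the shifts are simply clearer, and the `L` suffix avoids the int-shift wraparound trap. A hypothetical stand-alone check (not part of the PR):

```java
public class ByteUnitCheck {
  public static void main(String[] args) {
    // The new shift constants equal the old Math.pow constants exactly.
    for (int i = 1; i <= 5; i++) {
      long shifted = 1L << (10 * i);
      long pow = (long) Math.pow(1024L, i);
      if (shifted != pow) throw new AssertionError("mismatch at 1024^" + i);
    }
    // Without the L suffix, an int shift count is taken mod 32:
    System.out.println(1 << 40);   // 256 (== 1 << 8), not 2^40
    System.out.println(1L << 40);  // 1099511627776
  }
}
```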

[GitHub] srowen commented on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com

2019-01-01 Thread GitBox
srowen commented on issue #23420: [SPARK-26508][Core][SQL] Address warning 
messages in Java reported at lgtm.com
URL: https://github.com/apache/spark/pull/23420#issuecomment-450786128
 
 
   Merged to master





[GitHub] SparkQA commented on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite

2019-01-01 Thread GitBox
SparkQA commented on issue #23425: [SPARK-26306][TEST][BUILD] More memory to 
de-flake SorterSuite
URL: https://github.com/apache/spark/pull/23425#issuecomment-450786104
 
 
   **[Test build #100634 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100634/testReport)**
 for PR 23425 at commit 
[`1aa7ad7`](https://github.com/apache/spark/commit/1aa7ad7aee0e10fbefd78638c8b896e60e3715b5).





[GitHub] srowen opened a new pull request #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite

2019-01-01 Thread GitBox
srowen opened a new pull request #23425: [SPARK-26306][TEST][BUILD] More memory 
to de-flake SorterSuite
URL: https://github.com/apache/spark/pull/23425
 
 
   ## What changes were proposed in this pull request?
   
   Increase test memory to avoid OOM in TimSort-related tests.
   
   ## How was this patch tested?
   
   Existing tests.





[GitHub] AmplabJenkins removed a comment on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23424: [SPARK-24421][CORE][FOLLOWUP] 
Use normal direct ByteBuffer allocation if Cleaner can't be set
URL: https://github.com/apache/spark/pull/23424#issuecomment-450785846
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23424: [SPARK-24421][CORE][FOLLOWUP] 
Use normal direct ByteBuffer allocation if Cleaner can't be set
URL: https://github.com/apache/spark/pull/23424#issuecomment-450785847
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6551/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use 
normal direct ByteBuffer allocation if Cleaner can't be set
URL: https://github.com/apache/spark/pull/23424#issuecomment-450785847
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6551/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use 
normal direct ByteBuffer allocation if Cleaner can't be set
URL: https://github.com/apache/spark/pull/23424#issuecomment-450785846
 
 
   Merged build finished. Test PASSed.





[GitHub] SparkQA commented on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set

2019-01-01 Thread GitBox
SparkQA commented on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal 
direct ByteBuffer allocation if Cleaner can't be set
URL: https://github.com/apache/spark/pull/23424#issuecomment-450785770
 
 
   **[Test build #100633 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100633/testReport)**
 for PR 23424 at commit 
[`2f267a5`](https://github.com/apache/spark/commit/2f267a5d63e10d3d1e986a346a4385a93a27ce7c).





[GitHub] SparkQA commented on issue #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite

2019-01-01 Thread GitBox
SparkQA commented on issue #23404: [SPARK-26501]Fix unexpected overriden of 
exitFn in SparkSubmitSuite
URL: https://github.com/apache/spark/pull/23404#issuecomment-450785748
 
 
   **[Test build #4492 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4492/testReport)**
 for PR 23404 at commit 
[`66a9d5d`](https://github.com/apache/spark/commit/66a9d5d333271eae76c18d4e33076724371bbe6a).





[GitHub] AmplabJenkins removed a comment on issue #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23404: [SPARK-26501]Fix unexpected 
overriden of exitFn in SparkSubmitSuite
URL: https://github.com/apache/spark/pull/23404#issuecomment-450474110
 
 
   Can one of the admins verify this patch?





[GitHub] srowen commented on a change in pull request #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set

2019-01-01 Thread GitBox
srowen commented on a change in pull request #23424: 
[SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if 
Cleaner can't be set
URL: https://github.com/apache/spark/pull/23424#discussion_r244661628
 
 

 ##
 File path: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
 ##
 @@ -209,21 +209,25 @@ public static long reallocateMemory(long address, long 
oldSize, long newSize) {
   }
 
   /**
-   * Uses internal JDK APIs to allocate a DirectByteBuffer while ignoring the 
JVM's
-   * MaxDirectMemorySize limit (the default limit is too low and we do not 
want to require users
-   * to increase it).
+   * Allocate a DirectByteBuffer, potentially bypassing the JVM's 
MaxDirectMemorySize limit.
*/
   public static ByteBuffer allocateDirectBuffer(int size) {
 try {
+  if (CLEANER_CREATE_METHOD == null) {
+// Can't set a Cleaner (see comments on field), so need to allocate 
via normal Java APIs
+return ByteBuffer.allocateDirect(size);
+  }
+  // Otherwise, use internal JDK APIs to allocate a DirectByteBuffer while 
ignoring the JVM's
+  // MaxDirectMemorySize limit (the default limit is too low and we do not 
want to
+  // require users to increase it).
   long memory = allocateMemory(size);
   ByteBuffer buffer = (ByteBuffer) DBB_CONSTRUCTOR.newInstance(memory, 
size);
-  if (CLEANER_CREATE_METHOD != null) {
-try {
-  DBB_CLEANER_FIELD.set(buffer,
-  CLEANER_CREATE_METHOD.invoke(null, buffer, (Runnable) () -> 
freeMemory(memory)));
-} catch (IllegalAccessException | InvocationTargetException e) {
-  throw new IllegalStateException(e);
-}
+  try {
+DBB_CLEANER_FIELD.set(buffer,
+CLEANER_CREATE_METHOD.invoke(null, buffer, (Runnable) () -> 
freeMemory(memory)));
+  } catch (IllegalAccessException | InvocationTargetException e) {
+freeMemory(memory);
 
 Review comment:
   Just to be totally safe, free the memory that was allocated but can't be 
used now in this case.





[GitHub] srowen commented on a change in pull request #22993: [SPARK-24421][BUILD][CORE] Accessing sun.misc.Cleaner in JDK11

2019-01-01 Thread GitBox
srowen commented on a change in pull request #22993: [SPARK-24421][BUILD][CORE] 
Accessing sun.misc.Cleaner in JDK11
URL: https://github.com/apache/spark/pull/22993#discussion_r244661604
 
 

 ##
 File path: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
 ##
 @@ -159,18 +213,18 @@ public static long reallocateMemory(long address, long 
oldSize, long newSize) {
* MaxDirectMemorySize limit (the default limit is too low and we do not 
want to require users
* to increase it).
*/
-  @SuppressWarnings("unchecked")
   public static ByteBuffer allocateDirectBuffer(int size) {
 try {
-  Class<?> cls = Class.forName("java.nio.DirectByteBuffer");
-  Constructor<?> constructor = cls.getDeclaredConstructor(Long.TYPE, Integer.TYPE);
-  constructor.setAccessible(true);
-  Field cleanerField = cls.getDeclaredField("cleaner");
-  cleanerField.setAccessible(true);
   long memory = allocateMemory(size);
-  ByteBuffer buffer = (ByteBuffer) constructor.newInstance(memory, size);
-  Cleaner cleaner = Cleaner.create(buffer, () -> freeMemory(memory));
-  cleanerField.set(buffer, cleaner);
+  ByteBuffer buffer = (ByteBuffer) DBB_CONSTRUCTOR.newInstance(memory, 
size);
+  if (CLEANER_CREATE_METHOD != null) {
 
 Review comment:
   See https://github.com/apache/spark/pull/23424; I now think this was an error.





[GitHub] srowen opened a new pull request #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set

2019-01-01 Thread GitBox
srowen opened a new pull request #23424: [SPARK-24421][CORE][FOLLOWUP] Use 
normal direct ByteBuffer allocation if Cleaner can't be set
URL: https://github.com/apache/spark/pull/23424
 
 
   ## What changes were proposed in this pull request?
   
   In Java 9+ we can't use sun.misc.Cleaner by default anymore, and this was 
largely handled in https://github.com/apache/spark/pull/22993. However, I think 
the change there left a significant problem.
   
   If a DirectByteBuffer is allocated using the reflective hack in Platform, we 
now can't set a Cleaner by default. But I believe this means the memory isn't 
freed promptly, or possibly at all. If a Cleaner can't be set, I think we need 
to use the normal APIs to allocate the direct ByteBuffer.
   
   According to comments in the code, the downside is simply that the normal 
APIs will check and impose limits on how much off-heap memory can be allocated. 
Per the original review on https://github.com/apache/spark/pull/22993 this much 
seems fine, as either way in this case the user would have to add a JVM setting 
(increase max, or allow the reflective access).
   
   ## How was this patch tested?
   
   Existing tests. This resolved an OutOfMemoryError in Java 11 from TimSort 
tests without increasing test heap size. (See 
https://github.com/apache/spark/pull/23419#issuecomment-450772125 ) This 
suggests there is a problem and that this resolves it.





[GitHub] beliefer commented on a change in pull request #23409: [SPARK-26502][SQL] Move hiveResultString() from QueryExecution to HiveResult

2019-01-01 Thread GitBox
beliefer commented on a change in pull request #23409: [SPARK-26502][SQL] Move 
hiveResultString() from QueryExecution to HiveResult
URL: https://github.com/apache/spark/pull/23409#discussion_r244661148
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
 ##
 @@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.nio.charset.StandardCharsets
+import java.sql.{Date, Timestamp}
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.execution.command.{DescribeTableCommand, 
ExecutedCommandExec, ShowTablesCommand}
+import org.apache.spark.sql.types._
+
+object HiveResult {
 
 Review comment:
   HiveResult.hiveResultString seems like it could be used for other systems 
too (e.g. MySQL), so I suggest renaming it.





[GitHub] AmplabJenkins removed a comment on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23420: [SPARK-26508][Core][SQL] 
Address warning messages in Java reported at lgtm.com
URL: https://github.com/apache/spark/pull/23420#issuecomment-450784759
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23420: [SPARK-26508][Core][SQL] Address 
warning messages in Java reported at lgtm.com
URL: https://github.com/apache/spark/pull/23420#issuecomment-450784759
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23420: [SPARK-26508][Core][SQL] Address 
warning messages in Java reported at lgtm.com
URL: https://github.com/apache/spark/pull/23420#issuecomment-450784760
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100628/
   Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23420: [SPARK-26508][Core][SQL] 
Address warning messages in Java reported at lgtm.com
URL: https://github.com/apache/spark/pull/23420#issuecomment-450784760
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100628/
   Test PASSed.





[GitHub] SparkQA removed a comment on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com

2019-01-01 Thread GitBox
SparkQA removed a comment on issue #23420: [SPARK-26508][Core][SQL] Address 
warning messages in Java reported at lgtm.com
URL: https://github.com/apache/spark/pull/23420#issuecomment-450765699
 
 
   **[Test build #100628 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100628/testReport)**
 for PR 23420 at commit 
[`3df0a0a`](https://github.com/apache/spark/commit/3df0a0ab27b2c841da4c7b3da6ecf8b7f48d7e6d).





[GitHub] SparkQA commented on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com

2019-01-01 Thread GitBox
SparkQA commented on issue #23420: [SPARK-26508][Core][SQL] Address warning 
messages in Java reported at lgtm.com
URL: https://github.com/apache/spark/pull/23420#issuecomment-450784630
 
 
   **[Test build #100628 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100628/testReport)**
 for PR 23420 at commit 
[`3df0a0a`](https://github.com/apache/spark/commit/3df0a0ab27b2c841da4c7b3da6ecf8b7f48d7e6d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] LantaoJin commented on issue #22874: [SPARK-25865][CORE] Add GC information to ExecutorMetrics

2019-01-01 Thread GitBox
LantaoJin commented on issue #22874: [SPARK-25865][CORE] Add GC information to 
ExecutorMetrics
URL: https://github.com/apache/spark/pull/22874#issuecomment-450784430
 
 
   > They make sense over the entire lifetime of the executor, but not when 
viewed within one stage -- you'd want to subtract out the value at the 
beginning of the stage.
   
   You are right. They only make sense over the entire lifetime, and I don't 
want to split this metric across stages. I will check the current 
implementation. One purpose of adding this is to determine the frequency of 
major and minor GC. Memory usage alone can't tell us whether the memory 
allocation is reasonable. For example, take two executors whose configured 
executor memory is 10GB and whose usage is near 10GB in both cases. Should we 
increase or decrease their configured memory? This metric may help: we can 
increase the configured memory for the first one if it has very frequent major 
GCs, and decrease it for the second one if it has only occasional minor GCs 
and no major GCs.
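   The lifetime-cumulative GC counters discussed above can be read from the 
JVM through the standard management API. A minimal sketch (not the PR's 
implementation; collector names such as "G1 Young Generation" depend on the GC 
algorithm in use):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcFrequencyDemo {
    // Collection counts/times are cumulative since JVM start, so -- as noted
    // above -- they are meaningful over the whole executor lifetime, not
    // within a single stage (per-stage values would need a baseline subtracted).
    static List<GarbageCollectorMXBean> collectors() {
        return ManagementFactory.getGarbageCollectorMXBeans();
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : collectors()) {
            System.out.printf("%s: count=%d timeMs=%d%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```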





[GitHub] SparkQA commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection

2019-01-01 Thread GitBox
SparkQA commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of 
schema for every column in projection
URL: https://github.com/apache/spark/pull/23392#issuecomment-450784176
 
 
   **[Test build #100632 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100632/testReport)**
 for PR 23392 at commit 
[`a25b59c`](https://github.com/apache/spark/commit/a25b59ca756958370dd7ba14d6c1e33dec424ea8).





[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way

2019-01-01 Thread GitBox
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] 
Split a SQL in correct way
URL: https://github.com/apache/spark/pull/23276#discussion_r244657377
 
 

 ##
 File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
 ##
 @@ -18,13 +18,13 @@
 package org.apache.spark.sql.hive.thriftserver
 
 import java.io._
-import java.util.{ArrayList => JArrayList, Locale}
+import java.util.{ArrayList, Locale}
 
 Review comment:
   keep this unchanged?





[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way

2019-01-01 Thread GitBox
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] 
Split a SQL in correct way
URL: https://github.com/apache/spark/pull/23276#discussion_r244659512
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ##
 @@ -87,4 +92,93 @@ object StringUtils {
 }
 funcNames.toSeq
   }
+
+  /**
+   * Split the text into one or more SQLs with bracketed comments reserved
+   *
+   * Highlighted Corner Cases: semicolon in double quotes, single quotes or 
inline comments.
+   * Expected Behavior: The blanks will be trimmed and a blank line will be 
omitted.
+   *
+   * @param text One or more SQLs separated by semicolons
+   * @return the trimmed SQL array (Array is for Java interop)
+   */
+  def split(text: String): Array[String] = {
+val D_QUOTE: Char = '"'
+val S_QUOTE: Char = '\''
+val Q_QUOTE: Char = '`'
+val SEMICOLON: Char = ';'
+val ESCAPE: Char = '\\'
+val DOT = '.'
+val SINGLE_COMMENT = "--"
+val BRACKETED_COMMENT_START = "/*"
+val BRACKETED_COMMENT_END = "*/"
+val FORWARD_SLASH = '/'
+
+// quoteFlag acts as an enum of D_QUOTE, S_QUOTE, DOT
+// * D_QUOTE: the cursor stands on a double quoted string
+// * S_QUOTE: the cursor stands on a single quoted string
+// * DASH: the cursor stands in the SINGLE_COMMENT
+// * FORWARD_SLASH: the cursor stands in the BRACKETED_COMMENT
+// * DOT: default value for other cases
+var quoteFlag: Char = DOT
+var cursor: Int = 0
+val ret: mutable.ArrayBuffer[String] = mutable.ArrayBuffer()
+var currentSQL: mutable.StringBuilder = mutable.StringBuilder.newBuilder
+
+while (cursor < text.length) {
+  val current: Char = text(cursor)
+
+  text.substring(cursor) match {
 
 Review comment:
   Based on the current implementation, there are many cases we need to 
consider, and it is easy to miss one of them. Could we simplify it? First 
handle the special cases, e.g., in quotes or in comments, and then enter the 
regular mode.
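   The structure suggested here, handling the special states (quoted strings, 
"--" line comments, "/* */" bracketed comments) first and then falling through 
to the regular mode where a semicolon ends a statement, can be sketched roughly 
as follows. This is an illustrative state machine under simplifying assumptions 
(it ignores backquoted identifiers and some comment edge cases), not the PR's 
implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class SqlSplitSketch {
    // Split one or more semicolon-separated SQL statements, preserving
    // semicolons that appear inside strings or comments.
    static List<String> split(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        // 0 = normal, '\'' or '"' = inside string, '-' = line comment,
        // '*' = bracketed comment
        char state = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (state == '\'' || state == '"') {
                cur.append(c);
                if (c == '\\' && i + 1 < text.length()) { cur.append(text.charAt(++i)); }
                else if (c == state) state = 0;
            } else if (state == '-') {            // line comment runs to end of line
                cur.append(c);
                if (c == '\n') state = 0;
            } else if (state == '*') {            // bracketed comment runs to "*/"
                cur.append(c);
                if (c == '*' && i + 1 < text.length() && text.charAt(i + 1) == '/') {
                    cur.append(text.charAt(++i));
                    state = 0;
                }
            } else if (c == ';') {                // regular mode: semicolon splits
                String s = cur.toString().trim();
                if (!s.isEmpty()) out.add(s);
                cur.setLength(0);
            } else {
                cur.append(c);
                if (c == '\'' || c == '"') state = c;
                else if (c == '-' && i + 1 < text.length() && text.charAt(i + 1) == '-') state = '-';
                else if (c == '/' && i + 1 < text.length() && text.charAt(i + 1) == '*') state = '*';
            }
        }
        String last = cur.toString().trim();
        if (!last.isEmpty()) out.add(last);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("SELECT 1; SELECT ';'; SELECT /* ; */ 2"));
        // → [SELECT 1, SELECT ';', SELECT /* ; */ 2]
    }
}
```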





[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way

2019-01-01 Thread GitBox
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] 
Split a SQL in correct way
URL: https://github.com/apache/spark/pull/23276#discussion_r244658838
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ##
 @@ -87,4 +92,93 @@ object StringUtils {
 }
 funcNames.toSeq
   }
+
+  /**
+   * Split the text into one or more SQLs with bracketed comments reserved
+   *
+   * Highlighted Corner Cases: semicolon in double quotes, single quotes or 
inline comments.
+   * Expected Behavior: The blanks will be trimmed and a blank line will be 
omitted.
+   *
+   * @param text One or more SQLs separated by semicolons
+   * @return the trimmed SQL array (Array is for Java interop)
+   */
+  def split(text: String): Array[String] = {
+val D_QUOTE: Char = '"'
+val S_QUOTE: Char = '\''
+val Q_QUOTE: Char = '`'
+val SEMICOLON: Char = ';'
+val ESCAPE: Char = '\\'
+val DOT = '.'
+val SINGLE_COMMENT = "--"
+val BRACKETED_COMMENT_START = "/*"
+val BRACKETED_COMMENT_END = "*/"
+val FORWARD_SLASH = '/'
+
+// quoteFlag acts as an enum of D_QUOTE, S_QUOTE, DOT
+// * D_QUOTE: the cursor stands on a double quoted string
+// * S_QUOTE: the cursor stands on a single quoted string
+// * DASH: the cursor stands in the SINGLE_COMMENT
+// * FORWARD_SLASH: the cursor stands in the BRACKETED_COMMENT
+// * DOT: default value for other cases
+var quoteFlag: Char = DOT
+var cursor: Int = 0
+val ret: mutable.ArrayBuffer[String] = mutable.ArrayBuffer()
+var currentSQL: mutable.StringBuilder = mutable.StringBuilder.newBuilder
+
+while (cursor < text.length) {
+  val current: Char = text(cursor)
+
+  text.substring(cursor) match {
+// if it stands on the opening of a bracketed comment, consume 2 
characters
+case remaining if quoteFlag == DOT
+  && current == '/'
+  && remaining.startsWith(BRACKETED_COMMENT_START) =>
+  quoteFlag = current
+  currentSQL.append("/*")
 
 Review comment:
   Use `BRACKETED_COMMENT_START`





[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way

2019-01-01 Thread GitBox
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] 
Split a SQL in correct way
URL: https://github.com/apache/spark/pull/23276#discussion_r244656958
 
 

 ##
 File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
 ##
 @@ -331,6 +337,64 @@ private[hive] class SparkSQLCLIDriver extends CliDriver 
with Logging {
 console.printInfo(s"Spark master: $master, Application Id: $appId")
   }
 
+  override def processLine(line: String, allowInterrupting: Boolean): Int = {
 
 Review comment:
   Could you write a comment above this line explaining that the code comes 
from org.apache.hadoop.hive.cli.CliDriver?





[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way

2019-01-01 Thread GitBox
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] 
Split a SQL in correct way
URL: https://github.com/apache/spark/pull/23276#discussion_r244659180
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ##
 @@ -87,4 +92,93 @@ object StringUtils {
 }
 funcNames.toSeq
   }
+
+  /**
+   * Split the text into one or more SQLs with bracketed comments reserved
+   *
+   * Highlighted Corner Cases: semicolon in double quotes, single quotes or 
inline comments.
+   * Expected Behavior: The blanks will be trimmed and a blank line will be 
omitted.
+   *
+   * @param text One or more SQLs separated by semicolons
+   * @return the trimmed SQL array (Array is for Java interop)
+   */
+  def split(text: String): Array[String] = {
+val D_QUOTE: Char = '"'
+val S_QUOTE: Char = '\''
+val Q_QUOTE: Char = '`'
+val SEMICOLON: Char = ';'
+val ESCAPE: Char = '\\'
+val DOT = '.'
+val SINGLE_COMMENT = "--"
+val BRACKETED_COMMENT_START = "/*"
+val BRACKETED_COMMENT_END = "*/"
+val FORWARD_SLASH = '/'
+
+// quoteFlag acts as an enum of D_QUOTE, S_QUOTE, DOT
+// * D_QUOTE: the cursor stands on a double quoted string
+// * S_QUOTE: the cursor stands on a single quoted string
+// * DASH: the cursor stands in the SINGLE_COMMENT
+// * FORWARD_SLASH: the cursor stands in the BRACKETED_COMMENT
+// * DOT: default value for other cases
+var quoteFlag: Char = DOT
 
 Review comment:
   Can we use an enum? The current approach mixes the actual flag with the 
symbol character.





[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way

2019-01-01 Thread GitBox
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] 
Split a SQL in correct way
URL: https://github.com/apache/spark/pull/23276#discussion_r244658802
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ##
 @@ -87,4 +92,93 @@ object StringUtils {
 }
 funcNames.toSeq
   }
+
+  /**
+   * Split the text into one or more SQLs with bracketed comments reserved
+   *
+   * Highlighted Corner Cases: semicolon in double quotes, single quotes or 
inline comments.
+   * Expected Behavior: The blanks will be trimmed and a blank line will be 
omitted.
+   *
+   * @param text One or more SQLs separated by semicolons
+   * @return the trimmed SQL array (Array is for Java interop)
+   */
+  def split(text: String): Array[String] = {
+val D_QUOTE: Char = '"'
+val S_QUOTE: Char = '\''
+val Q_QUOTE: Char = '`'
+val SEMICOLON: Char = ';'
+val ESCAPE: Char = '\\'
+val DOT = '.'
+val SINGLE_COMMENT = "--"
+val BRACKETED_COMMENT_START = "/*"
+val BRACKETED_COMMENT_END = "*/"
+val FORWARD_SLASH = '/'
+
+// quoteFlag acts as an enum of D_QUOTE, S_QUOTE, DOT
+// * D_QUOTE: the cursor stands on a double quoted string
+// * S_QUOTE: the cursor stands on a single quoted string
+// * DASH: the cursor stands in the SINGLE_COMMENT
+// * FORWARD_SLASH: the cursor stands in the BRACKETED_COMMENT
+// * DOT: default value for other cases
+var quoteFlag: Char = DOT
+var cursor: Int = 0
+val ret: mutable.ArrayBuffer[String] = mutable.ArrayBuffer()
+var currentSQL: mutable.StringBuilder = mutable.StringBuilder.newBuilder
+
+while (cursor < text.length) {
+  val current: Char = text(cursor)
+
+  text.substring(cursor) match {
+// if it stands on the opening of a bracketed comment, consume 2 
characters
+case remaining if quoteFlag == DOT
+  && current == '/'
 
 Review comment:
   Could you follow the indentation convention in our code base? 
https://github.com/databricks/scala-style-guide#indent
   
   You can find many examples in the code base





[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way

2019-01-01 Thread GitBox
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] 
Split a SQL in correct way
URL: https://github.com/apache/spark/pull/23276#discussion_r244657396
 
 

 ##
 File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
 ##
 @@ -384,7 +448,7 @@ private[hive] class SparkSQLCLIDriver extends CliDriver 
with Logging {
 return ret
   }
 
-  val res = new JArrayList[String]()
+  val res = new ArrayList[String]()
 
 Review comment:
   keep this unchanged?





[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way

2019-01-01 Thread GitBox
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] 
Split a SQL in correct way
URL: https://github.com/apache/spark/pull/23276#discussion_r244659631
 
 

 ##
 File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala
 ##
 @@ -43,4 +43,70 @@ class StringUtilsSuite extends SparkFunSuite {
 assert(filterPattern(names, " a. ") === Seq("a1", "a2"))
 assert(filterPattern(names, " d* ") === Nil)
   }
+
+  test("split a SQL") {
 
 Review comment:
   The test coverage is still not enough. For example, could you add some test 
cases from 
https://github.com/apache/hive/commit/65a65826a0d351a3d918bdb98595bdd106d37adb#diff-6182a9d1d63c707dff0ecd4e6a025fd2
 ?





[GitHub] AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting 
using array column for `pivot(column)` API
URL: https://github.com/apache/spark/pull/23349#issuecomment-450782512
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support 
pivoting using array column for `pivot(column)` API
URL: https://github.com/apache/spark/pull/23349#issuecomment-450782512
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support 
pivoting using array column for `pivot(column)` API
URL: https://github.com/apache/spark/pull/23349#issuecomment-450782513
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6550/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting 
using array column for `pivot(column)` API
URL: https://github.com/apache/spark/pull/23349#issuecomment-450782513
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6550/
   Test PASSed.





[GitHub] SparkQA commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API

2019-01-01 Thread GitBox
SparkQA commented on issue #23349: [SPARK-26403][SQL] Support pivoting using 
array column for `pivot(column)` API
URL: https://github.com/apache/spark/pull/23349#issuecomment-450782084
 
 
   **[Test build #100631 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100631/testReport)**
 for PR 23349 at commit 
[`d91ade6`](https://github.com/apache/spark/commit/d91ade60e14dbb7327351de5c59f50ba7d66e26a).





[GitHub] HyukjinKwon commented on a change in pull request #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API

2019-01-01 Thread GitBox
HyukjinKwon commented on a change in pull request #23349: [SPARK-26403][SQL] 
Support pivoting using array column for `pivot(column)` API
URL: https://github.com/apache/spark/pull/23349#discussion_r244658731
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
 ##
 @@ -422,10 +422,14 @@ class RelationalGroupedDataset protected[sql](
   def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = 
{
 groupType match {
   case RelationalGroupedDataset.GroupByType =>
-val valueExprs = values.map(_ match {
+val valueExprs = values.map {
   case c: Column => c.expr
+  // ArrayType returns a `WrappedArray` but currently `Literal.apply`
+  // does not support this type although it supports a normal array.
+  // Here manually unwrap to make it an array. See also SPARK-26403.
+  case v: collection.mutable.WrappedArray[_] => Literal.apply(v.array)
 
 Review comment:
   Yup.





[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource

2019-01-01 Thread GitBox
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without 
tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-450780863
 
 
   Then how about having `from_json` always return null for a corrupted record 
when the mode is `PERMISSIVE`?





[GitHub] ConcurrencyPractitioner closed pull request #21651: [SPARK-18258] Sink need access to offset representation

2019-01-01 Thread GitBox
ConcurrencyPractitioner closed pull request #21651: [SPARK-18258] Sink need 
access to offset representation
URL: https://github.com/apache/spark/pull/21651
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSink.scala
 
b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSink.scala
index 08914d82fffdd..8014b6e733bb8 100644
--- 
a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSink.scala
+++ 
b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSink.scala
@@ -21,6 +21,7 @@ import java.{util => ju}
 
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.{DataFrame, SQLContext}
+import org.apache.spark.sql.execution.streaming.OffsetSeq
 import org.apache.spark.sql.execution.streaming.Sink
 
 private[kafka010] class KafkaSink(
@@ -31,12 +32,12 @@ private[kafka010] class KafkaSink(
 
   override def toString(): String = "KafkaSink"
 
-  override def addBatch(batchId: Long, data: DataFrame): Unit = {
+  override def addBatch(batchId: Long, data: DataFrame, start: OffsetSeq, end: OffsetSeq): Unit = {
     if (batchId <= latestBatchId) {
       logInfo(s"Skipping already committed batch $batchId")
     } else {
       KafkaWriter.write(sqlContext.sparkSession,
-        data.queryExecution, executorKafkaParams, topic)
+        data.queryExecution, executorKafkaParams, topic, start, end)
       latestBatchId = batchId
     }
   }
diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
index d225c1ea6b7f1..1a5857a591499 100644
--- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
+++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
@@ -252,7 +252,7 @@ private[kafka010] class KafkaSourceProvider extends DataSourceRegister
     val topic = parameters.get(TOPIC_OPTION_KEY).map(_.trim)
     val specifiedKafkaParams = kafkaParamsForProducer(parameters)
     KafkaWriter.write(outerSQLContext.sparkSession, data.queryExecution,
-      new ju.HashMap[String, Object](specifiedKafkaParams.asJava), topic)
+      new ju.HashMap[String, Object](specifiedKafkaParams.asJava), topic, null, null)
 
     /* This method is suppose to return a relation that reads the data that was written.
      * We cannot support this for Kafka. Therefore, in order to make things consistent,
diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala
index d90630a8adc93..112a48a718335 100644
--- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala
+++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala
@@ -23,6 +23,7 @@ import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecor
 
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.{Attribute, Cast, Literal, 
UnsafeProjection}
+import org.apache.spark.sql.execution.streaming.OffsetSeq
 import org.apache.spark.sql.types.{BinaryType, StringType}
 
 /**
@@ -33,7 +34,9 @@ import org.apache.spark.sql.types.{BinaryType, StringType}
 private[kafka010] class KafkaWriteTask(
     producerConfiguration: ju.Map[String, Object],
     inputSchema: Seq[Attribute],
-    topic: Option[String]) extends KafkaRowWriter(inputSchema, topic) {
+    topic: Option[String],
+    start: OffsetSeq,  // not done
+    end: OffsetSeq) extends KafkaRowWriter(inputSchema, topic) {
   // used to synchronize with Kafka callbacks
   private var producer: KafkaProducer[Array[Byte], Array[Byte]] = _
 
diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala
index 15cd44812cb0c..dc433edc10faa 100644
--- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala
+++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala
@@ -23,6 +23,7 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.sql.{AnalysisException, SparkSession}
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.execution.{QueryExecution, SQLExecution}
+import org.apache.spark.sql.execution.streaming.OffsetSeq
 import org.apache.spark.sql.types.{BinaryType, StringType}
 import org.apache.spark.util.Utils
 

[GitHub] liupc commented on a change in pull request #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite

2019-01-01 Thread GitBox
liupc commented on a change in pull request #23404: [SPARK-26501]Fix unexpected 
overriden of exitFn in SparkSubmitSuite
URL: https://github.com/apache/spark/pull/23404#discussion_r244657632
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
 ##
 @@ -72,26 +72,32 @@ trait TestPrematureExit {
 mainObject.printStream = printStream
 
 @volatile var exitedCleanly = false
-mainObject.exitFn = (_) => exitedCleanly = true
-
-@volatile var exception: Exception = null
-val thread = new Thread {
-  override def run() = try {
-mainObject.main(input)
-  } catch {
-// Capture the exception to check whether the exception contains 
searchString or not
-case e: Exception => exception = e
-  }
+def withFakeExit(body: => Unit): Unit = {
 
 Review comment:
   I agree that a try-finally block is necessary, because the func body throws an 
exception.






[GitHub] liupc commented on a change in pull request #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite

2019-01-01 Thread GitBox
liupc commented on a change in pull request #23404: [SPARK-26501]Fix unexpected 
overriden of exitFn in SparkSubmitSuite
URL: https://github.com/apache/spark/pull/23404#discussion_r244656931
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
 ##
 @@ -72,26 +72,32 @@ trait TestPrematureExit {
 mainObject.printStream = printStream
 
 @volatile var exitedCleanly = false
-mainObject.exitFn = (_) => exitedCleanly = true
-
-@volatile var exception: Exception = null
-val thread = new Thread {
-  override def run() = try {
-mainObject.main(input)
-  } catch {
-// Capture the exception to check whether the exception contains 
searchString or not
-case e: Exception => exception = e
-  }
+def withFakeExit(body: => Unit): Unit = {
 
 Review comment:
   It's ok to just modify/restore exitFn directly, but wouldn't using a helper 
method here make the modification of exitFn clearer?
   Moreover, I was wondering whether a try-finally block is necessary, since there is 
already a try-catch block in the func body of the only occurrence.
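   The save-and-restore pattern under discussion, installing a fake exitFn and restoring the original in a `finally` so a throwing body cannot leak the override into later tests, can be sketched in Python for brevity (`Main` is a hypothetical stand-in for the submit object, not Spark code):

```python
class Main:
    """Hypothetical stand-in for the object whose exitFn the suite overrides."""
    def __init__(self):
        self.exit_fn = lambda status: None

def with_fake_exit(main_object, body):
    """Run body with a recording exit_fn installed; always restore the original."""
    saved = main_object.exit_fn
    recorded = []
    main_object.exit_fn = recorded.append
    try:
        body()
    finally:
        # Restored even when body raises, so later callers see the real exit_fn.
        main_object.exit_fn = saved
    return recorded

m = Main()
original = m.exit_fn
calls = with_fake_exit(m, lambda: m.exit_fn(1))
```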
   





[GitHub] cloud-fan commented on a change in pull request #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API

2019-01-01 Thread GitBox
cloud-fan commented on a change in pull request #23349: [SPARK-26403][SQL] 
Support pivoting using array column for `pivot(column)` API
URL: https://github.com/apache/spark/pull/23349#discussion_r244656526
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
 ##
 @@ -422,10 +422,14 @@ class RelationalGroupedDataset protected[sql](
   def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = 
{
 groupType match {
   case RelationalGroupedDataset.GroupByType =>
-val valueExprs = values.map(_ match {
+val valueExprs = values.map {
   case c: Column => c.expr
+  // ArrayType returns a `WrappedArray` but currently `Literal.apply`
+  // does not support this type although it supports a normal array.
+  // Here manually unwrap to make it an array. See also SPARK-26403.
+  case v: collection.mutable.WrappedArray[_] => Literal.apply(v.array)
 
 Review comment:
   ah I see. Then I think it's better to put it in `Literal.apply`, as it can 
help in more cases.





[GitHub] SparkQA commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
SparkQA commented on issue #23388: [SPARK-26448][SQL] retain the difference 
between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#issuecomment-450778855
 
 
   **[Test build #100630 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100630/testReport)**
 for PR 23388 at commit 
[`c228ad9`](https://github.com/apache/spark/commit/c228ad97fcbed7e93940d120f177817f7ad55c27).





[GitHub] AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the 
difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#issuecomment-450778815
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6549/
   Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the 
difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#issuecomment-450778815
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6549/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the 
difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#issuecomment-450778812
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the 
difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#issuecomment-450778812
 
 
   Merged build finished. Test PASSed.





[GitHub] cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] 
retain the difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#discussion_r244656211
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
 ##
 @@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, And, ArrayTransform, 
CreateArray, CreateMap, CreateNamedStruct, CreateNamedStructUnsafe, 
CreateStruct, EqualTo, ExpectsInputTypes, Expression, GetStructField, 
LambdaFunction, NamedLambdaVariable, UnaryExpression}
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
+import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Subquery, 
Window}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.types._
+
+/**
+ * We need to take care of special floating numbers (NaN and -0.0) in several 
places:
+ *   1. When compare values, different NaNs should be treated as same, `-0.0` 
and `0.0` should be
+ *  treated as same.
+ *   2. In GROUP BY, different NaNs should belong to the same group, -0.0 and 
0.0 should belong
+ *  to the same group.
+ *   3. In join keys, different NaNs should be treated as same, `-0.0` and 
`0.0` should be
+ *  treated as same.
+ *   4. In window partition keys, different NaNs should be treated as same, 
`-0.0` and `0.0`
+ *  should be treated as same.
+ *
+ * Case 1 is fine, as we handle NaN and -0.0 well during comparison. For 
complex types, we
+ * recursively compare the fields/elements, so it's also fine.
+ *
+ * Case 2, 3 and 4 are problematic, as they compare `UnsafeRow` binary 
directly, and different
+ * NaNs have different binary representation, and the same thing happens for 
-0.0 and 0.0.
+ *
+ * This rule normalizes NaN and -0.0 in Window partition keys, Join keys and 
Aggregate grouping
+ * expressions.
+ *
+ * Note that, this rule should be an analyzer rule, as it must be applied to 
make the query result
+ * corrected. Currently it's executed as an optimizer rule, because the 
optimizer may create new
+ * joins(for subquery) and reorder joins(may change the join condition), and 
this rule needs to be
+ * executed at the end.
+ */
+object NormalizeFloatingNumbers extends Rule[LogicalPlan] {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan match {
+// A subquery will be rewritten into join later, and will go through this 
rule
 
 Review comment:
   This is the same as `ExtractPythonUDFs`
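   The scaladoc quoted above notes that cases 2-4 compare `UnsafeRow` bytes directly, while `0.0`/`-0.0` and different NaNs have distinct bit patterns. The pitfall, and the normalization idea, can be reproduced outside Spark (a minimal sketch of the concept, not the rule's actual implementation):

```python
import math
import struct

def bits(x):
    """Big-endian IEEE-754 bytes of a double, i.e. what a byte-wise comparison sees."""
    return struct.pack(">d", x)

def normalize(x):
    """Map -0.0 to 0.0 and any NaN to one canonical NaN before grouping/hashing."""
    if math.isnan(x):
        return float("nan")
    if x == 0.0:
        return 0.0  # x == 0.0 is also true for -0.0
    return x

# 0.0 and -0.0 compare equal but have different byte representations...
same_value = (0.0 == -0.0)
same_bytes = (bits(0.0) == bits(-0.0))
# ...and a NaN with a nonstandard payload differs byte-wise from float("nan").
odd_nan = struct.unpack(">d", bytes.fromhex("7ff8000000000001"))[0]
```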





[GitHub] srowen commented on a change in pull request #23391: [SPARK-26456][SQL] Cast date/timestamp to string by Date/TimestampFormatter

2019-01-01 Thread GitBox
srowen commented on a change in pull request #23391: [SPARK-26456][SQL] Cast 
date/timestamp to string by Date/TimestampFormatter
URL: https://github.com/apache/spark/pull/23391#discussion_r244656099
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ##
 @@ -230,7 +235,7 @@ object PartitioningUtils {
 // Once we get the string, we try to parse it and find the partition 
column and value.
 val maybeColumn =
   parsePartitionColumn(currentPath.getName, typeInference, 
userSpecifiedDataTypes,
-validatePartitionColumns, timeZone)
+validatePartitionColumns, timeZone, dateFormatter, 
timestampFormatter)
 
 Review comment:
   @MaxGekk you probably have a better summary than I do, but is the problem 
fundamentally about writing formatted dates incorrectly? Older dates would have a 
slightly wrong hour/minute; was that because the timezone wasn't fully and 
correctly specified? And would this manifest when writing to JSON or CSV? In any 
case, a brief summary of the type of bug fixed by the new parser would be helpful 
in the release notes (just the "Docs text" field of the JIRA).





[GitHub] cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] 
retain the difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#discussion_r244655875
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
 ##
 @@ -102,8 +102,8 @@ object ExtractEquiJoinKeys extends Logging with 
PredicateHelper {
   type ReturnType =
 (JoinType, Seq[Expression], Seq[Expression], Option[Expression], 
LogicalPlan, LogicalPlan)
 
-  def unapply(plan: LogicalPlan): Option[ReturnType] = plan match {
-case join @ Join(left, right, joinType, condition) =>
+  def unapply(join: Join): Option[ReturnType] = join match {
+case Join(left, right, joinType, condition) =>
 
 Review comment:
   we can, but that would introduce a large code diff, because of the 
indentation changes...





[GitHub] cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] 
retain the difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#discussion_r244655849
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
 ##
 @@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, And, ArrayTransform, 
CreateArray, CreateMap, CreateNamedStruct, CreateNamedStructUnsafe, 
CreateStruct, EqualTo, ExpectsInputTypes, Expression, GetStructField, 
LambdaFunction, NamedLambdaVariable, UnaryExpression}
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
+import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Subquery, 
Window}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.types._
+
+/**
+ * We need to take care of special floating numbers (NaN and -0.0) in several 
places:
+ *   1. When compare values, different NaNs should be treated as same, `-0.0` 
and `0.0` should be
+ *  treated as same.
+ *   2. In GROUP BY, different NaNs should belong to the same group, -0.0 and 
0.0 should belong
+ *  to the same group.
+ *   3. In join keys, different NaNs should be treated as same, `-0.0` and 
`0.0` should be
+ *  treated as same.
+ *   4. In window partition keys, different NaNs should be treated as same, 
`-0.0` and `0.0`
+ *  should be treated as same.
+ *
+ * Case 1 is fine, as we handle NaN and -0.0 well during comparison. For 
complex types, we
+ * recursively compare the fields/elements, so it's also fine.
+ *
+ * Case 2, 3 and 4 are problematic, as they compare `UnsafeRow` binary 
directly, and different
+ * NaNs have different binary representation, and the same thing happens for 
-0.0 and 0.0.
+ *
+ * This rule normalizes NaN and -0.0 in Window partition keys, Join keys and 
Aggregate grouping
+ * expressions.
+ *
+ * Note that, this rule should be an analyzer rule, as it must be applied to 
make the query result
+ * corrected. Currently it's executed as an optimizer rule, because the 
optimizer may create new
+ * joins(for subquery) and reorder joins(may change the join condition), and 
this rule needs to be
+ * executed at the end.
+ */
+object NormalizeFloatingNumbers extends Rule[LogicalPlan] {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan match {
+// A subquery will be rewritten into join later, and will go through this 
rule
 
 Review comment:
   `OptimizeSubqueries` will apply the entire optimizer and triggers this rule.





[GitHub] cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-01 Thread GitBox
cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] 
retain the difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#discussion_r244655827
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
 ##
 @@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, And, ArrayTransform, 
CreateArray, CreateMap, CreateNamedStruct, CreateNamedStructUnsafe, 
CreateStruct, EqualTo, ExpectsInputTypes, Expression, GetStructField, 
LambdaFunction, NamedLambdaVariable, UnaryExpression}
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
+import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Subquery, 
Window}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.types._
+
+/**
+ * We need to take care of special floating numbers (NaN and -0.0) in several 
places:
+ *   1. When compare values, different NaNs should be treated as same, `-0.0` 
and `0.0` should be
+ *  treated as same.
+ *   2. In GROUP BY, different NaNs should belong to the same group, -0.0 and 
0.0 should belong
+ *  to the same group.
+ *   3. In join keys, different NaNs should be treated as same, `-0.0` and 
`0.0` should be
+ *  treated as same.
+ *   4. In window partition keys, different NaNs should be treated as same, 
`-0.0` and `0.0`
+ *  should be treated as same.
+ *
+ * Case 1 is fine, as we handle NaN and -0.0 well during comparison. For 
complex types, we
+ * recursively compare the fields/elements, so it's also fine.
+ *
+ * Case 2, 3 and 4 are problematic, as they compare `UnsafeRow` binary 
directly, and different
+ * NaNs have different binary representation, and the same thing happens for 
-0.0 and 0.0.
+ *
+ * This rule normalizes NaN and -0.0 in Window partition keys, Join keys and 
Aggregate grouping
+ * expressions.
+ */
+object NormalizeFloatingNumbers extends Rule[LogicalPlan] {
 
 Review comment:
   ah good catch!





[GitHub] cloud-fan commented on a change in pull request #23391: [SPARK-26456][SQL] Cast date/timestamp to string by Date/TimestampFormatter

2019-01-01 Thread GitBox
cloud-fan commented on a change in pull request #23391: [SPARK-26456][SQL] Cast 
date/timestamp to string by Date/TimestampFormatter
URL: https://github.com/apache/spark/pull/23391#discussion_r244655731
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ##
 @@ -230,7 +235,7 @@ object PartitioningUtils {
 // Once we get the string, we try to parse it and find the partition 
column and value.
 val maybeColumn =
   parsePartitionColumn(currentPath.getName, typeInference, 
userSpecifiedDataTypes,
-validatePartitionColumns, timeZone)
+validatePartitionColumns, timeZone, dateFormatter, 
timestampFormatter)
 
 Review comment:
   then shall we update the migration guide about the difference? I think the 
change here is better, as the SQL standard uses the Gregorian calendar.
   
   IIUC, the behavior difference only happens when reading files? If users 
write a timestamp literal and display it, we should be fine.





[GitHub] HyukjinKwon commented on a change in pull request #23414: [SPARK-26449][PYTHON] add a transform method to the Dataframe class

2019-01-01 Thread GitBox
HyukjinKwon commented on a change in pull request #23414: [SPARK-26449][PYTHON] 
add a transform method to the Dataframe class
URL: https://github.com/apache/spark/pull/23414#discussion_r244655655
 
 

 ##
 File path: python/pyspark/sql/dataframe.py
 ##
 @@ -2046,6 +2046,40 @@ def toDF(self, *cols):
 jdf = self._jdf.toDF(self._jseq(cols))
 return DataFrame(jdf, self.sql_ctx)
 
+@since(3.0)
+def transform(self, func):
+"""Returns a new class:`DataFrame` according to a user-defined custom 
transform method.
+This allows chaining transformations rather than using nested or 
temporary variables.
+
+:param func: a user-defined custom transform function
+This is equivalent to a nested call:
+actual_df = with_something(with_greeting(source_df), "crazy")
+
+credit to: 
https://medium.com/@mrpowers/chaining-custom-pyspark-transformations-4f38a8c7ae55
+
+A more concrete example::
+>>> sc = pyspark.SparkContext(master='local')
+>>> spark = pyspark.sql.SparkSession(sparkContext=sc)
+>>> from pyspark.sql.functions import lit
+>>> def with_greeting(df):
+... return df.withColumn("greeting", lit("hi"))
+>>> def with_something(df, something):
+... return df.withColumn("something", lit(something))
+>>> data = [("jose", 1), ("li", 2), ("liz", 3)]
+>>> source_df = spark.createDataFrame(data, ["name", "age"])
+>>> actual_df = source_df.transform(with_greeting).transform(lambda x: 
with_something(x, "crazy"))
 
 Review comment:
   I think we don't necessarily have to demonstrate the chaining of multiple `transform` calls. We can chain other APIs as well, for instance, `df.transform(...).select(...).transform(...)` in that sense.
   
   `show()` is already a DataFrame API. I think `df.transform(...).show()` is simple and good enough.
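
   For readers skimming the thread, the method under discussion is tiny plumbing: `transform(func)` simply applies `func` to the DataFrame and returns the result, which is what makes chaining with any other DataFrame API work. A minimal pure-Python sketch of the pattern (using a hypothetical `Frame` stand-in, not the real PySpark `DataFrame`):

   ```python
   # Sketch of the transform-chaining pattern; `Frame` is a stand-in class.
   class Frame:
       def __init__(self, columns):
           self.columns = dict(columns)

       def withColumn(self, name, value):
           # Return a new Frame with the extra column, like DataFrame.withColumn.
           return Frame({**self.columns, name: value})

       def transform(self, func):
           # The whole API: apply a user function that takes and returns a Frame.
           return func(self)

   def with_greeting(df):
       return df.withColumn("greeting", "hi")

   def with_something(df, something):
       return df.withColumn("something", something)

   source = Frame({"name": "jose"})
   result = source.transform(with_greeting).transform(lambda f: with_something(f, "crazy"))
   print(result.columns)  # {'name': 'jose', 'greeting': 'hi', 'something': 'crazy'}
   ```

   Because each step returns a new object, `transform` composes freely with any other method on the class, which is the point made above about mixing it with `select` or `show`.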





[GitHub] srowen commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11

2019-01-01 Thread GitBox
srowen commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11
URL: https://github.com/apache/spark/pull/23419#issuecomment-450777350
 
 
   OK, so you are suggesting increasing the heap size there just because it currently fails sometimes? That's fine too; I can also make that change separately.





[GitHub] HyukjinKwon commented on a change in pull request #23414: [SPARK-26449][PYTHON] add a transform method to the Dataframe class

2019-01-01 Thread GitBox
HyukjinKwon commented on a change in pull request #23414: [SPARK-26449][PYTHON] 
add a transform method to the Dataframe class
URL: https://github.com/apache/spark/pull/23414#discussion_r244655567
 
 

 ##
 File path: python/pyspark/sql/dataframe.py
 ##
 @@ -2046,6 +2046,36 @@ def toDF(self, *cols):
 jdf = self._jdf.toDF(self._jseq(cols))
 return DataFrame(jdf, self.sql_ctx)
 
+@since(3.0)
+def transform(self, func):
+"""Returns a new class:`DataFrame` according to a custom transform function.
+This allows chaining transformations rather than using nested calls or temporary variables.
+
+:param func: a custom transform function which returns a DataFrame
 
 Review comment:
   nit: `DataFrame` -> `` class:`DataFrame` ``





[GitHub] dongjoon-hyun edited a comment on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11

2019-01-01 Thread GitBox
dongjoon-hyun edited a comment on issue #23419: [SPARK-26507][CORE] Fix core 
tests for Java 11
URL: https://github.com/apache/spark/pull/23419#issuecomment-450777144
 
 
   That's great! Yep. +1 for handling them separately.
   BTW, I found that the `SorterSuite` flakiness issue was filed as https://issues.apache.org/jira/browse/SPARK-26306, and I added two recent Jenkins failure URLs, too.





[GitHub] dongjoon-hyun commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11

2019-01-01 Thread GitBox
dongjoon-hyun commented on issue #23419: [SPARK-26507][CORE] Fix core tests for 
Java 11
URL: https://github.com/apache/spark/pull/23419#issuecomment-450777144
 
 
   That's great! Yep. +1 for handling them separately.
   The `SorterSuite` flakiness issue was filed as https://issues.apache.org/jira/browse/SPARK-26306.
   I added two recent Jenkins failure URLs, too.





[GitHub] HyukjinKwon commented on a change in pull request #23414: [SPARK-26449][PYTHON] add a transform method to the Dataframe class

2019-01-01 Thread GitBox
HyukjinKwon commented on a change in pull request #23414: [SPARK-26449][PYTHON] 
add a transform method to the Dataframe class
URL: https://github.com/apache/spark/pull/23414#discussion_r244655505
 
 

 ##
 File path: python/pyspark/sql/dataframe.py
 ##
 @@ -2046,6 +2046,36 @@ def toDF(self, *cols):
 jdf = self._jdf.toDF(self._jseq(cols))
 return DataFrame(jdf, self.sql_ctx)
 
+@since(3.0)
+def transform(self, func):
+"""Returns a new class:`DataFrame` according to a custom transform function.
+This allows chaining transformations rather than using nested calls or temporary variables.
+
+:param func: a custom transform function which returns a DataFrame
+
+>>> from pyspark.sql.functions import lit
+>>> def with_greeting(df):
 
 Review comment:
   Can we make the example more concise and meaningful? I think we should focus only on a simple example of the API itself rather than using `lambda`. For instance,
   
   ```python
   >>> from pyspark.sql.functions import col
   >>> df = spark.range(10)
   >>> def cast_to_str(input_df):
   ...     return input_df.select([col(c).cast("string") for c in input_df.columns])
   >>> df.transform(cast_to_str).show()
   ```




