[GitHub] [spark] AmplabJenkins removed a comment on pull request #28596: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

2020-05-21 Thread GitBox


AmplabJenkins removed a comment on pull request #28596:
URL: https://github.com/apache/spark/pull/28596#issuecomment-631955788










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions

2020-05-21 Thread GitBox


AmplabJenkins removed a comment on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631955875










[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions

2020-05-21 Thread GitBox


AmplabJenkins commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631955875










[GitHub] [spark] AmplabJenkins commented on pull request #28596: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

2020-05-21 Thread GitBox


AmplabJenkins commented on pull request #28596:
URL: https://github.com/apache/spark/pull/28596#issuecomment-631955788










[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions

2020-05-21 Thread GitBox


SparkQA commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631955288


   **[Test build #122921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122921/testReport)** for PR 28534 at commit [`c1257f2`](https://github.com/apache/spark/commit/c1257f2552320141e435884629350685d5439d3e).






[GitHub] [spark] SparkQA commented on pull request #28596: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

2020-05-21 Thread GitBox


SparkQA commented on pull request #28596:
URL: https://github.com/apache/spark/pull/28596#issuecomment-631955215


   **[Test build #122920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122920/testReport)** for PR 28596 at commit [`2be2ca7`](https://github.com/apache/spark/commit/2be2ca728640ff68f4c9ff271d22ae67cf6db44f).






[GitHub] [spark] Ngone51 commented on pull request #28584: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

2020-05-21 Thread GitBox


Ngone51 commented on pull request #28584:
URL: https://github.com/apache/spark/pull/28584#issuecomment-631954552


   Opened a separate PR https://github.com/apache/spark/pull/28596 to fix the test, @WeichenXu123.






[GitHub] [spark] TJX2014 commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions

2020-05-21 Thread GitBox


TJX2014 commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428512148



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -401,6 +401,78 @@ case class DayOfYear(child: Expression) extends 
UnaryExpression with ImplicitCas
   }
 }
 
+abstract class NumberToTimestampBase extends UnaryExpression
+  with ImplicitCastInputTypes {

Review comment:
   @cloud-fan `value.asInstanceOf[Number].intValue()` -> `value.asInstanceOf[Number].longValue()`?
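
   A quick illustration of why the suggestion matters (hypothetical values, not the PR's code): `intValue()` silently truncates anything outside the `Int` range, which epoch values in milliseconds or microseconds easily exceed, while `longValue()` preserves them.

```scala
// A millisecond epoch value does not fit in an Int.
val value: Any = 1590000000000L  // ~2020-05-20 as epoch milliseconds, boxed as java.lang.Long

val truncated = value.asInstanceOf[Number].intValue()   // high bits dropped: a meaningless Int
val preserved = value.asInstanceOf[Number].longValue()  // 1590000000000L, as intended
```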








[GitHub] [spark] Ngone51 commented on pull request #28596: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

2020-05-21 Thread GitBox


Ngone51 commented on pull request #28596:
URL: https://github.com/apache/spark/pull/28596#issuecomment-631953628


   ping @sarthfrey @WeichenXu123 @jiangxb1987 






[GitHub] [spark] Ngone51 opened a new pull request #28596: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

2020-05-21 Thread GitBox


Ngone51 opened a new pull request #28596:
URL: https://github.com/apache/spark/pull/28596


   
   
   ### What changes were proposed in this pull request?
   
   
   Change from `messages.toList.iterator` to `Iterator.single(messages.toList)`.
   
   ### Why are the changes needed?
   
   
   In this test, the expected result of `rdd2.collect().head` should actually be `List("0", "1", "2", "3")` but is `"0"` now.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No.
   
   
   ### How was this patch tested?
   
   
   Updated test.
   
   Thanks to @WeichenXu123 for reporting this problem.
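
   To make the fix concrete, a minimal sketch (my own illustration, not the suite's code) of the difference between the two expressions, assuming `messages` is the `Array[String]` returned by `allGather()`:

```scala
val messages: Array[String] = Array("0", "1", "2", "3")  // what allGather() returns

// Before: each message becomes its own element of the partition's output,
// so collect().head only sees the first message, "0".
val before: Iterator[String] = messages.toList.iterator

// After: the whole gathered list is emitted as a single element,
// so collect().head is List("0", "1", "2", "3"), which is what the test expects.
val after: Iterator[List[String]] = Iterator.single(messages.toList)
```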






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28583: [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier

2020-05-21 Thread GitBox


AmplabJenkins removed a comment on pull request #28583:
URL: https://github.com/apache/spark/pull/28583#issuecomment-631948059










[GitHub] [spark] AmplabJenkins commented on pull request #28583: [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier

2020-05-21 Thread GitBox


AmplabJenkins commented on pull request #28583:
URL: https://github.com/apache/spark/pull/28583#issuecomment-631948059










[GitHub] [spark] SparkQA commented on pull request #28583: [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier

2020-05-21 Thread GitBox


SparkQA commented on pull request #28583:
URL: https://github.com/apache/spark/pull/28583#issuecomment-631947436


   **[Test build #122919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122919/testReport)** for PR 28583 at commit [`5851842`](https://github.com/apache/spark/commit/5851842120f59d118bcbf60a2003f1097a28b203).






[GitHub] [spark] sarutak commented on pull request #28583: [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier

2020-05-21 Thread GitBox


sarutak commented on pull request #28583:
URL: https://github.com/apache/spark/pull/28583#issuecomment-631944356


   The test seems to have been aborted.






[GitHub] [spark] sarutak commented on pull request #28583: [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier

2020-05-21 Thread GitBox


sarutak commented on pull request #28583:
URL: https://github.com/apache/spark/pull/28583#issuecomment-63194


   retest this please.






[GitHub] [spark] turboFei commented on a change in pull request #28525: [SPARK-27562][Shuffle] Complete the verification mechanism for shuffle transmitted data

2020-05-21 Thread GitBox


turboFei commented on a change in pull request #28525:
URL: https://github.com/apache/spark/pull/28525#discussion_r428495266



##
File path: 
common/network-common/src/main/java/org/apache/spark/network/util/DigestUtils.java
##
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.util;
+
+import java.io.*;
+import java.nio.ByteBuffer;
+import java.nio.MappedByteBuffer;
+import java.nio.channels.FileChannel;
+import java.util.zip.CRC32;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class DigestUtils {
+  private static final Logger LOG = LoggerFactory.getLogger(DigestUtils.class);
+  private static final int STREAM_BUFFER_LENGTH = 8192;
+  private static final int DIGEST_LENGTH = 8;
+
+  public static int getDigestLength() {
+    return DIGEST_LENGTH;
+  }
+
+  public static long getDigest(InputStream data) throws IOException {
+    CRC32 crc32 = new CRC32();
+    byte[] buffer = new byte[STREAM_BUFFER_LENGTH];
+    int len;
+    while ((len = data.read(buffer)) >= 0) {
+      crc32.update(buffer, 0, len);
+    }
+    return crc32.getValue();
+  }
+
+  public static long getDigest(File file, long offset, long length) {
+    if (length <= 0) {
+      return -1L;
+    }
+    try (RandomAccessFile rf = new RandomAccessFile(file, "r")) {
+      MappedByteBuffer data = rf.getChannel().map(FileChannel.MapMode.READ_ONLY, offset, length);
+      CRC32 crc32 = new CRC32();
+      byte[] buffer = new byte[STREAM_BUFFER_LENGTH];
+      int len;
+      while ((len = Math.min(STREAM_BUFFER_LENGTH, data.remaining())) > 0) {
+        data.get(buffer, 0, len);
+        crc32.update(buffer, 0, len);
+      }
+      return crc32.getValue();
+    } catch (IOException e) {
+      LOG.error(String.format("Exception while computing digest for file segment: " +
+        "%s(offset:%d, length:%d)", file.getName(), offset, length));
+      return -1L;
+    }
+  }
+}

Review comment:
   thx
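
   For anyone following along, a minimal Scala sketch of the same idea the quoted `DigestUtils.getDigest(InputStream)` implements, namely a CRC32 checksum accumulated over a buffered stream (illustrative only, not part of the PR):

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.util.zip.CRC32

// Stream the bytes through a CRC32 accumulator, 8 KiB at a time.
def crc32Of(in: InputStream, bufferSize: Int = 8192): Long = {
  val crc = new CRC32()
  val buffer = new Array[Byte](bufferSize)
  var len = in.read(buffer)
  while (len >= 0) {
    crc.update(buffer, 0, len)
    len = in.read(buffer)
  }
  crc.getValue
}

// Example: checksum of a small in-memory payload.
val digest = crc32Of(new ByteArrayInputStream("shuffle block bytes".getBytes("UTF-8")))
```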








[GitHub] [spark] cloud-fan commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions

2020-05-21 Thread GitBox


cloud-fan commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631935225


   LGTM except https://github.com/apache/spark/pull/28534/files#r427952417






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions

2020-05-21 Thread GitBox


AmplabJenkins removed a comment on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631931016










[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions

2020-05-21 Thread GitBox


AmplabJenkins commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631931016










[GitHub] [spark] turboFei commented on a change in pull request #28525: [SPARK-27562][Shuffle] Complete the verification mechanism for shuffle transmitted data

2020-05-21 Thread GitBox


turboFei commented on a change in pull request #28525:
URL: https://github.com/apache/spark/pull/28525#discussion_r428490170



##
File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
##
@@ -626,16 +628,61 @@ final class ShuffleBlockFetcherIterator(
   buf.release()
   throwFetchFailedException(blockId, mapIndex, address, e)
   }
+
+  // If shuffle digest enabled is true, check the block with checkSum.
+  var failedOnDigestCheck = false
+  if (digestEnabled) {
+if (digest >= 0) {
+  val digestToCheck = try {
+DigestUtils.getDigest(in)
+  } catch {
+case e: IOException =>
+  logError("Error occurs when checking digest", e)
+  buf.release()
+  throwFetchFailedException(blockId, mapIndex, address, e)
+  }
+  failedOnDigestCheck = digest != digestToCheck
+  if (!failedOnDigestCheck) {

Review comment:
   thx








[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions

2020-05-21 Thread GitBox


SparkQA commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631930528


   **[Test build #122918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122918/testReport)** for PR 28534 at commit [`e4db9a5`](https://github.com/apache/spark/commit/e4db9a5b0b8def467a61f310040db1221d7c206e).






[GitHub] [spark] HyukjinKwon commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-21 Thread GitBox


HyukjinKwon commented on a change in pull request #28593:
URL: https://github.com/apache/spark/pull/28593#discussion_r428488834



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2586,6 +2586,22 @@ object SQLConf {
   .checkValue(_ > 0, "The timeout value must be positive")
   .createWithDefault(10L)
 
+  val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE =
+buildConf("spark.sql.legacy.numericConvertToTimestampEnable")
+  .doc("when true,use legacy numberic can convert to timestamp")
+  .version("3.0.0")
+  .booleanConf
+  .createWithDefault(false)
+
+  val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_IN_SECONDS =
+buildConf("spark.sql.legacy.numericConvertToTimestampInSeconds")
+  .internal()
+  .doc("The legacy only works when 
LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE is true." +
+"when true,the value will be  interpreted as seconds,which follow 
spark style," +
+"when false,value is interpreted as milliseconds,which follow hive 
style")

Review comment:
   This kind of compatibility isn't fully guaranteed in Spark, see also [Compatibility with Apache Hive](https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html). Simply following Hive doesn't justify this PR.

   There are already a bunch of mismatched behaviours and I don't like to target more compatibility, in particular, by fixing the basic functionalities such as cast and adding such switches to maintain. Why is it difficult to just add `ts / 1000`? The workaround seems very easy.

   If we target to get rid of `cast(ts as long)`, adding separate functions is a-okay because it doesn't affect existing users, and also looks other DBMSs have their own ways by having such functions in general. Looks we will have workarounds once these functions from https://github.com/apache/spark/pull/28534 are merged, and seems you can leverage these functions alternatively as well.

   I would say a-no to fix a basic functionality to have another variant and non-standard behaviour, which could potentially trigger to have another set of non-standard behaviours in many other places in Spark.
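
   As a concrete sketch of the `ts / 1000` workaround mentioned above (assuming a running `spark` session and a column `ts` holding epoch milliseconds; Spark's numeric-to-timestamp cast interprets the value as epoch seconds):

```scala
// Hypothetical column of epoch milliseconds.
val df = spark.range(1).selectExpr("1590000000000L AS ts")

// Divide first, then cast, to get Hive-style millisecond semantics under Spark's cast.
df.selectExpr("CAST(ts / 1000 AS TIMESTAMP) AS event_time").show()

// Once the functions from https://github.com/apache/spark/pull/28534 land, the same intent
// could be written directly, e.g. df.selectExpr("TIMESTAMP_MILLIS(ts) AS event_time").
```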








[GitHub] [spark] AmplabJenkins commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


AmplabJenkins commented on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631924505










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


AmplabJenkins removed a comment on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631924505










[GitHub] [spark] SparkQA commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


SparkQA commented on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631923918


   **[Test build #122917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122917/testReport)** for PR 28556 at commit [`b77a1ba`](https://github.com/apache/spark/commit/b77a1ba23f5c39fb541bb8f61ebbbd62ffbb975d).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-21 Thread GitBox


AmplabJenkins removed a comment on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631922879


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122912/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


AmplabJenkins removed a comment on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631922741


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122916/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-21 Thread GitBox


AmplabJenkins removed a comment on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631922905


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122914/
   Test FAILed.






[GitHub] [spark] viirya commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


viirya commented on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631923383


   retest this please






[GitHub] [spark] SparkQA removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-21 Thread GitBox


SparkQA removed a comment on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631892298


   **[Test build #122914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122914/testReport)** for PR 28595 at commit [`2d6e87b`](https://github.com/apache/spark/commit/2d6e87b1892683b6035bfdf6ed8c475bada5bbc8).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


AmplabJenkins removed a comment on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631922732


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


SparkQA removed a comment on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631915185


   **[Test build #122916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122916/testReport)** for PR 28556 at commit [`b77a1ba`](https://github.com/apache/spark/commit/b77a1ba23f5c39fb541bb8f61ebbbd62ffbb975d).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-21 Thread GitBox


AmplabJenkins removed a comment on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631922744










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-21 Thread GitBox


AmplabJenkins removed a comment on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631922899


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


AmplabJenkins commented on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631922732










[GitHub] [spark] AmplabJenkins commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-21 Thread GitBox


AmplabJenkins commented on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631922899










[GitHub] [spark] SparkQA removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-21 Thread GitBox


SparkQA removed a comment on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631892304










[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-21 Thread GitBox


AmplabJenkins commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631922744










[GitHub] [spark] SparkQA commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-21 Thread GitBox


SparkQA commented on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631922678


   **[Test build #122914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122914/testReport)** for PR 28595 at commit [`2d6e87b`](https://github.com/apache/spark/commit/2d6e87b1892683b6035bfdf6ed8c475bada5bbc8).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-21 Thread GitBox


AmplabJenkins commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631922809










[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-21 Thread GitBox


SparkQA commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631922675










[GitHub] [spark] SparkQA commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


SparkQA commented on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631922680


   **[Test build #122916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122916/testReport)** for PR 28556 at commit [`b77a1ba`](https://github.com/apache/spark/commit/b77a1ba23f5c39fb541bb8f61ebbbd62ffbb975d).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] GuoPhilipse commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-21 Thread GitBox


GuoPhilipse commented on a change in pull request #28593:
URL: https://github.com/apache/spark/pull/28593#discussion_r428481765



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2586,6 +2586,22 @@ object SQLConf {
   .checkValue(_ > 0, "The timeout value must be positive")
   .createWithDefault(10L)
 
+  val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE =
+buildConf("spark.sql.legacy.numericConvertToTimestampEnable")
+  .doc("when true,use legacy numberic can convert to timestamp")
+  .version("3.0.0")
+  .booleanConf
+  .createWithDefault(false)
+
+  val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_IN_SECONDS =
+buildConf("spark.sql.legacy.numericConvertToTimestampInSeconds")
+  .internal()
+  .doc("The legacy only works when 
LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE is true." +
+"when true,the value will be  interpreted as seconds,which follow 
spark style," +
+"when false,value is interpreted as milliseconds,which follow hive 
style")

Review comment:
   Hi HyukjinKwon,
   Thanks for reviewing. We discussed the pain point of moving to Spark in #28568. I mean we can adopt both the compatibility flag and the new functions: to use the functions, users have to modify tasks one by one with the casting compatibility flag turned off. Unfortunately, we have almost a hundred thousand tasks migrating from Hive to Spark, so with a flag we would first fail any task that relies on CAST_NUMERIC_TO_TIMESTAMP, and if users really need that cast we would point them to the three newly added functions. That seems like a good way to avoid the case where a task succeeds but the casting result is wrong, which is more serious. Other teams may be hitting the same headache, so I hope this patch helps us embrace Spark better.








[GitHub] [spark] TJX2014 commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-21 Thread GitBox


TJX2014 commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428478287



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MillisToTimestamp]("timestamp_milliseconds"),

Review comment:
   Ok








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


AmplabJenkins removed a comment on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631915587










[GitHub] [spark] AmplabJenkins commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


AmplabJenkins commented on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631915587










[GitHub] [spark] cloud-fan commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-05-21 Thread GitBox


cloud-fan commented on a change in pull request #28123:
URL: https://github.com/apache/spark/pull/28123#discussion_r428474929



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInEquiJoin.scala
##
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.bucketing
+
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project, UnaryNode}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Wraps `LogicalRelation` to provide the number of buckets for coalescing.
+ */
+case class CoalesceBuckets(
+    numCoalescedBuckets: Int,
+    child: LogicalRelation) extends UnaryNode {
+  require(numCoalescedBuckets > 0,
+    s"Number of coalesced buckets ($numCoalescedBuckets) must be positive.")
+
+  override def output: Seq[Attribute] = child.output
+}
+
+/**
+ * This rule adds a `CoalesceBuckets` logical plan if the following conditions are met:
+ *   - Two bucketed tables are joined.
+ *   - Join is the equi-join.
+ *   - The larger bucket number is divisible by the smaller bucket number.
+ *   - "spark.sql.bucketing.coalesceBucketsInJoin.enabled" is set to true.
+ *   - The difference in the number of buckets is less than the value set in
+ *     "spark.sql.bucketing.coalesceBucketsInJoin.maxNumBucketsDiff".
+ */
+object CoalesceBucketsInEquiJoin extends Rule[LogicalPlan] {
+  private def isPlanEligible(plan: LogicalPlan): Boolean = {
+    def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = {
+      p(plan) && plan.children.forall(forall(_)(p))
+    }
+
+    forall(plan) {
+      case _: Filter | _: Project | _: LogicalRelation => true
+      case _ => false
+    }
+  }
+
+  private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = {
+    if (isPlanEligible(plan)) {
+      plan.collectFirst {
+        case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) if r.bucketSpec.nonEmpty =>
+          r.bucketSpec.get
+      }
+    } else {
+      None
+    }
+  }
+
+  private def mayCoalesce(numBuckets1: Int, numBuckets2: Int, conf: SQLConf): Option[Int] = {
+    assert(numBuckets1 != numBuckets2)
+    val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2))
+    // A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller
+    // number of buckets because bucket id is calculated by modding the total number of buckets.
+    if ((large % small == 0) && ((large - small) <= conf.coalesceBucketsInJoinMaxNumBucketsDiff)) {
+      Some(small)
+    } else {
+      None
+    }
+  }
+
+  private def addCoalesceBuckets(plan: LogicalPlan, numCoalescedBuckets: Int): LogicalPlan = {
+    plan.transformUp {
+      case l @ LogicalRelation(_: HadoopFsRelation, _, _, _) =>
+        CoalesceBuckets(numCoalescedBuckets, l)
+    }
+  }
+
+  object ExtractJoinWithBuckets {
+    def unapply(plan: LogicalPlan): Option[(Join, Int, Int)] = {
+      plan match {
+        case join @ ExtractEquiJoinKeys(_, _, _, _, left, right, _) =>

Review comment:
   yes
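
   For readers skimming the thread, a standalone sketch (my own simplification, not the PR's code) of the coalescing condition checked by `mayCoalesce` in the quoted diff: the larger bucket count must be a multiple of the smaller one, and the gap must stay under the configured limit.

```scala
// Simplified restatement of the quoted condition; maxDiff stands in for
// the coalesceBucketsInJoinMaxNumBucketsDiff configuration value.
def mayCoalesce(numBuckets1: Int, numBuckets2: Int, maxDiff: Int): Option[Int] = {
  val small = math.min(numBuckets1, numBuckets2)
  val large = math.max(numBuckets1, numBuckets2)
  // Bucket ids are computed modulo the bucket count, so coalescing is only
  // safe when the larger count is divisible by the smaller one.
  if (large % small == 0 && (large - small) <= maxDiff) Some(small) else None
}

mayCoalesce(8, 4, maxDiff = 256)  // Some(4): 8 is divisible by 4
mayCoalesce(8, 6, maxDiff = 256)  // None: 8 % 6 != 0
```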








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-21 Thread GitBox


HyukjinKwon commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428474898



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MillisToTimestamp]("timestamp_milliseconds"),

Review comment:
   Let's also not forget to update the PR title and description.








[GitHub] [spark] SparkQA commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


SparkQA commented on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631915185


   **[Test build #122916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122916/testReport)** for PR 28556 at commit [`b77a1ba`](https://github.com/apache/spark/commit/b77a1ba23f5c39fb541bb8f61ebbbd62ffbb975d).






[GitHub] [spark] viirya commented on a change in pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


viirya commented on a change in pull request #28556:
URL: https://github.com/apache/spark/pull/28556#discussion_r428473708



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -34,7 +34,8 @@ object NestedColumnAliasing {
 : Option[(Map[ExtractValue, Alias], Map[ExprId, Seq[Alias]])] = plan match 
{
 case Project(projectList, child)
 if SQLConf.get.nestedSchemaPruningEnabled && 
canProjectPushThrough(child) =>
-  getAliasSubMap(projectList)
+  val exprsToPrune = projectList ++ child.expressions

Review comment:
   changed.








[GitHub] [spark] viirya commented on a change in pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-21 Thread GitBox


viirya commented on a change in pull request #28556:
URL: https://github.com/apache/spark/pull/28556#discussion_r428473655



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -204,15 +211,8 @@ object GeneratorNestedColumnAliasing {
   g: Generate,
   nestedFieldToAlias: Map[ExtractValue, Alias],
   attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = {
-val newGenerator = g.generator.transform {
-  case f: ExtractValue if nestedFieldToAlias.contains(f) =>
-nestedFieldToAlias(f).toAttribute
-}.asInstanceOf[Generator]
-
 // Defer updating `Generate.unrequiredChildIndex` to next round of 
`ColumnPruning`.
-val newGenerate = g.copy(generator = newGenerator)
-
-NestedColumnAliasing.replaceChildrenWithAliases(newGenerate, attrToAliases)
+NestedColumnAliasing.replaceChildrenWithAliases(g, nestedFieldToAlias, 
attrToAliases)

Review comment:
   Changed the method name.








[GitHub] [spark] GuoPhilipse commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-21 Thread GitBox


GuoPhilipse commented on a change in pull request #28593:
URL: https://github.com/apache/spark/pull/28593#discussion_r428473216



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##
@@ -1277,7 +1285,11 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
 val block = inline"new java.math.BigDecimal($MICROS_PER_SECOND)"
 code"($d.toBigDecimal().bigDecimal().multiply($block)).longValue()"
   }
-  private[this] def longToTimeStampCode(l: ExprValue): Block = code"$l * 
(long)$MICROS_PER_SECOND"
+  private[this] def longToTimeStampCode(l: ExprValue): Block = {
+if (SQLConf.get.numericConvertToTimestampInSeconds) code"" +

Review comment:
   Yes, let me correct it.








[GitHub] [spark] cloud-fan commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-21 Thread GitBox


cloud-fan commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428472274



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MillisToTimestamp]("timestamp_milliseconds"),

Review comment:
   `timestamp_milliseconds` -> `timestamp_millis`








[GitHub] [spark] cloud-fan commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-21 Thread GitBox


cloud-fan commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428472315



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MillisToTimestamp]("timestamp_milliseconds"),
+expression[MicrosToTimestamp]("timestamp_microseconds"),

Review comment:
   ditto








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-21 Thread GitBox


AmplabJenkins removed a comment on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631909869










[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-21 Thread GitBox


AmplabJenkins commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631909869










[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-21 Thread GitBox


SparkQA commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631909451


   **[Test build #122915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122915/testReport)** for PR 28534 at commit [`97263e8`](https://github.com/apache/spark/commit/97263e84877bb767e476cfde635e5fb991dc4d6e).





