[GitHub] [spark] AmplabJenkins removed a comment on pull request #28596: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
AmplabJenkins removed a comment on pull request #28596: URL: https://github.com/apache/spark/pull/28596#issuecomment-631955788 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions
AmplabJenkins removed a comment on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631955875 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions
AmplabJenkins commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631955875 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28596: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
AmplabJenkins commented on pull request #28596: URL: https://github.com/apache/spark/pull/28596#issuecomment-631955788 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions
SparkQA commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631955288 **[Test build #122921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122921/testReport)** for PR 28534 at commit [`c1257f2`](https://github.com/apache/spark/commit/c1257f2552320141e435884629350685d5439d3e). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28596: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
SparkQA commented on pull request #28596: URL: https://github.com/apache/spark/pull/28596#issuecomment-631955215 **[Test build #122920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122920/testReport)** for PR 28596 at commit [`2be2ca7`](https://github.com/apache/spark/commit/2be2ca728640ff68f4c9ff271d22ae67cf6db44f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] Ngone51 commented on pull request #28584: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite
Ngone51 commented on pull request #28584: URL: https://github.com/apache/spark/pull/28584#issuecomment-631954552 Opened a separate PR https://github.com/apache/spark/pull/28596 to fix the test @WeichenXu123 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] TJX2014 commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions
TJX2014 commented on a change in pull request #28534: URL: https://github.com/apache/spark/pull/28534#discussion_r428512148 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -401,6 +401,78 @@ case class DayOfYear(child: Expression) extends UnaryExpression with ImplicitCas } } +abstract class NumberToTimestampBase extends UnaryExpression + with ImplicitCastInputTypes { Review comment: @cloud-fan `value.asInstanceOf[Number].intValue() ` -> `value.asInstanceOf[Number].longValue()` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] Ngone51 commented on pull request #28596: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
Ngone51 commented on pull request #28596: URL: https://github.com/apache/spark/pull/28596#issuecomment-631953628 ping @sarthfrey @WeichenXu123 @jiangxb1987 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] TJX2014 commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions
TJX2014 commented on a change in pull request #28534: URL: https://github.com/apache/spark/pull/28534#discussion_r428512148 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -401,6 +401,78 @@ case class DayOfYear(child: Expression) extends UnaryExpression with ImplicitCas } } +abstract class NumberToTimestampBase extends UnaryExpression + with ImplicitCastInputTypes { Review comment: `value.asInstanceOf[Number].intValue() ` -> `value.asInstanceOf[Number].longValue()` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] Ngone51 opened a new pull request #28596: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
Ngone51 opened a new pull request #28596: URL: https://github.com/apache/spark/pull/28596 ### What changes were proposed in this pull request? Change from `messages.toList.iterator` to `Iterator.single(messages.toList)`. ### Why are the changes needed? In this test, the expected result of `rdd2.collect().head` should actually be `List("0", "1", "2", "3")` but is `"0"` now. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated test. Thanks @WeichenXu123 reported this problem. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28583: [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier
AmplabJenkins removed a comment on pull request #28583: URL: https://github.com/apache/spark/pull/28583#issuecomment-631948059 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28583: [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier
AmplabJenkins commented on pull request #28583: URL: https://github.com/apache/spark/pull/28583#issuecomment-631948059 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28583: [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier
SparkQA commented on pull request #28583: URL: https://github.com/apache/spark/pull/28583#issuecomment-631947436 **[Test build #122919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122919/testReport)** for PR 28583 at commit [`5851842`](https://github.com/apache/spark/commit/5851842120f59d118bcbf60a2003f1097a28b203). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] sarutak commented on pull request #28583: [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier
sarutak commented on pull request #28583: URL: https://github.com/apache/spark/pull/28583#issuecomment-631944356 Test seems to be aborted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] sarutak commented on pull request #28583: [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier
sarutak commented on pull request #28583: URL: https://github.com/apache/spark/pull/28583#issuecomment-63194 retest this please. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] turboFei commented on a change in pull request #28525: [SPARK-27562][Shuffle] Complete the verification mechanism for shuffle transmitted data
turboFei commented on a change in pull request #28525: URL: https://github.com/apache/spark/pull/28525#discussion_r428495266 ## File path: common/network-common/src/main/java/org/apache/spark/network/util/DigestUtils.java ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.util; + +import java.io.*; +import java.nio.ByteBuffer; +import java.nio.MappedByteBuffer; +import java.nio.channels.FileChannel; +import java.util.zip.CRC32; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class DigestUtils { +private static final Logger LOG = LoggerFactory.getLogger(DigestUtils.class); +private static final int STREAM_BUFFER_LENGTH = 8192; +private static final int DIGEST_LENGTH = 8; + +public static int getDigestLength() { + return DIGEST_LENGTH; +} + +public static long getDigest(InputStream data) throws IOException { +CRC32 crc32 = new CRC32(); +byte[] buffer = new byte[STREAM_BUFFER_LENGTH]; +int len; +while ((len = data.read(buffer)) >= 0) { +crc32.update(buffer, 0, len); +} +return crc32.getValue(); +} + +public static long getDigest(File file, long offset, long length) { +if (length <= 0) { +return -1L; +} +try (RandomAccessFile rf = new RandomAccessFile(file, "r")) { +MappedByteBuffer data = rf.getChannel().map(FileChannel.MapMode.READ_ONLY, offset, + length); +CRC32 crc32 = new CRC32(); +byte[] buffer = new byte[STREAM_BUFFER_LENGTH]; +int len; +while ((len = Math.min(STREAM_BUFFER_LENGTH, data.remaining())) > 0) { +data.get(buffer, 0, len); +crc32.update(buffer, 0, len); +} +return crc32.getValue(); +} catch (IOException e) { +LOG.error(String.format("Exception while computing digest for file segment: " + + "%s(offset:%d, length:%d)", file.getName(), offset, length )); +return -1L; +} +} +} Review comment: thx This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions
cloud-fan commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631935225 LGTM except https://github.com/apache/spark/pull/28534/files#r427952417 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions
AmplabJenkins removed a comment on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631931016 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions
AmplabJenkins commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631931016 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] turboFei commented on a change in pull request #28525: [SPARK-27562][Shuffle] Complete the verification mechanism for shuffle transmitted data
turboFei commented on a change in pull request #28525: URL: https://github.com/apache/spark/pull/28525#discussion_r428490170 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -626,16 +628,61 @@ final class ShuffleBlockFetcherIterator( buf.release() throwFetchFailedException(blockId, mapIndex, address, e) } + + // If shuffle digest enabled is true, check the block with checkSum. + var failedOnDigestCheck = false + if (digestEnabled) { +if (digest >= 0) { + val digestToCheck = try { +DigestUtils.getDigest(in) + } catch { +case e: IOException => + logError("Error occurs when checking digest", e) + buf.release() + throwFetchFailedException(blockId, mapIndex, address, e) + } + failedOnDigestCheck = digest != digestToCheck + if (!failedOnDigestCheck) { Review comment: thx This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions
SparkQA commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631930528 **[Test build #122918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122918/testReport)** for PR 28534 at commit [`e4db9a5`](https://github.com/apache/spark/commit/e4db9a5b0b8def467a61f310040db1221d7c206e). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
HyukjinKwon commented on a change in pull request #28593: URL: https://github.com/apache/spark/pull/28593#discussion_r428488834 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -2586,6 +2586,22 @@ object SQLConf { .checkValue(_ > 0, "The timeout value must be positive") .createWithDefault(10L) + val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE = +buildConf("spark.sql.legacy.numericConvertToTimestampEnable") + .doc("when true,use legacy numberic can convert to timestamp") + .version("3.0.0") + .booleanConf + .createWithDefault(false) + + val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_IN_SECONDS = +buildConf("spark.sql.legacy.numericConvertToTimestampInSeconds") + .internal() + .doc("The legacy only works when LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE is true." + +"when true,the value will be interpreted as seconds,which follow spark style," + +"when false,value is interpreted as milliseconds,which follow hive style") Review comment: This kind of compatibility isn't fully guaranteed in Spark, see also [Compatibility with Apache Hive](https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html). Simply following Hive doesn't justify this PR. There are already a bunch of mismatched behaviours and I don't like to target more compatibility, in particular, by fixing the basic functionalities such as cast and adding such switches to maintain. Why is it difficult to just add `ts / 1000`? The workaround seems very easy. If we target to get rid of `cast(ts as long)` away, adding separate functions is a-okay because it doesn't affect existing users, and also looks other DBMSs have their own ways by having such functions in general. Looks we will have workarounds once these functions from https://github.com/apache/spark/pull/28534 are merged, and seems you can leverage these functions alternatively as well. I would say a-no to fix a basic functionality to have another variant and non-standard behaviour, which could potentially trigger to have another set of non-standard behaviours in many other places in Spark. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
AmplabJenkins commented on pull request #28556: URL: https://github.com/apache/spark/pull/28556#issuecomment-631924505 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
AmplabJenkins removed a comment on pull request #28556: URL: https://github.com/apache/spark/pull/28556#issuecomment-631924505 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
SparkQA commented on pull request #28556: URL: https://github.com/apache/spark/pull/28556#issuecomment-631923918 **[Test build #122917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122917/testReport)** for PR 28556 at commit [`b77a1ba`](https://github.com/apache/spark/commit/b77a1ba23f5c39fb541bb8f61ebbbd62ffbb975d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
AmplabJenkins removed a comment on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631922879 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122912/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
AmplabJenkins removed a comment on pull request #28556: URL: https://github.com/apache/spark/pull/28556#issuecomment-631922741 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122916/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
AmplabJenkins removed a comment on pull request #28595: URL: https://github.com/apache/spark/pull/28595#issuecomment-631922905 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122914/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
viirya commented on pull request #28556: URL: https://github.com/apache/spark/pull/28556#issuecomment-631923383 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
SparkQA removed a comment on pull request #28595: URL: https://github.com/apache/spark/pull/28595#issuecomment-631892298 **[Test build #122914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122914/testReport)** for PR 28595 at commit [`2d6e87b`](https://github.com/apache/spark/commit/2d6e87b1892683b6035bfdf6ed8c475bada5bbc8). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
AmplabJenkins removed a comment on pull request #28556: URL: https://github.com/apache/spark/pull/28556#issuecomment-631922732 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
SparkQA removed a comment on pull request #28556: URL: https://github.com/apache/spark/pull/28556#issuecomment-631915185 **[Test build #122916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122916/testReport)** for PR 28556 at commit [`b77a1ba`](https://github.com/apache/spark/commit/b77a1ba23f5c39fb541bb8f61ebbbd62ffbb975d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
AmplabJenkins removed a comment on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631922744 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
AmplabJenkins removed a comment on pull request #28595: URL: https://github.com/apache/spark/pull/28595#issuecomment-631922899 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
AmplabJenkins commented on pull request #28556: URL: https://github.com/apache/spark/pull/28556#issuecomment-631922732 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
AmplabJenkins commented on pull request #28595: URL: https://github.com/apache/spark/pull/28595#issuecomment-631922899 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
SparkQA removed a comment on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631892304 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
AmplabJenkins commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631922744 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
SparkQA commented on pull request #28595: URL: https://github.com/apache/spark/pull/28595#issuecomment-631922678 **[Test build #122914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122914/testReport)** for PR 28595 at commit [`2d6e87b`](https://github.com/apache/spark/commit/2d6e87b1892683b6035bfdf6ed8c475bada5bbc8). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
AmplabJenkins commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631922809 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
SparkQA commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631922675 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
SparkQA commented on pull request #28556: URL: https://github.com/apache/spark/pull/28556#issuecomment-631922680 **[Test build #122916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122916/testReport)** for PR 28556 at commit [`b77a1ba`](https://github.com/apache/spark/commit/b77a1ba23f5c39fb541bb8f61ebbbd62ffbb975d). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] GuoPhilipse commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
GuoPhilipse commented on a change in pull request #28593: URL: https://github.com/apache/spark/pull/28593#discussion_r428481765 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -2586,6 +2586,22 @@ object SQLConf { .checkValue(_ > 0, "The timeout value must be positive") .createWithDefault(10L) + val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE = +buildConf("spark.sql.legacy.numericConvertToTimestampEnable") + .doc("when true,use legacy numberic can convert to timestamp") + .version("3.0.0") + .booleanConf + .createWithDefault(false) + + val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_IN_SECONDS = +buildConf("spark.sql.legacy.numericConvertToTimestampInSeconds") + .internal() + .doc("The legacy only works when LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE is true." + +"when true,the value will be interpreted as seconds,which follow spark style," + +"when false,value is interpreted as milliseconds,which follow hive style") Review comment: Hi HyukjinKwon thanks for reviewing,we discussed the pain point when we move to spark in #28568, i mean we can adopt both the compatibility flag and adding functions, for using the function, user need to modify tasks one by one with the casting compatibility flag turning off,unfortunally ,we have almost hundred thousand tasks migrating from hive to spark, so with a flag ,we will first fail the tasks if it had CAST_NUMERIC_TO_TIMESTAMP,if user really want to use, we will suggest the NEW adding three functions for them,maybe it's a good way to avoid the case when the task has been succeed,while the casting result is wrong,which is more serious,maybe other brothers meet the same headache problem,so i hope we will embracing spark better with this patch, This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] TJX2014 commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
TJX2014 commented on a change in pull request #28534: URL: https://github.com/apache/spark/pull/28534#discussion_r428478287 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ## @@ -424,6 +424,9 @@ object FunctionRegistry { expression[MakeInterval]("make_interval"), expression[DatePart]("date_part"), expression[Extract]("extract"), +expression[SecondsToTimestamp]("timestamp_seconds"), +expression[MillisToTimestamp]("timestamp_milliseconds"), Review comment: Ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
AmplabJenkins removed a comment on pull request #28556: URL: https://github.com/apache/spark/pull/28556#issuecomment-631915587 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
AmplabJenkins commented on pull request #28556: URL: https://github.com/apache/spark/pull/28556#issuecomment-631915587 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
cloud-fan commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r428474929 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInEquiJoin.scala ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys +import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project, UnaryNode} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * Wraps `LogicalRelation` to provide the number of buckets for coalescing. + */ +case class CoalesceBuckets( +numCoalescedBuckets: Int, +child: LogicalRelation) extends UnaryNode { + require(numCoalescedBuckets > 0, +s"Number of coalesced buckets ($numCoalescedBuckets) must be positive.") + + override def output: Seq[Attribute] = child.output +} + +/** + * This rule adds a `CoalesceBuckets` logical plan if the following conditions are met: + * - Two bucketed tables are joined. + * - Join is the equi-join. + * - The larger bucket number is divisible by the smaller bucket number. + * - "spark.sql.bucketing.coalesceBucketsInJoin.enabled" is set to true. + * - The difference in the number of buckets is less than the value set in + * "spark.sql.bucketing.coalesceBucketsInJoin.maxNumBucketsDiff". + */ +object CoalesceBucketsInEquiJoin extends Rule[LogicalPlan] { + private def isPlanEligible(plan: LogicalPlan): Boolean = { +def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = { + p(plan) && plan.children.forall(forall(_)(p)) +} + +forall(plan) { + case _: Filter | _: Project | _: LogicalRelation => true + case _ => false +} + } + + private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = { +if (isPlanEligible(plan)) { + plan.collectFirst { +case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) if r.bucketSpec.nonEmpty => + r.bucketSpec.get + } +} else { + None +} + } + + private def mayCoalesce(numBuckets1: Int, numBuckets2: Int, conf: SQLConf): Option[Int] = { +assert(numBuckets1 != numBuckets2) +val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2)) +// A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller +// number of buckets because bucket id is calculated by modding the total number of buckets. +if ((large % small == 0) && ((large - small) <= conf.coalesceBucketsInJoinMaxNumBucketsDiff)) { + Some(small) +} else { + None +} + } + + private def addCoalesceBuckets(plan: LogicalPlan, numCoalescedBuckets: Int): LogicalPlan = { +plan.transformUp { + case l @ LogicalRelation(_: HadoopFsRelation, _, _, _) => +CoalesceBuckets(numCoalescedBuckets, l) +} + } + + object ExtractJoinWithBuckets { +def unapply(plan: LogicalPlan): Option[(Join, Int, Int)] = { + plan match { +case join @ ExtractEquiJoinKeys(_, _, _, _, left, right, _) => Review comment: yes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
HyukjinKwon commented on a change in pull request #28534: URL: https://github.com/apache/spark/pull/28534#discussion_r428474898 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ## @@ -424,6 +424,9 @@ object FunctionRegistry { expression[MakeInterval]("make_interval"), expression[DatePart]("date_part"), expression[Extract]("extract"), +expression[SecondsToTimestamp]("timestamp_seconds"), +expression[MillisToTimestamp]("timestamp_milliseconds"), Review comment: Let's also don't forget to update the PR title and description. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
SparkQA commented on pull request #28556: URL: https://github.com/apache/spark/pull/28556#issuecomment-631915185 **[Test build #122916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122916/testReport)** for PR 28556 at commit [`b77a1ba`](https://github.com/apache/spark/commit/b77a1ba23f5c39fb541bb8f61ebbbd62ffbb975d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
viirya commented on a change in pull request #28556: URL: https://github.com/apache/spark/pull/28556#discussion_r428473708 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ## @@ -34,7 +34,8 @@ object NestedColumnAliasing { : Option[(Map[ExtractValue, Alias], Map[ExprId, Seq[Alias]])] = plan match { case Project(projectList, child) if SQLConf.get.nestedSchemaPruningEnabled && canProjectPushThrough(child) => - getAliasSubMap(projectList) + val exprsToPrune = projectList ++ child.expressions Review comment: changed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
viirya commented on a change in pull request #28556: URL: https://github.com/apache/spark/pull/28556#discussion_r428473655 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ## @@ -204,15 +211,8 @@ object GeneratorNestedColumnAliasing { g: Generate, nestedFieldToAlias: Map[ExtractValue, Alias], attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = { -val newGenerator = g.generator.transform { - case f: ExtractValue if nestedFieldToAlias.contains(f) => -nestedFieldToAlias(f).toAttribute -}.asInstanceOf[Generator] - // Defer updating `Generate.unrequiredChildIndex` to next round of `ColumnPruning`. -val newGenerate = g.copy(generator = newGenerator) - -NestedColumnAliasing.replaceChildrenWithAliases(newGenerate, attrToAliases) +NestedColumnAliasing.replaceChildrenWithAliases(g, nestedFieldToAlias, attrToAliases) Review comment: Changed the method name. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] GuoPhilipse commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
GuoPhilipse commented on a change in pull request #28593: URL: https://github.com/apache/spark/pull/28593#discussion_r428473216 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ## @@ -1277,7 +1285,11 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit val block = inline"new java.math.BigDecimal($MICROS_PER_SECOND)" code"($d.toBigDecimal().bigDecimal().multiply($block)).longValue()" } - private[this] def longToTimeStampCode(l: ExprValue): Block = code"$l * (long)$MICROS_PER_SECOND" + private[this] def longToTimeStampCode(l: ExprValue): Block = { +if (SQLConf.get.numericConvertToTimestampInSeconds) code"" + Review comment: yes,let me correct it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
cloud-fan commented on a change in pull request #28534: URL: https://github.com/apache/spark/pull/28534#discussion_r428472274 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ## @@ -424,6 +424,9 @@ object FunctionRegistry { expression[MakeInterval]("make_interval"), expression[DatePart]("date_part"), expression[Extract]("extract"), +expression[SecondsToTimestamp]("timestamp_seconds"), +expression[MillisToTimestamp]("timestamp_milliseconds"), Review comment: `timestamp_millisecond` -> `timestamp_millis` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
cloud-fan commented on a change in pull request #28534: URL: https://github.com/apache/spark/pull/28534#discussion_r428472315 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ## @@ -424,6 +424,9 @@ object FunctionRegistry { expression[MakeInterval]("make_interval"), expression[DatePart]("date_part"), expression[Extract]("extract"), +expression[SecondsToTimestamp]("timestamp_seconds"), +expression[MillisToTimestamp]("timestamp_milliseconds"), +expression[MicrosToTimestamp]("timestamp_microseconds"), Review comment: ditto This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
AmplabJenkins removed a comment on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631909869 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
AmplabJenkins commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631909869 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
SparkQA commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631909451 **[Test build #122915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122915/testReport)** for PR 28534 at commit [`97263e8`](https://github.com/apache/spark/commit/97263e84877bb767e476cfde635e5fb991dc4d6e). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org