[GitHub] [spark] SparkQA commented on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era
SparkQA commented on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era URL: https://github.com/apache/spark/pull/25230#issuecomment-514079553 **[Test build #108037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108037/testReport)** for PR 25230 at commit [`cdfe56d`](https://github.com/apache/spark/commit/cdfe56dbc59b6d9b3dfb32b39c60266e3cc9f1d6). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era
AmplabJenkins removed a comment on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era URL: https://github.com/apache/spark/pull/25230#issuecomment-514079027 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era
AmplabJenkins removed a comment on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era URL: https://github.com/apache/spark/pull/25230#issuecomment-514079031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13145/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era
AmplabJenkins commented on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era URL: https://github.com/apache/spark/pull/25230#issuecomment-514079027 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era
AmplabJenkins commented on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era URL: https://github.com/apache/spark/pull/25230#issuecomment-514079031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13145/ Test PASSed.
[GitHub] [spark] gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks when executors are lost
gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks when executors are lost URL: https://github.com/apache/spark/pull/24462#issuecomment-514078853 @squito I ran into a scenario that cannot be handled without this PR:
- On the map side, all shuffle files are written to a remote Hadoop filesystem; there are no shuffle files managed by `BlockManager`s. So should I simply make the `ShuffleWriter`s return `MapStatus.location == null`? No, because that cannot fulfil the need during shuffle write.
- On the reduce side, I want to read the shuffle index files from the cache on the executors that wrote them, so I need the `BlockManagerId` in the `MapStatus` to tell each reducer which executor to go to.
This is what I mentioned: > what the Driver decides to do when invalidating an executor (what this PR works on) and how the Executors tell the Driver the MapStatus (with or without a location) are two different things.
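The distinction the comment draws can be illustrated with a small, hypothetical model. The names `MapStatus` and `BlockManagerId` mirror Spark's classes, but this is only an illustrative sketch, not Spark's actual implementation: even when the shuffle data itself lives on a remote filesystem, a reducer still needs the location recorded in the `MapStatus` to find the executor caching the shuffle index files.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class BlockManagerId:
    """Simplified stand-in for Spark's BlockManagerId (executor + host)."""
    executor_id: str
    host: str

@dataclass(frozen=True)
class MapStatus:
    # The commenter's point: even when shuffle *data* lives on a remote
    # filesystem, the location is still needed so reducers can find the
    # executor that caches the shuffle *index* files.
    location: Optional[BlockManagerId]

def reducer_fetch_target(status: MapStatus) -> str:
    """Where a reducer would go to read the cached shuffle index."""
    if status.location is None:
        raise ValueError("MapStatus without a location: reducers cannot "
                         "locate the executor caching the index files")
    return f"{status.location.host}/{status.location.executor_id}"

status = MapStatus(location=BlockManagerId("exec-1", "host-a"))
print(reducer_fetch_target(status))  # host-a/exec-1
```

This separates the two concerns: whether the driver resubmits tasks on executor loss is independent of whether `MapStatus.location` is populated.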
[GitHub] [spark] SparkQA commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink
SparkQA commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink URL: https://github.com/apache/spark/pull/25232#issuecomment-514077480 **[Test build #108036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108036/testReport)** for PR 25232 at commit [`4314fa7`](https://github.com/apache/spark/commit/4314fa7f2a5688e1a918393a241d6bd8607d88f0).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink
AmplabJenkins removed a comment on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink URL: https://github.com/apache/spark/pull/25232#issuecomment-514076907 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink
AmplabJenkins removed a comment on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink URL: https://github.com/apache/spark/pull/25232#issuecomment-514076912 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13144/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink
AmplabJenkins commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink URL: https://github.com/apache/spark/pull/25232#issuecomment-514076912 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13144/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink
AmplabJenkins commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink URL: https://github.com/apache/spark/pull/25232#issuecomment-514076907 Merged build finished. Test PASSed.
[GitHub] [spark] dongjinleekr commented on a change in pull request #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming
dongjinleekr commented on a change in pull request #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming URL: https://github.com/apache/spark/pull/22282#discussion_r306151018
## File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
## @@ -30,9 +30,10 @@
 import com.esotericsoftware.kryo.KryoSerializable;
 import com.esotericsoftware.kryo.io.Input;
 import com.esotericsoftware.kryo.io.Output;
-
 import org.apache.spark.sql.catalyst.InternalRow;
-import org.apache.spark.sql.types.*;
+import org.apache.spark.sql.types.DataType;
Review comment: Sure.
[GitHub] [spark] nkarpov commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink
nkarpov commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink URL: https://github.com/apache/spark/pull/25232#issuecomment-514075684 Thanks @HyukjinKwon - in that case it's worth just adding the tests. Added in the latest commit.
[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r306147894
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
## @@ -39,27 +39,36 @@ object StringUtils extends Logging {
    * throw an [[AnalysisException]].
    *
    * @param pattern the SQL pattern to convert
+   * @param escapeStr the escape string contains one character.
    * @return the equivalent Java regular expression of the pattern
    */
-  def escapeLikeRegex(pattern: String): String = {
+  def escapeLikeRegex(pattern: String, escapeStr: String): String = {
+    val escapeChar = escapeStr.charAt(0)
     val in = pattern.toIterator
     val out = new StringBuilder()
     def fail(message: String) = throw new AnalysisException(
       s"the pattern '$pattern' is invalid, $message")
     while (in.hasNext) {
-      in.next match {
-        case '\\' if in.hasNext =>
+      val cur = in.next
+      if (cur == escapeChar) {
Review comment: OK, I think your suggestion is better. I'll learn from it and give it a try!
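For intuition, the translation under review can be sketched in Python. The real implementation is the Scala `escapeLikeRegex` above; this simplified, hypothetical version only illustrates the idea of honoring a user-chosen escape character while mapping `_` and `%` to their regex equivalents:

```python
import re

def escape_like_regex(pattern: str, escape_char: str = "\\") -> str:
    """Translate a SQL LIKE pattern into an equivalent regex string.

    escape_char plays the role of the ESCAPE clause; by default it is
    the backslash, as in SQL without an explicit ESCAPE.
    """
    out = []
    it = iter(pattern)
    for ch in it:
        if ch == escape_char:
            nxt = next(it, None)
            if nxt is None:
                raise ValueError(f"the pattern '{pattern}' is invalid: "
                                 "it ends with the escape character")
            # An escaped character matches itself literally.
            out.append(re.escape(nxt))
        elif ch == "_":
            out.append(".")       # _ matches any single character
        elif ch == "%":
            out.append(".*")      # % matches any sequence of characters
        else:
            out.append(re.escape(ch))
    # (?s) makes '.' match newlines too, as Spark's version does.
    return "(?s)" + "".join(out)

print(escape_like_regex("50%%", escape_char="%"))  # → (?s)50%
```

With `ESCAPE '%'`, the pattern `50%%` matches only the literal string `50%`, while plain `50%` would match any string starting with `50`.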
[GitHub] [spark] AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file
AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file URL: https://github.com/apache/spark/pull/25134#issuecomment-514072146 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file
SparkQA removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file URL: https://github.com/apache/spark/pull/25134#issuecomment-514041249 **[Test build #108030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108030/testReport)** for PR 25134 at commit [`feb8dd0`](https://github.com/apache/spark/commit/feb8dd0c9489c8d9a6b0dd6f4243081510cafda6).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file
AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file URL: https://github.com/apache/spark/pull/25134#issuecomment-514072155 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108030/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file
AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file URL: https://github.com/apache/spark/pull/25134#issuecomment-514072155 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108030/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file
AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file URL: https://github.com/apache/spark/pull/25134#issuecomment-514072146 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file
SparkQA commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file URL: https://github.com/apache/spark/pull/25134#issuecomment-514071654 **[Test build #108030 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108030/testReport)** for PR 25134 at commit [`feb8dd0`](https://github.com/apache/spark/commit/feb8dd0c9489c8d9a6b0dd6f4243081510cafda6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514064747 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108034/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514064740 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
SparkQA removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514052849 **[Test build #108034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108034/testReport)** for PR 25178 at commit [`99dfe7e`](https://github.com/apache/spark/commit/99dfe7e5639f30143fe8e164d80377c583cf4b33).
[GitHub] [spark] AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514064740 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514064747 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108034/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
SparkQA commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514064520 **[Test build #108034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108034/testReport)** for PR 25178 at commit [`99dfe7e`](https://github.com/apache/spark/commit/99dfe7e5639f30143fe8e164d80377c583cf4b33). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base
AmplabJenkins removed a comment on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base URL: https://github.com/apache/spark/pull/25233#issuecomment-514062730 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base
AmplabJenkins commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base URL: https://github.com/apache/spark/pull/25233#issuecomment-514063864 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base
AmplabJenkins removed a comment on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base URL: https://github.com/apache/spark/pull/25233#issuecomment-514062306 Can one of the admins verify this patch?
[GitHub] [spark] Udbhav30 commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base
Udbhav30 commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base URL: https://github.com/apache/spark/pull/25233#issuecomment-514062796 Hi @HyukjinKwon, can you please review this? Thanks!
[GitHub] [spark] AmplabJenkins commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base
AmplabJenkins commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base URL: https://github.com/apache/spark/pull/25233#issuecomment-514062730 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base
AmplabJenkins commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base URL: https://github.com/apache/spark/pull/25233#issuecomment-514062306 Can one of the admins verify this patch?
[GitHub] [spark] Udbhav30 opened a new pull request #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base
Udbhav30 opened a new pull request #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base URL: https://github.com/apache/spark/pull/25233
## What changes were proposed in this pull request?
This PR adds some tests converted from 'pgSQL/select_implicit.sql' to test UDFs. Diff comparing to 'pgSQL/select_implicit.sql':
```diff
...
diff --git a/sql/core/src/test/resources/sql-tests/results/pgSQL/select_implicit.sql.out b/sql/core/src/test/resources/sql-tests/results/udf/pgSQL/udf-select_implicit.sql.out
index 0675820..e6a5995 100755
--- a/sql/core/src/test/resources/sql-tests/results/pgSQL/select_implicit.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/udf/pgSQL/udf-select_implicit.sql.out
@@ -91,9 +91,11 @@ struct<>
 -- !query 11
-SELECT c, count(*) FROM test_missing_target GROUP BY test_missing_target.c ORDER BY c
+SELECT udf(c), udf(count(*)) FROM test_missing_target GROUP BY
+test_missing_target.c
+ORDER BY udf(c)
 -- !query 11 schema
-struct
+struct
 -- !query 11 output
 ABAB2 2
@@ -104,9 +106,10 @@ 2
 -- !query 12
-SELECT count(*) FROM test_missing_target GROUP BY test_missing_target.c ORDER BY c
+SELECT udf(count(*)) FROM test_missing_target GROUP BY test_missing_target.c
+ORDER BY udf(c)
 -- !query 12 schema
-struct
+struct
 -- !query 12 output
 2 2
@@ -117,18 +120,18 @@ struct
 -- !query 13
-SELECT count(*) FROM test_missing_target GROUP BY a ORDER BY b
+SELECT udf(count(*)) FROM test_missing_target GROUP BY a ORDER BY udf(b)
 -- !query 13 schema
 struct<>
 -- !query 13 output
 org.apache.spark.sql.AnalysisException
-cannot resolve '`b`' given input columns: [count(1)]; line 1 pos 61
+cannot resolve '`b`' given input columns: [CAST(udf(cast(count(1) as string)) AS BIGINT)]; line 1 pos 70
 -- !query 14
-SELECT count(*) FROM test_missing_target GROUP BY b ORDER BY b
+SELECT udf(count(*)) FROM test_missing_target GROUP BY b ORDER BY udf(b)
 -- !query 14 schema
-struct
+struct
 -- !query 14 output
 1 2
@@ -137,10 +140,10 @@ struct
 -- !query 15
-SELECT test_missing_target.b, count(*)
-  FROM test_missing_target GROUP BY b ORDER BY b
+SELECT udf(test_missing_target.b), udf(count(*))
+  FROM test_missing_target GROUP BY b ORDER BY udf(b)
 -- !query 15 schema
-struct
+struct
 -- !query 15 output
 1 1 2 2
@@ -149,9 +152,9 @@ struct
 -- !query 16
-SELECT c FROM test_missing_target ORDER BY a
+SELECT udf(c) FROM test_missing_target ORDER BY udf(a)
 -- !query 16 schema
-struct
+struct
 -- !query 16 output
 ABAB
@@ -166,9 +169,9 @@
 -- !query 17
-SELECT count(*) FROM test_missing_target GROUP BY b ORDER BY b desc
+SELECT udf(count(*)) FROM test_missing_target GROUP BY b ORDER BY udf(b) desc
 -- !query 17 schema
-struct
+struct
 -- !query 17 output
 4 3
@@ -177,17 +180,17 @@ struct
 -- !query 18
-SELECT count(*) FROM test_missing_target ORDER BY 1 desc
+SELECT udf(count(*)) FROM test_missing_target ORDER BY udf(1) desc
 -- !query 18 schema
-struct
+struct
 -- !query 18 output
 10
 -- !query 19
-SELECT c, count(*) FROM test_missing_target GROUP BY 1 ORDER BY 1
+SELECT udf(c), udf(count(*)) FROM test_missing_target GROUP BY 1 ORDER BY 1
 -- !query 19 schema
-struct
+struct
 -- !query 19 output
 ABAB2 2
@@ -198,18 +201,18 @@ 2
 -- !query 20
-SELECT c, count(*) FROM test_missing_target GROUP BY 3
+SELECT udf(c), udf(count(*)) FROM test_missing_target GROUP BY 3
 -- !query 20 schema
 struct<>
 -- !query 20 output
 org.apache.spark.sql.AnalysisException
-GROUP BY position 3 is not in select list (valid range is [1, 2]); line 1 pos 53
+GROUP BY position 3 is not in select list (valid range is [1, 2]); line 1 pos 63
 -- !query 21
-SELECT count(*) FROM test_missing_target x, test_missing_target y
-WHERE x.a = y.a
-GROUP BY b ORDER BY b
+SELECT udf(count(*)) FROM test_missing_target x, test_missing_target y
+WHERE udf(x.a) = udf(y.a)
+GROUP BY b ORDER BY udf(b)
 -- !query 21 schema
 struct<>
 -- !query 21 output
@@ -218,10 +221,10 @@ Reference 'b' is ambiguous, could be: x.b, y.b.; line 3 pos 10
 -- !query 22
-SELECT a, a FROM test_missing_target
-ORDER BY a
+SELECT udf(a), udf(a) FROM test_missing_target
+ORDER BY udf(a)
 -- !query 22 schema
-struct
+struct
 -- !query 22 output
 0 0 1 1
@@ -236,10 +239,10 @@ struct
 -- !query 23
-SELECT a/2, a/2 FROM test_missing_target
-ORDER BY a/2
+SELECT udf(udf(a
```
[GitHub] [spark] SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514058402 **[Test build #108035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108035/testReport)** for PR 25001 at commit [`2b4c59a`](https://github.com/apache/spark/commit/2b4c59a48d6bc87d286902ad6d1281b78c329f3f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] chenjunjiedada edited a comment on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
chenjunjiedada edited a comment on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514057779 @rvesse @vanzin In summary: if the user specifies a directory that cannot be found among the current volumes (whose volume name prefix is `local-dir-`), build an emptyDir volume; otherwise use the existing volume. Is this acceptable? @erikerlandson, is this approach also acceptable for you?
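The fallback rule described in the comment can be sketched as follows. This is a minimal Python illustration of the decision logic only, not the actual Spark-on-K8s implementation; the function name, volume fields, and `local-dir-` naming shown here are assumptions for the sketch:

```python
def resolve_local_dirs(user_dirs, volume_mounts):
    """For each user-specified local dir, reuse an existing volume whose
    name starts with 'local-dir-' and whose mount path matches; otherwise
    fall back to a freshly built emptyDir volume (illustrative sketch)."""
    resolved = []
    for i, d in enumerate(user_dirs):
        existing = next(
            (v for v in volume_mounts
             if v["name"].startswith("local-dir-") and v["mountPath"] == d),
            None,
        )
        if existing is not None:
            resolved.append(existing)          # user-provided volume wins
        else:
            resolved.append({"name": f"local-dir-{i}",
                             "mountPath": d,
                             "emptyDir": {}})  # fallback: emptyDir
    return resolved
```

Under this sketch, a dir backed by a user-declared `local-dir-` volume is reused untouched, and only unmatched dirs get an emptyDir.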
[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514058033 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13143/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514058030 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514058033 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13143/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514058030 Merged build finished. Test PASSed.
[GitHub] [spark] chenjunjiedada commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
chenjunjiedada commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514057779 @rvesse @vanzin In summary: if the user specifies a directory that cannot be found among the current volumes (whose volume name prefix is `local-dir-`), build an emptyDir volume; otherwise use the existing volume. Is this acceptable?
[GitHub] [spark] AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file
AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file URL: https://github.com/apache/spark/pull/25134#issuecomment-514056453 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108029/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file
AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file URL: https://github.com/apache/spark/pull/25134#issuecomment-514056445 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file
AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file URL: https://github.com/apache/spark/pull/25134#issuecomment-514056453 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108029/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file
AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file URL: https://github.com/apache/spark/pull/25134#issuecomment-514056445 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file
SparkQA removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file URL: https://github.com/apache/spark/pull/25134#issuecomment-514028148 **[Test build #108029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108029/testReport)** for PR 25134 at commit [`e6cf714`](https://github.com/apache/spark/commit/e6cf714137e3d86ae7041136f88403c8745a7cd8).
[GitHub] [spark] SparkQA commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file
SparkQA commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file URL: https://github.com/apache/spark/pull/25134#issuecomment-514056107 **[Test build #108029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108029/testReport)** for PR 25134 at commit [`e6cf714`](https://github.com/apache/spark/commit/e6cf714137e3d86ae7041136f88403c8745a7cd8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HeartSaVioR edited a comment on issue #25135: [SPARK-28367][SS] Use new KafkaConsumer.poll API in Kafka connector
HeartSaVioR edited a comment on issue #25135: [SPARK-28367][SS] Use new KafkaConsumer.poll API in Kafka connector URL: https://github.com/apache/spark/pull/25135#issuecomment-514048478 Here's part of the test code Kafka uses with the new poll:

https://github.com/apache/kafka/blob/f98e176746d663fadedbcd3c18312a7f476a20c8/core/src/test/scala/integration/kafka/api/PlaintextConsumerTest.scala#L1748-L1752

```
private def awaitAssignment(consumer: Consumer[_, _], expectedAssignment: Set[TopicPartition]): Unit = {
  TestUtils.pollUntilTrue(consumer, () => consumer.assignment() == expectedAssignment.asJava,
    s"Timed out while awaiting expected assignment $expectedAssignment. " +
      s"The current assignment is ${consumer.assignment()}")
}
```

https://github.com/apache/kafka/blob/f98e176746d663fadedbcd3c18312a7f476a20c8/core/src/test/scala/unit/kafka/utils/TestUtils.scala#L767-L775

```
def pollUntilTrue(consumer: Consumer[_, _],
                  action: () => Boolean,
                  msg: => String,
                  waitTimeMs: Long = JTestUtils.DEFAULT_MAX_WAIT_MS): Unit = {
  waitUntilTrue(() => {
    consumer.poll(Duration.ofMillis(50))
    action()
  }, msg = msg, pause = 0L, waitTimeMs = waitTimeMs)
}
```

Kafka still has some test code relying on the deprecated `poll(0)` (so `poll(Duration)` and `poll(long)` are used side by side). There might not be a technical reason for this, but they still rely on the old flavor, which may indicate a continued need for `poll(0)` semantics. Sometimes Kafka calls `updateAssignmentMetadataIfNeeded` directly, which handles the metadata update done in `poll()`, with a max-long timer, effectively blocking. The method is meant for testing: it is defined as package private.

```
consumer.updateAssignmentMetadataIfNeeded(time.timer(Long.MAX_VALUE));
```

In many places where test code calls `poll(Duration.ZERO)`, `updateAssignmentMetadataIfNeeded` is called beforehand. In the other cases, the verification code just seems to confirm that calling poll does nothing or returns already-fetched records.
I guess in our case we need to either leverage `updateAssignmentMetadataIfNeeded` to deal with metadata only (it may require some hacks, and they clarified it's for testing, so that's not the option for us), or `poll` with a small timeout (50 ms) while tolerating the case where no record is available to pull (the call is incorporated into latency regardless of metadata availability). Btw, I'm seeing that KIP-288 proposed a new public API `waitForAssignment`, similar to `updateAssignmentMetadataIfNeeded`, but it was discarded since KIP-266 superseded KIP-288, and KIP-266 didn't add it in the end. Not sure whether it was declined or just missed. https://cwiki.apache.org/confluence/display/KAFKA/KIP-288%3A+%5BDISCARDED%5D+Consumer.poll%28%29+timeout+semantic+change+and+new+waitForAssignment+method
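The "poll with a small timeout until a condition holds" pattern quoted from Kafka's `TestUtils.pollUntilTrue` above can be sketched generically. This is a hypothetical Python illustration against an assumed consumer interface (a `poll(seconds)` method), not the real Kafka client API:

```python
import time

def poll_until_true(consumer, action, timeout_s=30.0, poll_ms=50):
    """Repeatedly poll with a short timeout, tolerating empty results,
    until action() holds or the overall deadline passes. Mirrors the
    shape of Kafka's pollUntilTrue test helper (illustrative only)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        consumer.poll(poll_ms / 1000.0)  # short poll; may return nothing
        if action():
            return True
    return False  # condition never held within the deadline
```

The short per-call timeout keeps each poll cheap while the outer loop absorbs the wait for metadata or records, which is the trade-off the comment describes.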
[GitHub] [spark] AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514053805 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514053811 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13142/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514053811 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13142/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514053805 Merged build finished. Test PASSed.
[GitHub] [spark] zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514041986 The expected cost without the range check is `E(cost(apply2)) = log(NNZ)`, while the one with the range check is `E(cost(apply3)) = 2 + P(key in range)*log(NNZ)`. The difference is `E(cost(apply3) - cost(apply2)) = 2 - P(key out of range) * log(NNZ)`, so the benefit of the optimization depends strongly on the key distribution and on `NNZ`. ~~The above suite suppose the input key is from an uniform distribution. And show that, if the `NNZ` is small, range check will cost extra 10% cost; otherwise, the range check will save about 50% cost.~~ The previous test suite used `val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.sorted.take(nnz)` to generate indices, which is biased. I just changed it to `val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.take(nnz).sorted`. Now the version without the range check is faster, since `P(key out of range)` should in most cases be near 0%.
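The two variants being compared can be sketched as follows. This is a hypothetical Python stand-in for the Scala `SparseVector.apply` under discussion; the function names `apply_no_check`/`apply_with_check` are illustrative, not Spark's:

```python
import bisect

def apply_no_check(indices, values, key):
    """Binary-search the sorted index array unconditionally;
    cost ~ log(NNZ) for every lookup."""
    j = bisect.bisect_left(indices, key)
    if j < len(indices) and indices[j] == key:
        return values[j]
    return 0.0  # implicit zero of the sparse vector

def apply_with_check(indices, values, key):
    """Pay two extra comparisons up front; the log(NNZ) search only
    runs when the key falls inside [indices[0], indices[-1]]."""
    if not indices or key < indices[0] or key > indices[-1]:
        return 0.0
    return apply_no_check(indices, values, key)
```

This makes the cost formula concrete: the range check wins only when `P(key out of range) * log(NNZ)` exceeds the two fixed comparisons, which matches the conclusion that with realistic (mostly in-range) keys the unchecked version is faster.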
[GitHub] [spark] SparkQA commented on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface
SparkQA commented on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface URL: https://github.com/apache/spark/pull/25189#issuecomment-514052850 **[Test build #108033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108033/testReport)** for PR 25189 at commit [`ca111a4`](https://github.com/apache/spark/commit/ca111a4150155900f70d36381432ca3191bca07f).
[GitHub] [spark] SparkQA commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
SparkQA commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514052849 **[Test build #108034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108034/testReport)** for PR 25178 at commit [`99dfe7e`](https://github.com/apache/spark/commit/99dfe7e5639f30143fe8e164d80377c583cf4b33).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface
AmplabJenkins removed a comment on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface URL: https://github.com/apache/spark/pull/25189#issuecomment-514052465 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface
AmplabJenkins removed a comment on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface URL: https://github.com/apache/spark/pull/25189#issuecomment-514052470 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13141/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface
AmplabJenkins commented on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface URL: https://github.com/apache/spark/pull/25189#issuecomment-514052470 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13141/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface
AmplabJenkins commented on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface URL: https://github.com/apache/spark/pull/25189#issuecomment-514052465 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514051070 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108032/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514051063 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514051063 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514051070 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108032/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514051057 **[Test build #108032 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108032/testReport)** for PR 25001 at commit [`420f1b9`](https://github.com/apache/spark/commit/420f1b9123a606868980b76f9aed1b7035f33e30). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
SparkQA removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514050378 **[Test build #108032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108032/testReport)** for PR 25001 at commit [`420f1b9`](https://github.com/apache/spark/commit/420f1b9123a606868980b76f9aed1b7035f33e30).
[GitHub] [spark] zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514041986 The expected cost without the range check is `E(cost(apply2)) = log(NNZ)`, while the cost with the range check is `E(cost(apply3)) = 2 + P(key in range) * log(NNZ)`. The difference is `E(cost(apply3) - cost(apply2)) = 2 - P(key out of range) * log(NNZ)`, so the optimization is highly dependent on the key distribution and the `NNZ`. ~~The above suite assumed the input key is drawn from a uniform distribution, and showed that if the `NNZ` is small, the range check adds about 10% extra cost; otherwise, it saves about 50%.~~ The previous test suite used `val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.sorted.take(nnz)` to generate indices, which is biased. I have changed it to `val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.take(nnz).sorted`. Now the version without the range check is faster, since `P(key in range)` should in most cases be close to 100%, so the check rarely pays off.
[GitHub] [spark] SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514050378 **[Test build #108032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108032/testReport)** for PR 25001 at commit [`420f1b9`](https://github.com/apache/spark/commit/420f1b9123a606868980b76f9aed1b7035f33e30).
[GitHub] [spark] zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514035619 I tested the perf of the current impl (`apply`), direct binary search (`apply2`), and binary search with an extra range check (`apply3`):

```scala
def apply2(i: Int): Double = {
  if (i < 0 || i >= size) {
    throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
  }
  val j = util.Arrays.binarySearch(indices, i)
  if (j < 0) 0.0 else values(j)
}

def apply3(i: Int): Double = {
  if (i < 0 || i >= size) {
    throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
  }
  if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) {
    0.0
  } else {
    val j = util.Arrays.binarySearch(indices, i)
    if (j < 0) 0.0 else values(j)
  }
}
```

The test suite is similar to the above one:

```scala
import scala.util.Random
import org.apache.spark.ml.linalg._

val size = 1000
for (nnz <- Seq(100, 1, 100)) {
  val rng = new Random(123)
  val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.take(nnz).sorted
  val values = Array.fill(nnz)(rng.nextDouble)
  val vec = Vectors.sparse(size, indices, values).toSparse

  val tic1 = System.currentTimeMillis
  (0 until 100).foreach { round =>
    var i = 0; var sum = 0.0
    while (i < size) { sum += vec(i); i += 1 }
  }
  val toc1 = System.currentTimeMillis

  val tic2 = System.currentTimeMillis
  (0 until 100).foreach { round =>
    var i = 0; var sum = 0.0
    while (i < size) { sum += vec.apply2(i); i += 1 }
  }
  val toc2 = System.currentTimeMillis

  val tic3 = System.currentTimeMillis
  (0 until 100).foreach { round =>
    var i = 0; var sum = 0.0
    while (i < size) { sum += vec.apply3(i); i += 1 }
  }
  val toc3 = System.currentTimeMillis

  println((size, nnz, toc1 - tic1, toc2 - tic2, toc3 - tic3))
}
```

| size | nnz | apply(old) | apply2 | apply3 |
|------|-----|------------|--------|--------|
| 1000 | 100 | 75294 | 12208 | 18682 |
| 1000 | 1 | 75616 | 23132 | 32932 |
| 1000 | 100 | 92949 | 42529 | 48821 |

So the version without the range check is faster; I will update the PR.
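The bias described above comes from sorting before `take(nnz)`: sorting first means `take(nnz)` always keeps the `nnz` smallest distinct draws, clustering the kept indices near 0 rather than spreading them over `[0, size)`. A minimal Java sketch of the effect (the Java port and names are illustrative, not from the PR):

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

public class IndexBias {
    // Returns {maxOfBiasedSample, maxOfUnbiasedSample} for one seeded draw.
    // Biased: sort the distinct pool first, then keep the first nnz entries,
    // i.e. always the nnz SMALLEST values. Unbiased: keep the first nnz in
    // draw order, then sort, so the kept indices stay spread over [0, size).
    // Assumes 2*nnz draws yield at least nnz distinct values (true for the
    // parameters used here).
    public static int[] maxes(int size, int nnz, long seed) {
        Random rng = new Random(seed);
        int[] pool = IntStream.generate(() -> Math.abs(rng.nextInt()) % size)
                .limit(2L * nnz).distinct().toArray();
        int[] biased = Arrays.stream(pool).sorted().limit(nnz).toArray();
        int[] unbiased = Arrays.stream(pool).limit(nnz).sorted().toArray();
        return new int[] {biased[biased.length - 1], unbiased[unbiased.length - 1]};
    }
}
```

The biased sample's maximum is exactly the `nnz`-th smallest value of the pool, so it can never exceed the maximum of any other `nnz`-element subset of the same pool.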
[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514050061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13140/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514050061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13140/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514050057 Merged build finished. Test PASSed.
[GitHub] [spark] maropu commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array
maropu commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array URL: https://github.com/apache/spark/pull/25172#discussion_r306124653 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ## @@ -472,6 +472,19 @@ object Overlay {

```diff
     builder.append(input.substringSQL(pos + length, Int.MaxValue))
     builder.build()
   }
+
+  def calculate(input: Array[Byte], replace: Array[Byte], pos: Int, len: Int): Array[Byte] = {
+    // If you specify length, it must be a positive whole number or zero.
+    // Otherwise it will be ignored.
+    // The default value for length is the length of replace.
+    val length = if (len >= 0) {
+      len
+    } else {
+      replace.length
+    }
+    ByteArray.concat(ByteArray.subStringSQL(input, 1, pos - 1),
+      replace, ByteArray.subStringSQL(input, pos + length, Int.MaxValue))
+  }
```

Review comment: cc: @viirya @mgaido91
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-514050057 Merged build finished. Test PASSed.
[GitHub] [spark] beliefer commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array
beliefer commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array URL: https://github.com/apache/spark/pull/25172#discussion_r306124403 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ## @@ -472,6 +472,19 @@ object Overlay {

```diff
     builder.append(input.substringSQL(pos + length, Int.MaxValue))
     builder.build()
   }
+
+  def calculate(input: Array[Byte], replace: Array[Byte], pos: Int, len: Int): Array[Byte] = {
+    // If you specify length, it must be a positive whole number or zero.
+    // Otherwise it will be ignored.
+    // The default value for length is the length of replace.
+    val length = if (len >= 0) {
+      len
+    } else {
+      replace.length
+    }
+    ByteArray.concat(ByteArray.subStringSQL(input, 1, pos - 1),
+      replace, ByteArray.subStringSQL(input, pos + length, Int.MaxValue))
+  }
```

Review comment: I am not very good at using generated code. If this is a strong suggestion, I will try to use generated code. But I think this helper function is OK too.
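For readers following the thread, the semantics of the `calculate` helper quoted above can be sketched with plain JDK arrays. This is an illustrative re-implementation, not Spark's `ByteArray` API; it assumes, per the comment in the diff, that `pos` is 1-based (as in SQL) and that a negative `len` defaults to `replace.length`:

```java
public class OverlaySketch {
    // OVERLAY(input PLACING replace FROM pos [FOR len]) over byte arrays:
    // keep the first pos-1 bytes, splice in replace, then keep everything
    // from 1-based position pos+len onward. Positions are clamped to the
    // input's bounds so out-of-range pos/len don't throw.
    public static byte[] overlay(byte[] input, byte[] replace, int pos, int len) {
        int length = (len >= 0) ? len : replace.length;
        int prefixEnd = Math.max(0, Math.min(pos - 1, input.length));
        int suffixStart = Math.max(0, Math.min(pos - 1 + length, input.length));
        byte[] out = new byte[prefixEnd + replace.length + (input.length - suffixStart)];
        System.arraycopy(input, 0, out, 0, prefixEnd);
        System.arraycopy(replace, 0, out, prefixEnd, replace.length);
        System.arraycopy(input, suffixStart, out, prefixEnd + replace.length,
                input.length - suffixStart);
        return out;
    }
}
```

For example, the classic standard-SQL case `OVERLAY('Txxxxas' PLACING 'hom' FROM 2 FOR 4)` yields `'Thomas'` under this sketch.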
[GitHub] [spark] zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514041986 The expected cost without the range check is `E(cost(apply2)) = log(NNZ)`, while the cost with the range check is `E(cost(apply3)) = 2 + P(key in range) * log(NNZ)`. The difference is `E(cost(apply3) - cost(apply2)) = 2 - P(key out of range) * log(NNZ)`, so the optimization is highly dependent on the key distribution and the `NNZ`. ~~The above suite assumed the input key is drawn from a uniform distribution, and showed that if the `NNZ` is small, the range check adds about 10% extra cost; otherwise, it saves about 50%.~~
[GitHub] [spark] zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514041986 The expected cost without the range check is `E(cost(apply2)) = log(NNZ)`, while the cost with the range check is `E(cost(apply3)) = 2 + P(key in range) * log(NNZ)`. The difference is `E(cost(apply3) - cost(apply2)) = 2 - P(key out of range) * log(NNZ)`, so the optimization is highly dependent on the key distribution and the `NNZ`. ***The above suite assumed the input key is drawn from a uniform distribution, and showed that if the `NNZ` is small, the range check adds about 10% extra cost; otherwise, it saves about 50%.***
[GitHub] [spark] maropu commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array
maropu commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array URL: https://github.com/apache/spark/pull/25172#discussion_r306124119 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ## @@ -496,19 +509,38 @@ case class Overlay(input: Expression, replace: Expression, pos: Expression, len:

```diff
     this(str, replace, pos, Literal.create(-1, IntegerType))
   }

-  override def dataType: DataType = StringType
+  override def dataType: DataType = input.dataType

-  override def inputTypes: Seq[AbstractDataType] =
-    Seq(StringType, StringType, IntegerType, IntegerType)
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(StringType, BinaryType),
+    TypeCollection(StringType, BinaryType), IntegerType, IntegerType)

   override def children: Seq[Expression] = input :: replace :: pos :: len :: Nil

+  override def checkInputDataTypes(): TypeCheckResult = {
```

Review comment: I think this is an issue about implicit casts, not function arguments. In the example above, the left argument (text) is cast to binary, and then `overlay(binary, binary)` is actually called. So my question is: do we need to extend `ImplicitCastInputTypes` for overlay?
[GitHub] [spark] HeartSaVioR edited a comment on issue #25135: [SPARK-28367][SS] Use new KafkaConsumer.poll API in Kafka connector
HeartSaVioR edited a comment on issue #25135: [SPARK-28367][SS] Use new KafkaConsumer.poll API in Kafka connector URL: https://github.com/apache/spark/pull/25135#issuecomment-514048478 Here's a part of the test code Kafka has been using with the new poll. https://github.com/apache/kafka/blob/f98e176746d663fadedbcd3c18312a7f476a20c8/core/src/test/scala/integration/kafka/api/PlaintextConsumerTest.scala#L1748-L1752

```scala
  private def awaitAssignment(consumer: Consumer[_, _], expectedAssignment: Set[TopicPartition]): Unit = {
    TestUtils.pollUntilTrue(consumer, () => consumer.assignment() == expectedAssignment.asJava,
      s"Timed out while awaiting expected assignment $expectedAssignment. " +
        s"The current assignment is ${consumer.assignment()}")
  }
```

https://github.com/apache/kafka/blob/f98e176746d663fadedbcd3c18312a7f476a20c8/core/src/test/scala/unit/kafka/utils/TestUtils.scala#L767-L775

```scala
  def pollUntilTrue(consumer: Consumer[_, _],
                    action: () => Boolean,
                    msg: => String,
                    waitTimeMs: Long = JTestUtils.DEFAULT_MAX_WAIT_MS): Unit = {
    waitUntilTrue(() => {
      consumer.poll(Duration.ofMillis(50))
      action()
    }, msg = msg, pause = 0L, waitTimeMs = waitTimeMs)
  }
```

Kafka still has some parts of its test code relying on the deprecated `poll(0)` (so co-usage of both `poll(Duration)` and `poll(long)`). There might not be a technical reason to do so, but they're still relying on the old flavor, which might indicate a continuing need for `poll(0)`. Sometimes Kafka calls `updateAssignmentMetadataIfNeeded` directly, which handles the metadata update in `poll()`, with a max-long timer, effectively blocking. The method is for testing: it is defined as package private.

```
consumer.updateAssignmentMetadataIfNeeded(time.timer(Long.MAX_VALUE));
```

In many cases where `poll(Duration.ZERO)` is called in test code, `updateAssignmentMetadataIfNeeded` is called beforehand. In other cases the verification code just seems to confirm that calling poll does nothing or returns already-fetched records.

I guess in our case we need to either leverage `updateAssignmentMetadataIfNeeded` to deal only with metadata (it may require some hack, and they clarified it's for testing, so it's an unsafe one), or `poll` with a small timeout (50ms) while tolerating the case where the record to pull is not available (incorporated in latency regardless of availability of metadata). Btw, I see that KIP-288 proposed a new public API `waitForAssignment` similar to `updateAssignmentMetadataIfNeeded`, but KIP-288 was discarded since KIP-266 superseded it, and KIP-266 ultimately didn't add it. Not sure whether it was declined or just missed. https://cwiki.apache.org/confluence/display/KAFKA/KIP-288%3A+%5BDISCARDED%5D+Consumer.poll%28%29+timeout+semantic+change+and+new+waitForAssignment+method
[GitHub] [spark] HeartSaVioR commented on issue #25135: [SPARK-28367][SS] Use new KafkaConsumer.poll API in Kafka connector
HeartSaVioR commented on issue #25135: [SPARK-28367][SS] Use new KafkaConsumer.poll API in Kafka connector URL: https://github.com/apache/spark/pull/25135#issuecomment-514048478 Here's a part of the test code Kafka has been using with the new poll. https://github.com/apache/kafka/blob/f98e176746d663fadedbcd3c18312a7f476a20c8/core/src/test/scala/integration/kafka/api/PlaintextConsumerTest.scala#L1748-L1752

```scala
  private def awaitAssignment(consumer: Consumer[_, _], expectedAssignment: Set[TopicPartition]): Unit = {
    TestUtils.pollUntilTrue(consumer, () => consumer.assignment() == expectedAssignment.asJava,
      s"Timed out while awaiting expected assignment $expectedAssignment. " +
        s"The current assignment is ${consumer.assignment()}")
  }
```

https://github.com/apache/kafka/blob/f98e176746d663fadedbcd3c18312a7f476a20c8/core/src/test/scala/unit/kafka/utils/TestUtils.scala#L767-L775

```scala
  def pollUntilTrue(consumer: Consumer[_, _],
                    action: () => Boolean,
                    msg: => String,
                    waitTimeMs: Long = JTestUtils.DEFAULT_MAX_WAIT_MS): Unit = {
    waitUntilTrue(() => {
      consumer.poll(Duration.ofMillis(50))
      action()
    }, msg = msg, pause = 0L, waitTimeMs = waitTimeMs)
  }
```

Kafka still has some parts of its test code relying on the deprecated `poll(0)` (so co-usage of both `poll(Duration)` and `poll(long)`). There might not be a technical reason to do so, but they're still relying on the old flavor, which might indicate a continuing need for `poll(0)`. Sometimes Kafka calls `updateAssignmentMetadataIfNeeded` directly, which handles the metadata update in `poll()`, with a max-long timer, effectively blocking. The method is for testing: it is defined as package private.

```
consumer.updateAssignmentMetadataIfNeeded(time.timer(Long.MAX_VALUE));
```

In many cases where `poll(Duration.ZERO)` is called in test code, `updateAssignmentMetadataIfNeeded` is called prior to the call. In other cases the verification code just seems to confirm that calling poll does nothing or returns already-fetched records.

I guess in our case we need to either leverage `updateAssignmentMetadataIfNeeded` to deal only with metadata (it may require some hack, and they clarified it's for testing, so it's an unsafe one), or `poll` with a small timeout (50ms) while tolerating the case where the record to pull is not available (incorporated in latency regardless of availability of metadata). Btw, I see that KIP-288 proposed a new public API `waitForAssignment` similar to `updateAssignmentMetadataIfNeeded`, but KIP-288 was discarded since KIP-266 superseded it, and KIP-266 ultimately didn't add it. Not sure whether it was declined or just missed. https://cwiki.apache.org/confluence/display/KAFKA/KIP-288%3A+%5BDISCARDED%5D+Consumer.poll%28%29+timeout+semantic+change+and+new+waitForAssignment+method
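The `pollUntilTrue` helper quoted above boils down to a generic poll-until-condition loop: keep invoking a short poll step until a condition holds or an overall deadline passes. A minimal, Kafka-free Java sketch of that pattern (the `Runnable` stands in for `consumer.poll(Duration.ofMillis(50))`; this is a hedged sketch of the pattern, not Kafka's API):

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class PollUntil {
    // Repeatedly run pollStep and check condition until it holds or maxWait
    // elapses. Returns true if the condition became true in time; the caller
    // decides how to fail on false (Kafka's TestUtils raises with a message).
    public static boolean pollUntilTrue(Runnable pollStep, BooleanSupplier condition,
                                        Duration maxWait) {
        long deadline = System.nanoTime() + maxWait.toNanos();
        while (System.nanoTime() < deadline) {
            pollStep.run();              // e.g. consumer.poll(Duration.ofMillis(50))
            if (condition.getAsBoolean()) {
                return true;             // e.g. assignment matches the expectation
            }
        }
        return false;                    // timed out
    }
}
```

The design point the comment raises carries over: because each `poll` call uses a small timeout rather than blocking indefinitely, a missing assignment shows up as loop latency instead of a hang.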
[GitHub] [spark] beliefer commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array
beliefer commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array URL: https://github.com/apache/spark/pull/25172#discussion_r306122925 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ## @@ -496,19 +509,38 @@ case class Overlay(input: Expression, replace: Expression, pos: Expression, len:

```diff
     this(str, replace, pos, Literal.create(-1, IntegerType))
   }

-  override def dataType: DataType = StringType
+  override def dataType: DataType = input.dataType

-  override def inputTypes: Seq[AbstractDataType] =
-    Seq(StringType, StringType, IntegerType, IntegerType)
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(StringType, BinaryType),
+    TypeCollection(StringType, BinaryType), IntegerType, IntegerType)

   override def children: Seq[Expression] = input :: replace :: pos :: len :: Nil

+  override def checkInputDataTypes(): TypeCheckResult = {
```

Review comment:

```
<char overlay function> ::=
  OVERLAY <left paren> <character value expression>
  PLACING <character value expression>
  FROM <start position> [ FOR <string length> ]
  [ USING <char length units> ] <right paren>
```

and

```
<binary overlay function> ::=
  OVERLAY <left paren> <binary value expression>
  PLACING <binary value expression>
  FROM <start position> [ FOR <string length> ] <right paren>
```

I can't find any other case.
[GitHub] [spark] beliefer commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array
beliefer commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array URL: https://github.com/apache/spark/pull/25172#discussion_r306122505 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ## @@ -496,19 +509,38 @@ case class Overlay(input: Expression, replace: Expression, pos: Expression, len:

```diff
     this(str, replace, pos, Literal.create(-1, IntegerType))
   }

-  override def dataType: DataType = StringType
+  override def dataType: DataType = input.dataType

-  override def inputTypes: Seq[AbstractDataType] =
-    Seq(StringType, StringType, IntegerType, IntegerType)
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(StringType, BinaryType),
+    TypeCollection(StringType, BinaryType), IntegerType, IntegerType)

   override def children: Seq[Expression] = input :: replace :: pos :: len :: Nil

+  override def checkInputDataTypes(): TypeCheckResult = {
```

Review comment: According to ANSI SQL, only (binary, binary) or (string, string).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
HyukjinKwon commented on a change in pull request #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#discussion_r306121633 ## File path: python/pyspark/tests/test_context.py ## @@ -273,7 +273,8 @@ def setUp(self):

```diff
         self.tempFile.close()
         os.chmod(self.tempFile.name, stat.S_IRWXU | stat.S_IXGRP | stat.S_IRGRP |
                  stat.S_IROTH | stat.S_IXOTH)
-        conf = SparkConf().set("spark.driver.resource.gpu.amount", "1")
+        conf = SparkConf().set("spark.test.home", SPARK_HOME)
```

Review comment: This is because the `spark.test.home` property is only set in testing mode in the JVM. I left a comment: https://github.com/apache/spark/pull/25047/files/95111b0b9c0a4732041d19b5e221eecea4408147#r306121558
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
HyukjinKwon commented on a change in pull request #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone URL: https://github.com/apache/spark/pull/25047#discussion_r306121558 ## File path: core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala ## @@ -70,6 +93,276 @@ private[spark] object ResourceUtils extends Logging {

```diff
   // internally we currently only support addresses, so its just an integer count
   val AMOUNT = "amount"

+  /**
+   * Assign resources to workers/drivers from the same host to avoid address conflict.
+   *
+   * This function works in three steps. First, acquiring the lock on RESOURCES_LOCK_FILE
+   * to achieve synchronization among workers and drivers. Second, getting all allocated
+   * resources from ALLOCATED_RESOURCES_FILE and assigning isolated resources to the worker
+   * or driver after differentiating available resources in discovered resources from
+   * allocated resources. If available resources don't meet the worker's or driver's
+   * requirement, try to update allocated resources by excluding the resource allocations
+   * whose related processes have already terminated, and do the assignment again. If the
+   * requirement still isn't met, an exception is thrown. Third, updating
+   * ALLOCATED_RESOURCES_FILE with the newly allocated resources along with the pid for the
+   * worker or driver. Then, return the allocated resources information after releasing
+   * the lock.
+   *
+   * @param conf SparkConf
+   * @param componentName spark.driver / spark.worker
+   * @param resources the resources found by worker/driver on the host
+   * @param resourceRequirements the resource requirements asked by the worker/driver
+   * @param pid the process id of worker/driver to acquire resources
+   * @return allocated resources for the worker/driver, or throws an exception if the
+   *         worker/driver's requirement can't be met
+   */
+  def acquireResources(
+      conf: SparkConf,
+      componentName: String,
+      resources: Map[String, ResourceInformation],
+      resourceRequirements: Seq[ResourceRequirement],
+      pid: Int)
+    : Map[String, ResourceInformation] = {
+    if (resourceRequirements.isEmpty) {
+      return Map.empty
+    }
+    val lock = acquireLock(conf)
+    val resourcesFile = new File(getOrCreateResourcesDir(conf), ALLOCATED_RESOURCES_FILE)
+    // all allocated resources in ALLOCATED_RESOURCES_FILE, can be updated if any allocations'
+    // related processes detected to be terminated while checking pids below.
+    var origAllocation = Seq.empty[StandaloneResourceAllocation]
+    var allocated = {
+      if (resourcesFile.exists()) {
+        origAllocation = allocatedStandaloneResources(resourcesFile.getPath)
+        val allocations = origAllocation.map { resource =>
+          val resourceMap = {
+            resource.allocations.map { allocation =>
+              allocation.id.resourceName -> allocation.addresses.toArray
+            }.toMap
+          }
+          resource.pid -> resourceMap
+        }.toMap
+        allocations
+      } else {
+        Map.empty[Int, Map[String, Array[String]]]
+      }
+    }
+
+    // new allocated resources for worker or driver,
+    // map from resource name to its allocated addresses.
+    var newAssignments: Map[String, Array[String]] = null
+    // Whether we've checked process status and we'll only do the check at most once.
+    // Do the check iff the available resources can't meet the requirements at the first time.
+    var checked = false
+    // Whether we need to keep allocating for the worker/driver and we'll only go through
+    // the loop at most twice.
+    var keepAllocating = true
+    while (keepAllocating) {
+      keepAllocating = false
+      // store the pids whose related allocated resources conflict with
+      // discovered resources passed in.
+      val pidsToCheck = mutable.Set[Int]()
+      newAssignments = resourceRequirements.map { req =>
+        val rName = req.resourceName
+        val amount = req.amount
+        // initially, we must have available.length >= amount as we've done pre-check previously
+        var available = resources(rName).addresses
+        // gets available resource addresses by excluding all
+        // allocated resource addresses from discovered resources
+        allocated.foreach { a =>
+          val thePid = a._1
+          val resourceMap = a._2
+          val assigned = resourceMap.getOrElse(rName, Array.empty)
+          val retained = available.diff(assigned)
+          if (retained.length < available.length && !checked) {
+            pidsToCheck += thePid
+          }
+          if (retained.length >= amount) {
+            available = retained
+          } else if (checked) {
+            keepAllocating = false
+            releaseLock(lock)
+            throw new SparkException(s"No more resources available since they've already" +
+              s" assigned to other workers/driv
```
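Stripped of the file lock and pid bookkeeping, the core of the assignment loop above is a set difference: start from the discovered addresses for one resource, subtract every address already held by another process, and take the first `amount` that remain. A simplified Java sketch under that assumption (types and names are illustrative, not Spark's):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class AssignSketch {
    // discovered: all addresses of one resource found on the host.
    // alreadyAssigned: the address lists other workers/drivers currently hold.
    // amount: how many addresses this worker/driver needs.
    public static List<String> assign(List<String> discovered,
                                      Collection<List<String>> alreadyAssigned,
                                      int amount) {
        List<String> available = new ArrayList<>(discovered);
        for (List<String> assigned : alreadyAssigned) {
            available.removeAll(assigned); // diff out addresses other pids hold
        }
        if (available.size() < amount) {
            // the real code first retries after dropping allocations of dead pids
            throw new IllegalStateException("No more resources available: need "
                    + amount + ", only " + available.size() + " free");
        }
        return available.subList(0, amount);
    }
}
```

The real implementation wraps this in a file lock and a second pass that prunes allocations whose owning process has exited before giving up.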
[GitHub] [spark] AmplabJenkins removed a comment on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
AmplabJenkins removed a comment on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF) URL: https://github.com/apache/spark/pull/24958#issuecomment-514045154 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
AmplabJenkins removed a comment on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF) URL: https://github.com/apache/spark/pull/24958#issuecomment-514045159 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108027/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
AmplabJenkins commented on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF) URL: https://github.com/apache/spark/pull/24958#issuecomment-514045159 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108027/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
AmplabJenkins commented on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF) URL: https://github.com/apache/spark/pull/24958#issuecomment-514045154 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
SparkQA removed a comment on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF) URL: https://github.com/apache/spark/pull/24958#issuecomment-514014636 **[Test build #108027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108027/testReport)** for PR 24958 at commit [`cb4cfde`](https://github.com/apache/spark/commit/cb4cfdee34b02d3be7372eeb972c7908b63469d5).
[GitHub] [spark] SparkQA commented on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
SparkQA commented on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF) URL: https://github.com/apache/spark/pull/24958#issuecomment-514044796 **[Test build #108027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108027/testReport)** for PR 24958 at commit [`cb4cfde`](https://github.com/apache/spark/commit/cb4cfdee34b02d3be7372eeb972c7908b63469d5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on issue #25225: [SPARK-28469][SQL] Change CalendarIntervalType's readable string representation from calendarinterval to interval
HyukjinKwon commented on issue #25225: [SPARK-28469][SQL] Change CalendarIntervalType's readable string representation from calendarinterval to interval URL: https://github.com/apache/spark/pull/25225#issuecomment-514044823 +1 too
[GitHub] [spark] dongjoon-hyun closed pull request #25225: [SPARK-28469][SQL] Change CalendarIntervalType's readable string representation from calendarinterval to interval
dongjoon-hyun closed pull request #25225: [SPARK-28469][SQL] Change CalendarIntervalType's readable string representation from calendarinterval to interval URL: https://github.com/apache/spark/pull/25225
[GitHub] [spark] dongjoon-hyun commented on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface
dongjoon-hyun commented on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface URL: https://github.com/apache/spark/pull/25189#issuecomment-514044307 #25225 is merged now. Could you rebase this PR?
[GitHub] [spark] dongjoon-hyun commented on issue #25225: [SPARK-28469][SQL] Change CalendarIntervalType's readable string representation from calendarinterval to interval
dongjoon-hyun commented on issue #25225: [SPARK-28469][SQL] Change CalendarIntervalType's readable string representation from calendarinterval to interval URL: https://github.com/apache/spark/pull/25225#issuecomment-514043510 Yep. I think it's okay by itself. Thank you for the pointer.
[GitHub] [spark] SparkQA commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base
SparkQA commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base URL: https://github.com/apache/spark/pull/25161#issuecomment-514042578 **[Test build #108031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108031/testReport)** for PR 25161 at commit [`8795d66`](https://github.com/apache/spark/commit/8795d66b189712f54a55a3b0663273fb26126a8e).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base
AmplabJenkins removed a comment on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base URL: https://github.com/apache/spark/pull/25161#issuecomment-514042256 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13139/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base
AmplabJenkins removed a comment on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base URL: https://github.com/apache/spark/pull/25161#issuecomment-514042253 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base
AmplabJenkins commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base URL: https://github.com/apache/spark/pull/25161#issuecomment-514042253 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base
AmplabJenkins commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base URL: https://github.com/apache/spark/pull/25161#issuecomment-514042256 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13139/ Test PASSed.
[GitHub] [spark] zhengruifeng commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization
zhengruifeng commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization URL: https://github.com/apache/spark/pull/25178#issuecomment-514041986 The expected cost without the range check is `E(cost(apply2)) = log(NNZ)`, while the cost with the range check is `E(cost(apply3)) = 2 + P(key in range) * log(NNZ)`. The difference is `E(cost(apply3) - cost(apply2)) = 2 - P(key out of range) * log(NNZ)`, so the benefit of the optimization depends strongly on the key distribution and on `NNZ`. The above suite assumes the input keys are drawn from a uniform distribution, and shows that if `NNZ` is small, the range check adds about 10% extra cost; otherwise, it saves about 50% of the cost.
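The trade-off discussed above can be sketched in a few lines. This is a minimal illustration, not Spark's implementation (Spark's `SparseVector` is Scala, and the function names here are hypothetical): a sparse vector stores its nonzero positions in a sorted `indices` array, `apply(key)` binary-searches it, and the optimization adds a cheap check against `indices[0]` and `indices[-1]` so out-of-range keys return in O(1) instead of O(log NNZ).

```python
import bisect

def sparse_apply_no_check(indices, values, key):
    # Plain binary search over the sorted nonzero indices.
    pos = bisect.bisect_left(indices, key)
    if pos < len(indices) and indices[pos] == key:
        return values[pos]
    return 0.0  # key not stored => implicit zero

def sparse_apply_with_check(indices, values, key):
    # Range check first: keys outside [indices[0], indices[-1]]
    # skip the binary search entirely (the "2" in the cost model).
    if not indices or key < indices[0] or key > indices[-1]:
        return 0.0
    pos = bisect.bisect_left(indices, key)
    if pos < len(indices) and indices[pos] == key:
        return values[pos]
    return 0.0

indices = [2, 5, 9]        # sorted nonzero positions
values = [1.0, 3.0, 7.0]   # corresponding nonzero values
print(sparse_apply_with_check(indices, values, 5))    # 3.0 (in range, present)
print(sparse_apply_with_check(indices, values, 100))  # 0.0 (out of range, no search)
```

Both variants agree on every key; they differ only in when the binary search runs, which is exactly why the expected-cost difference above is governed by `P(key out of range)`.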
[GitHub] [spark] shivusondur commented on a change in pull request #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base
shivusondur commented on a change in pull request #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base URL: https://github.com/apache/spark/pull/25161#discussion_r306117675 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/pgSQL/udf-select_having.sql ## @@ -0,0 +1,58 @@ +-- +-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group +-- +-- +-- SELECT_HAVING +-- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/select_having.sql +-- +-- This test file was converted from inputs/pgSQL/select_having.sql Review comment: Done