[GitHub] [spark] SparkQA commented on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era

2019-07-22 Thread GitBox
SparkQA commented on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` 
in date-timestamp patterns without era
URL: https://github.com/apache/spark/pull/25230#issuecomment-514079553
 
 
   **[Test build #108037 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108037/testReport)**
 for PR 25230 at commit 
[`cdfe56d`](https://github.com/apache/spark/commit/cdfe56dbc59b6d9b3dfb32b39c60266e3cc9f1d6).
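For readers unfamiliar with the distinction in the PR title: in `java.time` pattern letters, `y` is year-of-era and `u` is the proleptic year. They print the same for years from 1 AD on, but they diverge for earlier dates. Also, under `ResolverStyle.STRICT` a year-of-era without an era field (`G`) cannot be resolved to a date. The Java sketch below illustrates both effects using only the JDK. The class and method names are illustrative, and the sketch does not claim to reproduce how Spark builds its formatters.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Illustrative helpers (hypothetical names) contrasting 'y' (year-of-era) and 'u' (proleptic year).
final class YearPatternDemo {
    private YearPatternDemo() {}

    // Formats a date with the given pattern; the resolver style does not affect formatting.
    static String format(LocalDate date, String pattern) {
        return DateTimeFormatter.ofPattern(pattern).format(date);
    }

    // Parses with STRICT resolution and returns null when the text cannot be resolved to a LocalDate.
    static LocalDate parseStrict(String text, String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern)
                .withResolverStyle(ResolverStyle.STRICT);
        try {
            return LocalDate.parse(text, formatter);
        } catch (DateTimeParseException e) {
            // With "yyyy" and no era field, STRICT mode never turns YEAR_OF_ERA into YEAR,
            // so no LocalDate can be built from the parsed fields.
            return null;
        }
    }
}
```

With these helpers, formatting `LocalDate.of(-1, 1, 1)` (2 BC) gives `0002-01-01` with `yyyy-MM-dd` but `-0001-01-01` with `uuuu-MM-dd`. Strict parsing of `2019-07-22` fails with `yyyy-MM-dd` but succeeds with `uuuu-MM-dd`.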


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25230: [SPARK-28471][SQL] Replace 
`yyyy` by `uuuu` in date-timestamp patterns without era
URL: https://github.com/apache/spark/pull/25230#issuecomment-514079027
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25230: [SPARK-28471][SQL] Replace 
`yyyy` by `uuuu` in date-timestamp patterns without era
URL: https://github.com/apache/spark/pull/25230#issuecomment-514079031
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13145/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by 
`uuuu` in date-timestamp patterns without era
URL: https://github.com/apache/spark/pull/25230#issuecomment-514079027
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by `uuuu` in date-timestamp patterns without era

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25230: [SPARK-28471][SQL] Replace `yyyy` by 
`uuuu` in date-timestamp patterns without era
URL: https://github.com/apache/spark/pull/25230#issuecomment-514079031
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13145/
   Test PASSed.





[GitHub] [spark] gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks when executors are lost

2019-07-22 Thread GitBox
gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks 
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-514078853
 
 
   @squito I ran into a case that cannot be supported without this PR:
   - On the map side, all shuffle files are written to a remote Hadoop filesystem, so there aren't any shuffle files managed by `BlockManager`s. Should I then simply make `ShuffleWriters` return `MapStatus.location == null`? No, because that cannot fulfil the need during shuffle write.
   - On the reduce side, I want to read the shuffle index files from the cache on the executors that wrote them, so I need the `BlockManagerId` in the `MapStatus` to tell each reducer which executor to look for.
   
   This is what I meant earlier:
   
   > What the driver decides to do when invalidating an executor (what this PR works on) and how executors report their `MapStatus` to the driver (with or without a location) are two different things.
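The distinction drawn above can be sketched as a small model. Where a reducer reads the index from is one question. Whether losing the writing executor should invalidate the map output is another. Everything below is hypothetical: the class, the fields, the `cache@`/`remote:` strings and the example path are invented for illustration. None of it is Spark's `MapStatus` or `BlockManagerId` API.

```java
import java.util.Set;

// Hypothetical model for illustration only; not Spark's MapStatus/BlockManagerId API.
final class RemoteShuffleOutput {
    final String remoteIndexPath;  // index/data files live on a remote Hadoop filesystem
    final String writerExecutorId; // executor that wrote (and may cache) the index; null if unknown

    RemoteShuffleOutput(String remoteIndexPath, String writerExecutorId) {
        this.remoteIndexPath = remoteIndexPath;
        this.writerExecutorId = writerExecutorId;
    }

    // Reduce side: prefer the writer's index cache while that executor is alive, else read remotely.
    String indexSource(Set<String> liveExecutors) {
        if (writerExecutorId != null && liveExecutors.contains(writerExecutorId)) {
            return "cache@" + writerExecutorId;
        }
        return "remote:" + remoteIndexPath;
    }

    // Driver side: losing the writer invalidates the output only if the data lived on that executor.
    boolean invalidatedBy(String lostExecutorId, boolean dataOnRemoteStorage) {
        return !dataOnRemoteStorage && lostExecutorId.equals(writerExecutorId);
    }
}
```

Keeping the executor id as a location hint, rather than setting it to null, lets the reduce side use the cache. The driver's invalidation decision stays independent of that hint.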





[GitHub] [spark] SparkQA commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink

2019-07-22 Thread GitBox
SparkQA commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter 
to GraphiteSink
URL: https://github.com/apache/spark/pull/25232#issuecomment-514077480
 
 
   **[Test build #108036 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108036/testReport)**
 for PR 25232 at commit 
[`4314fa7`](https://github.com/apache/spark/commit/4314fa7f2a5688e1a918393a241d6bd8607d88f0).
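Here is a rough illustration of what a regex metric filter does. Dropwizard's `MetricFilter` interface has a single method, `boolean matches(String name, Metric metric)`, and a regex-based implementation keeps only the metrics whose names match. The sketch below reproduces that idea with `java.util.regex` only, so it runs without the Dropwizard dependency. The class name is hypothetical. Using `find()` (substring match) rather than `matches()` (whole-name match) is an assumption. How the regex is configured for `GraphiteSink` is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Standalone sketch of a regex-based metric-name filter (hypothetical class). It mirrors the idea
// of a Dropwizard MetricFilter but does not depend on that library.
final class RegexMetricNameFilter {
    private final Pattern pattern; // null means "report every metric"

    RegexMetricNameFilter(String regex) {
        this.pattern = (regex == null) ? null : Pattern.compile(regex);
    }

    // Keeps a metric if any part of its name matches the regex (find semantics, assumed here).
    boolean matches(String metricName) {
        return pattern == null || pattern.matcher(metricName).find();
    }

    // Returns the names that would be reported, preserving their order.
    List<String> filter(List<String> metricNames) {
        List<String> kept = new ArrayList<>();
        for (String name : metricNames) {
            if (matches(name)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```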





[GitHub] [spark] AmplabJenkins removed a comment on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25232: [SPARK-28475][CORE] Add regex 
MetricFilter to GraphiteSink
URL: https://github.com/apache/spark/pull/25232#issuecomment-514076907
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25232: [SPARK-28475][CORE] Add regex 
MetricFilter to GraphiteSink
URL: https://github.com/apache/spark/pull/25232#issuecomment-514076912
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13144/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25232: [SPARK-28475][CORE] Add regex 
MetricFilter to GraphiteSink
URL: https://github.com/apache/spark/pull/25232#issuecomment-514076912
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13144/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25232: [SPARK-28475][CORE] Add regex 
MetricFilter to GraphiteSink
URL: https://github.com/apache/spark/pull/25232#issuecomment-514076907
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] dongjinleekr commented on a change in pull request #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming

2019-07-22 Thread GitBox
dongjinleekr commented on a change in pull request #22282: [SPARK-23539][SS] 
Add support for Kafka headers in Structured Streaming
URL: https://github.com/apache/spark/pull/22282#discussion_r306151018
 
 

 ##
 File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ##
 @@ -30,9 +30,10 @@
 import com.esotericsoftware.kryo.KryoSerializable;
 import com.esotericsoftware.kryo.io.Input;
 import com.esotericsoftware.kryo.io.Output;
-
 import org.apache.spark.sql.catalyst.InternalRow;
-import org.apache.spark.sql.types.*;
+import org.apache.spark.sql.types.DataType;
 
 Review comment:
   Sure.





[GitHub] [spark] nkarpov commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter to GraphiteSink

2019-07-22 Thread GitBox
nkarpov commented on issue #25232: [SPARK-28475][CORE] Add regex MetricFilter 
to GraphiteSink
URL: https://github.com/apache/spark/pull/25232#issuecomment-514075684
 
 
   Thanks @HyukjinKwon, in that case it's worth just adding the tests. Added them in the latest commit.





[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] 
Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#discussion_r306147894
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ##
 @@ -39,27 +39,36 @@ object StringUtils extends Logging {
* throw an [[AnalysisException]].
*
* @param pattern the SQL pattern to convert
+   * @param escapeStr the escape string contains one character.
* @return the equivalent Java regular expression of the pattern
*/
-  def escapeLikeRegex(pattern: String): String = {
+  def escapeLikeRegex(pattern: String, escapeStr: String): String = {
+val escapeChar = escapeStr.charAt(0)
 val in = pattern.toIterator
 val out = new StringBuilder()
 
 def fail(message: String) = throw new AnalysisException(
   s"the pattern '$pattern' is invalid, $message")
 
 while (in.hasNext) {
-  in.next match {
-case '\\' if in.hasNext =>
+  val cur = in.next
+  if (cur == escapeChar) {
 
 Review comment:
   OK, I think your suggestion is better. I'll learn from it and give it a try!
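The diff above replaces the hard-coded backslash in `escapeLikeRegex` with a caller-supplied escape character. Below is a Java sketch of the same conversion from a SQL `LIKE` pattern to a Java regular expression:

- `_` becomes `.` and `%` becomes `.*`.
- The escape character may only precede `_`, `%` or itself.
- Every literal is quoted with `Pattern.quote`.

Some choices here are assumptions of this sketch rather than a copy of the PR: the class name, the use of `IllegalArgumentException` instead of Spark's `AnalysisException`, and the `(?s)` flag that lets `%` span newlines.

```java
import java.util.regex.Pattern;

// Hypothetical Java counterpart of a LIKE-to-regex conversion with a custom escape character.
final class LikeEscape {
    private LikeEscape() {}

    static String likeToRegex(String pattern, char escapeChar) {
        StringBuilder out = new StringBuilder("(?s)"); // DOTALL so '.' also matches newlines
        int i = 0;
        while (i < pattern.length()) {
            char cur = pattern.charAt(i++);
            if (cur == escapeChar) {
                // The escape character must be followed by '_', '%' or itself.
                if (i >= pattern.length()) {
                    throw invalid(pattern, "it is not allowed to end with the escape character");
                }
                char next = pattern.charAt(i++);
                if (next == '_' || next == '%' || next == escapeChar) {
                    out.append(Pattern.quote(String.valueOf(next))); // escaped char is a literal
                } else {
                    throw invalid(pattern, "the escape character is not allowed to precede '" + next + "'");
                }
            } else if (cur == '_') {
                out.append('.');  // exactly one character
            } else if (cur == '%') {
                out.append(".*"); // any sequence of characters, including none
            } else {
                out.append(Pattern.quote(String.valueOf(cur))); // ordinary literal
            }
        }
        return out.toString();
    }

    // Whole-string match, as LIKE requires.
    static boolean like(String input, String pattern, char escapeChar) {
        return Pattern.matches(likeToRegex(pattern, escapeChar), input);
    }

    private static IllegalArgumentException invalid(String pattern, String message) {
        return new IllegalArgumentException("the pattern '" + pattern + "' is invalid, " + message);
    }
}
```

For example, `LikeEscape.like("100%", "100/%", '/')` is true while `LikeEscape.like("1000", "100/%", '/')` is false, because `/%` matches only a literal percent sign. Checking the escape character before `_` and `%` mirrors the order in the diff, where `if (cur == escapeChar)` comes first.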





[GitHub] [spark] AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver 
when loading single large unsplittable file
URL: https://github.com/apache/spark/pull/25134#issuecomment-514072146
 
 
   Merged build finished. Test PASSed.
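As its title describes it, SPARK-28366 is about warning on the driver when a single large file cannot be split, because one task must then read the whole file. A minimal sketch of such a check follows. The class name, the message text, the threshold parameter and the suffix-based test are all hypothetical. A real implementation would more likely ask the Hadoop compression codec whether it implements `SplittableCompressionCodec` than inspect file extensions.

```java
import java.util.Locale;

// Hypothetical driver-side check; names, message and threshold are illustrative assumptions.
final class UnsplittableFileCheck {
    private UnsplittableFileCheck() {}

    // Gzip streams cannot be split; this suffix test stands in for asking the codec.
    static boolean isUnsplittable(String path) {
        return path.toLowerCase(Locale.ROOT).endsWith(".gz");
    }

    // Returns a warning for the driver log, or null if the file is splittable or small enough.
    static String warningFor(String path, long sizeBytes, long thresholdBytes) {
        if (isUnsplittable(path) && sizeBytes > thresholdBytes) {
            return String.format(Locale.ROOT,
                    "Loading one large unsplittable file %s (%d bytes) with only one partition;"
                            + " consider decompressing it or using a splittable format.",
                    path, sizeBytes);
        }
        return null;
    }
}
```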





[GitHub] [spark] SparkQA removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file

2019-07-22 Thread GitBox
SparkQA removed a comment on issue #25134: [SPARK-28366][CORE] Logging in 
driver when loading single large unsplittable file
URL: https://github.com/apache/spark/pull/25134#issuecomment-514041249
 
 
   **[Test build #108030 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108030/testReport)**
 for PR 25134 at commit 
[`feb8dd0`](https://github.com/apache/spark/commit/feb8dd0c9489c8d9a6b0dd6f4243081510cafda6).





[GitHub] [spark] AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in 
driver when loading single large unsplittable file
URL: https://github.com/apache/spark/pull/25134#issuecomment-514072155
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108030/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver 
when loading single large unsplittable file
URL: https://github.com/apache/spark/pull/25134#issuecomment-514072155
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108030/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in 
driver when loading single large unsplittable file
URL: https://github.com/apache/spark/pull/25134#issuecomment-514072146
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file

2019-07-22 Thread GitBox
SparkQA commented on issue #25134: [SPARK-28366][CORE] Logging in driver when 
loading single large unsplittable file
URL: https://github.com/apache/spark/pull/25134#issuecomment-514071654
 
 
   **[Test build #108030 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108030/testReport)**
 for PR 25134 at commit 
[`feb8dd0`](https://github.com/apache/spark/commit/feb8dd0c9489c8d9a6b0dd6f4243081510cafda6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] 
SparseVector.apply performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514064747
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108034/
   Test PASSed.
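The title of SPARK-28421 does not say what the optimization is. One common way to make point lookups on a sparse vector cheap has two steps. First, return `0.0` early when the index lies outside the range of stored indices. Otherwise, binary-search the sorted index array. The Java sketch below shows that approach. The class is hypothetical, and this thread does not say whether PR #25178 does exactly this.

```java
import java.util.Arrays;

// Minimal sparse vector (hypothetical, not Spark's SparseVector); indices must be strictly increasing.
final class SimpleSparseVector {
    final int size;
    final int[] indices;
    final double[] values;

    SimpleSparseVector(int size, int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        this.size = size;
        this.indices = indices;
        this.values = values;
    }

    // Returns the value at position i, or 0.0 if i is not stored.
    double apply(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("Index " + i + " out of bounds [0, " + size + ")");
        }
        // Cheap early exits skip the binary search for indices outside the stored range.
        if (indices.length == 0 || i < indices[0] || i > indices[indices.length - 1]) {
            return 0.0;
        }
        int j = Arrays.binarySearch(indices, i); // negative when i is absent
        return j >= 0 ? values[j] : 0.0;
    }
}
```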





[GitHub] [spark] AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] 
SparseVector.apply performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514064740
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
SparkQA removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply 
performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514052849
 
 
   **[Test build #108034 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108034/testReport)**
 for PR 25178 at commit 
[`99dfe7e`](https://github.com/apache/spark/commit/99dfe7e5639f30143fe8e164d80377c583cf4b33).





[GitHub] [spark] AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply 
performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514064740
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply 
performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514064747
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108034/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
SparkQA commented on issue #25178: [SPARK-28421][ML] SparseVector.apply 
performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514064520
 
 
   **[Test build #108034 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108034/testReport)**
 for PR 25178 at commit 
[`99dfe7e`](https://github.com/apache/spark/commit/99dfe7e5639f30143fe8e164d80377c583cf4b33).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25233: 
[SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' 
into UDF test base
URL: https://github.com/apache/spark/pull/25233#issuecomment-514062730
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] 
Convert and port 'pgSQL/select_implicit.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25233#issuecomment-514063864
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins removed a comment on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25233: 
[SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' 
into UDF test base
URL: https://github.com/apache/spark/pull/25233#issuecomment-514062306
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] Udbhav30 commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base

2019-07-22 Thread GitBox
Udbhav30 commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert 
and port 'pgSQL/select_implicit.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25233#issuecomment-514062796
 
 
   Hi @HyukjinKwon, could you please review this?
   Thanks.





[GitHub] [spark] AmplabJenkins commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] 
Convert and port 'pgSQL/select_implicit.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25233#issuecomment-514062730
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25233: [SPARK-28391][SQL][PYTHON][TESTS] 
Convert and port 'pgSQL/select_implicit.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25233#issuecomment-514062306
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] Udbhav30 opened a new pull request #25233: [SPARK-28391][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_implicit.sql' into UDF test base

2019-07-22 Thread GitBox
Udbhav30 opened a new pull request #25233: [SPARK-28391][SQL][PYTHON][TESTS] 
Convert and port 'pgSQL/select_implicit.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25233
 
 
   ## What changes were proposed in this pull request?
   This PR adds tests converted from 'pgSQL/select_implicit.sql' to test UDFs.
   Diff compared to 'pgSQL/select_implicit.sql':
   
   
   ```diff
   ... diff --git a/sql/core/src/test/resources/sql-tests/results/pgSQL/select_implicit.sql.out b/sql/core/src/test/resources/sql-tests/results/udf/pgSQL/udf-select_implicit.sql.out
   index 0675820..e6a5995 100755
   --- a/sql/core/src/test/resources/sql-tests/results/pgSQL/select_implicit.sql.out
   +++ b/sql/core/src/test/resources/sql-tests/results/udf/pgSQL/udf-select_implicit.sql.out
   @@ -91,9 +91,11 @@ struct<>


-- !query 11
   -SELECT c, count(*) FROM test_missing_target GROUP BY test_missing_target.c ORDER BY c
   +SELECT udf(c), udf(count(*)) FROM test_missing_target GROUP BY
   +test_missing_target.c
   +ORDER BY udf(c)
-- !query 11 schema
   -struct
   +struct
-- !query 11 output
ABAB2
2
   @@ -104,9 +106,10 @@ 2


-- !query 12
   -SELECT count(*) FROM test_missing_target GROUP BY test_missing_target.c ORDER BY c
   +SELECT udf(count(*)) FROM test_missing_target GROUP BY test_missing_target.c
   +ORDER BY udf(c)
-- !query 12 schema
   -struct
   +struct
-- !query 12 output
2
2
   @@ -117,18 +120,18 @@ struct


-- !query 13
   -SELECT count(*) FROM test_missing_target GROUP BY a ORDER BY b
   +SELECT udf(count(*)) FROM test_missing_target GROUP BY a ORDER BY udf(b)
-- !query 13 schema
struct<>
-- !query 13 output
org.apache.spark.sql.AnalysisException
   -cannot resolve '`b`' given input columns: [count(1)]; line 1 pos 61
   +cannot resolve '`b`' given input columns: [CAST(udf(cast(count(1) as string)) AS BIGINT)]; line 1 pos 70


-- !query 14
   -SELECT count(*) FROM test_missing_target GROUP BY b ORDER BY b
   +SELECT udf(count(*)) FROM test_missing_target GROUP BY b ORDER BY udf(b)
-- !query 14 schema
   -struct
   +struct
-- !query 14 output
1
2
   @@ -137,10 +140,10 @@ struct


-- !query 15
   -SELECT test_missing_target.b, count(*)
   -  FROM test_missing_target GROUP BY b ORDER BY b
   +SELECT udf(test_missing_target.b), udf(count(*))
   +  FROM test_missing_target GROUP BY b ORDER BY udf(b)
-- !query 15 schema
   -struct
   +struct
-- !query 15 output
1   1
2   2
   @@ -149,9 +152,9 @@ struct


-- !query 16
   -SELECT c FROM test_missing_target ORDER BY a
   +SELECT udf(c) FROM test_missing_target ORDER BY udf(a)
-- !query 16 schema
   -struct
   +struct
-- !query 16 output

ABAB
   @@ -166,9 +169,9 @@ 


-- !query 17
   -SELECT count(*) FROM test_missing_target GROUP BY b ORDER BY b desc
   +SELECT udf(count(*)) FROM test_missing_target GROUP BY b ORDER BY udf(b) desc
-- !query 17 schema
   -struct
   +struct
-- !query 17 output
4
3
   @@ -177,17 +180,17 @@ struct


-- !query 18
   -SELECT count(*) FROM test_missing_target ORDER BY 1 desc
   +SELECT udf(count(*)) FROM test_missing_target ORDER BY udf(1) desc
-- !query 18 schema
   -struct
   +struct
-- !query 18 output
10


-- !query 19
   -SELECT c, count(*) FROM test_missing_target GROUP BY 1 ORDER BY 1
   +SELECT udf(c), udf(count(*)) FROM test_missing_target GROUP BY 1 ORDER BY 1
-- !query 19 schema
   -struct
   +struct
-- !query 19 output
ABAB2
2
   @@ -198,18 +201,18 @@    2


-- !query 20
   -SELECT c, count(*) FROM test_missing_target GROUP BY 3
   +SELECT udf(c), udf(count(*)) FROM test_missing_target GROUP BY 3
-- !query 20 schema
struct<>
-- !query 20 output
org.apache.spark.sql.AnalysisException
   -GROUP BY position 3 is not in select list (valid range is [1, 2]); line 1 pos 53
   +GROUP BY position 3 is not in select list (valid range is [1, 2]); line 1 pos 63


-- !query 21
   -SELECT count(*) FROM test_missing_target x, test_missing_target y
   -WHERE x.a = y.a
   -GROUP BY b ORDER BY b
   +SELECT udf(count(*)) FROM test_missing_target x, test_missing_target y
   +WHERE udf(x.a) = udf(y.a)
   +GROUP BY b ORDER BY udf(b)
-- !query 21 schema
struct<>
-- !query 21 output
   @@ -218,10 +221,10 @@ Reference 'b' is ambiguous, could be: x.b, y.b.; line 3 pos 10


-- !query 22
   -SELECT a, a FROM test_missing_target
   -ORDER BY a
   +SELECT udf(a), udf(a) FROM test_missing_target
   +ORDER BY udf(a)
-- !query 22 schema
   -struct
   +struct
-- !query 22 output
0   0
1   1
   @@ -236,10 +239,10 @@ struct


-- !query 23
   -SELECT a/2, a/2 FROM test_missing_target
   -ORDER BY a/2
   +SELECT udf(udf(a

[GitHub] [spark] SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE 
syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514058402
 
 
   **[Test build #108035 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108035/testReport)**
 for PR 25001 at commit 
[`2b4c59a`](https://github.com/apache/spark/commit/2b4c59a48d6bc87d286902ad6d1281b78c329f3f).


[GitHub] [spark] chenjunjiedada edited a comment on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage

2019-07-22 Thread GitBox
chenjunjiedada edited a comment on issue #24879: [SPARK-28042][K8S] Support 
using volume mount as local storage
URL: https://github.com/apache/spark/pull/24879#issuecomment-514057779
 
 
   @rvesse @vanzin In summary: if the user specifies a directory that cannot be found among the current volumes (those whose names start with `local-dir-`), build an emptyDir volume; otherwise, use the existing volume. Is this acceptable?
   
   @erikerlandson, is this approach also acceptable to you?
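A minimal sketch of the decision rule summarized above, in Scala. Everything here is hypothetical: `VolumeMount`, `UseExistingVolume`, `CreateEmptyDir`, `resolve`, and the `spark-local-dir-N` naming for generated emptyDir volumes are illustrative stand-ins, not Spark's Kubernetes classes. It also assumes that "found in a volume" means a `local-dir-`-prefixed volume whose mount path equals the requested directory.

```scala
// Hypothetical model of the proposal; NOT Spark's Kubernetes API.
object LocalDirVolumeSketch {
  final case class VolumeMount(volumeName: String, mountPath: String)

  sealed trait LocalDirSource
  final case class UseExistingVolume(volumeName: String) extends LocalDirSource
  final case class CreateEmptyDir(volumeName: String, mountPath: String) extends LocalDirSource

  val LocalDirPrefix = "local-dir-"

  // For each requested local dir, reuse a user-supplied volume whose name starts
  // with `local-dir-` and whose mount path matches; otherwise fall back to emptyDir.
  def resolve(localDirs: Seq[String], mounts: Seq[VolumeMount]): Seq[LocalDirSource] = {
    val candidates = mounts.filter(_.volumeName.startsWith(LocalDirPrefix))
    localDirs.zipWithIndex.map { case (dir, i) =>
      candidates.find(_.mountPath == dir) match {
        case Some(m) => UseExistingVolume(m.volumeName)
        // Assumed naming scheme for generated volumes (1-based index).
        case None => CreateEmptyDir(s"spark-local-dir-${i + 1}", dir)
      }
    }
  }
}
```

Matching on the mount path keeps the rule declarative: users opt in by naming a volume with the `local-dir-` prefix, and every other directory keeps the emptyDir behavior.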


[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support 
LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514058033
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13143/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support 
LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514058030
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... 
ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514058033
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13143/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... 
ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514058030
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] chenjunjiedada commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage

2019-07-22 Thread GitBox
chenjunjiedada commented on issue #24879: [SPARK-28042][K8S] Support using 
volume mount as local storage
URL: https://github.com/apache/spark/pull/24879#issuecomment-514057779
 
 
   @rvesse @vanzin In summary: if the user specifies a directory that cannot be found among the current volumes (those whose names start with `local-dir-`), build an emptyDir volume; otherwise, use the existing volume. Is this acceptable?


[GitHub] [spark] AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in 
driver when loading single large unsplittable file
URL: https://github.com/apache/spark/pull/25134#issuecomment-514056453
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108029/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25134: [SPARK-28366][CORE] Logging in 
driver when loading single large unsplittable file
URL: https://github.com/apache/spark/pull/25134#issuecomment-514056445
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver 
when loading single large unsplittable file
URL: https://github.com/apache/spark/pull/25134#issuecomment-514056453
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108029/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25134: [SPARK-28366][CORE] Logging in driver 
when loading single large unsplittable file
URL: https://github.com/apache/spark/pull/25134#issuecomment-514056445
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA removed a comment on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file

2019-07-22 Thread GitBox
SparkQA removed a comment on issue #25134: [SPARK-28366][CORE] Logging in 
driver when loading single large unsplittable file
URL: https://github.com/apache/spark/pull/25134#issuecomment-514028148
 
 
   **[Test build #108029 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108029/testReport)**
 for PR 25134 at commit 
[`e6cf714`](https://github.com/apache/spark/commit/e6cf714137e3d86ae7041136f88403c8745a7cd8).


[GitHub] [spark] SparkQA commented on issue #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file

2019-07-22 Thread GitBox
SparkQA commented on issue #25134: [SPARK-28366][CORE] Logging in driver when 
loading single large unsplittable file
URL: https://github.com/apache/spark/pull/25134#issuecomment-514056107
 
 
   **[Test build #108029 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108029/testReport)**
 for PR 25134 at commit 
[`e6cf714`](https://github.com/apache/spark/commit/e6cf714137e3d86ae7041136f88403c8745a7cd8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] HeartSaVioR edited a comment on issue #25135: [SPARK-28367][SS] Use new KafkaConsumer.poll API in Kafka connector

2019-07-22 Thread GitBox
HeartSaVioR edited a comment on issue #25135: [SPARK-28367][SS] Use new 
KafkaConsumer.poll API in Kafka connector
URL: https://github.com/apache/spark/pull/25135#issuecomment-514048478
 
 
   Here's part of the test code Kafka uses with the new `poll` API.
   
   
https://github.com/apache/kafka/blob/f98e176746d663fadedbcd3c18312a7f476a20c8/core/src/test/scala/integration/kafka/api/PlaintextConsumerTest.scala#L1748-L1752
   
   ```scala
   private def awaitAssignment(consumer: Consumer[_, _], expectedAssignment: Set[TopicPartition]): Unit = {
     TestUtils.pollUntilTrue(consumer, () => consumer.assignment() == expectedAssignment.asJava,
       s"Timed out while awaiting expected assignment $expectedAssignment. " +
         s"The current assignment is ${consumer.assignment()}")
   }
   ```
   
   
https://github.com/apache/kafka/blob/f98e176746d663fadedbcd3c18312a7f476a20c8/core/src/test/scala/unit/kafka/utils/TestUtils.scala#L767-L775
   
   ```scala
   def pollUntilTrue(consumer: Consumer[_, _],
                     action: () => Boolean,
                     msg: => String,
                     waitTimeMs: Long = JTestUtils.DEFAULT_MAX_WAIT_MS): Unit = {
     waitUntilTrue(() => {
       consumer.poll(Duration.ofMillis(50))
       action()
     }, msg = msg, pause = 0L, waitTimeMs = waitTimeMs)
   }
   ```
   
   Some parts of Kafka's test code still rely on the deprecated `poll(0)`, so both `poll(Duration)` and `poll(long)` are in use. There may be no technical reason for this, but the fact that they still rely on the old flavor might indicate that `poll(0)` is still needed in some cases.
   
   Sometimes Kafka calls `updateAssignmentMetadataIfNeeded` directly. This method handles the metadata update that `poll()` performs, and with a `Long.MAX_VALUE` timer it is effectively blocking. It is intended for testing and is package-private.
   
   ```
   consumer.updateAssignmentMetadataIfNeeded(time.timer(Long.MAX_VALUE));
   ```
   
   In many places where the test code calls `poll(Duration.ZERO)`, `updateAssignmentMetadataIfNeeded` is called beforehand. In the other cases, the verification code just seems to confirm that calling `poll` does nothing or returns already-fetched records.
   
   I guess in our case we need to either leverage `updateAssignmentMetadataIfNeeded` to deal only with metadata (this may require a hack, and they have stated it is for testing, so it is not the right option for us), or call `poll` with a small timeout (50ms) and tolerate the case where no record is available to pull (the timeout is then added to latency regardless of whether metadata is available).
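A rough sketch of the second option (poll with a small timeout until a condition holds), written without any Kafka dependency so it stays self-contained. `pollOnce` is an assumed stand-in for something like `consumer.poll(Duration.ofMillis(50))`, and `condition` for a check such as `consumer.assignment() == expected`; the helper name mirrors Kafka's `TestUtils.pollUntilTrue` but this is not that method.

```scala
import java.time.Duration

// Hypothetical helper illustrating the "poll with a small timeout" pattern.
object PollUntilSketch {
  // Repeatedly calls `pollOnce` until `condition` holds or `timeout` elapses.
  // Returns whether the condition was eventually satisfied.
  def pollUntilTrue(pollOnce: () => Unit, condition: () => Boolean, timeout: Duration): Boolean = {
    val deadline = System.nanoTime() + timeout.toNanos
    var satisfied = condition()
    while (!satisfied && System.nanoTime() < deadline) {
      // A real poll may return no records; only its side effects
      // (e.g. a metadata/assignment update) matter for the condition.
      pollOnce()
      satisfied = condition()
    }
    satisfied
  }
}
```

Because each iteration can wait up to the poll timeout even when no records arrive, a small timeout bounds how much latency the loop adds per iteration, which is the trade-off described above.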
   
   Btw, I see that KIP-288 proposed a new public API, `waitForAssignment`, similar to `updateAssignmentMetadataIfNeeded`, but it was discarded because KIP-266 superseded KIP-288, and KIP-266 did not end up adding it. I'm not sure whether it was declined or just missed.
   
https://cwiki.apache.org/confluence/display/KAFKA/KIP-288%3A+%5BDISCARDED%5D+Consumer.poll%28%29+timeout+semantic+change+and+new+waitForAssignment+method


[GitHub] [spark] AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] 
SparseVector.apply performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514053805
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25178: [SPARK-28421][ML] 
SparseVector.apply performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514053811
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13142/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply 
performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514053811
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13142/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25178: [SPARK-28421][ML] SparseVector.apply 
performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514053805
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] 
SparseVector.apply performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514041986
 
 
   The expected cost without the range check is `E(cost(apply2)) = log(NNZ)`,
   while the expected cost with the range check is `E(cost(apply3)) = 2 + P(key in range) * log(NNZ)`.
   The difference is `E(cost(apply3) - cost(apply2)) = 2 - P(key out of range) * log(NNZ)`, so the benefit depends heavily on the key distribution and on `NNZ`.
   ~~The above suite assumes the input key is drawn from a uniform distribution, and shows that if `NNZ` is small, the range check adds about 10% extra cost; otherwise, the range check saves about 50% of the cost.~~
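To make the cost model concrete, here is a minimal Scala sketch of the two lookup strategies being compared. It is not Spark's `SparseVector.apply`; it assumes `indices` is sorted ascending, that `values(k)` belongs to `indices(k)`, and that `log(NNZ)` means the base-2 number of binary-search comparisons. The hypothetical `expectedExtraCost` just evaluates the difference formula above.

```scala
import java.util.Arrays

// Hypothetical sketch, not Spark's SparseVector code.
object SparseApplySketch {
  // No range check: every lookup pays a binary search, ~log2(NNZ) comparisons.
  def applyNoRangeCheck(indices: Array[Int], values: Array[Double], i: Int): Double = {
    val j = Arrays.binarySearch(indices, i) // negative when `i` is absent
    if (j >= 0) values(j) else 0.0
  }

  // Range check: two comparisons first; the binary search runs only for in-range keys.
  def applyWithRangeCheck(indices: Array[Int], values: Array[Double], i: Int): Double = {
    if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) 0.0
    else applyNoRangeCheck(indices, values, i)
  }

  // E(cost(apply3) - cost(apply2)) = 2 - P(key out of range) * log2(NNZ)
  def expectedExtraCost(pOutOfRange: Double, nnz: Int): Double =
    2.0 - pOutOfRange * (math.log(nnz.toDouble) / math.log(2.0))
}
```

For example, with `NNZ = 1024` (`log2(NNZ) = 10`), the range check pays off only when more than 20% of keys fall outside `[indices(0), indices(NNZ - 1)]`, since `2 - 10 * P < 0` requires `P > 0.2`.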
   
   
   The previous test suite used `val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.sorted.take(nnz)` to generate indices, which is biased.
   I just changed it to `val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.take(nnz).sorted`.
   Now the version without the range check is faster, since `P(key out of range)` should be close to 0% in most cases.
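The bias is easy to see in a small sketch. `IndexBiasSketch` is a hypothetical helper that reproduces the two generators quoted above, using `rng.nextInt(size)` instead of `rng.nextInt.abs % size` to sidestep the `Int.MinValue` edge case. Keeping the `nnz` smallest of roughly `2 * nnz` uniform draws puts the largest index near the median of the draws, around `size / 2`, so roughly half of uniformly drawn keys land above it and are out of range; taking `nnz` draws first and sorting afterwards spreads the indices over the whole `[0, size)` range.

```scala
import scala.util.Random

// Hypothetical helper contrasting the two index generators.
object IndexBiasSketch {
  // Draw 2 * nnz candidates in [0, size) and remove duplicates.
  private def candidates(rng: Random, nnz: Int, size: Int): Array[Int] =
    Array.fill(nnz + nnz)(rng.nextInt(size)).distinct

  // Biased: sorting first and keeping the nnz smallest values clusters indices near 0.
  def biased(rng: Random, nnz: Int, size: Int): Array[Int] =
    candidates(rng, nnz, size).sorted.take(nnz)

  // Unbiased: keep the first nnz distinct draws, then sort them.
  def unbiased(rng: Random, nnz: Int, size: Int): Array[Int] =
    candidates(rng, nnz, size).take(nnz).sorted
}
```

With the same seed both generators see the same candidates, so the biased maximum can never exceed the unbiased one: the `nnz` smallest values are a lower bound on any `nnz` of them.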


[GitHub] [spark] SparkQA commented on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface

2019-07-22 Thread GitBox
SparkQA commented on issue #25189: [SPARK-28435][SQL] Support cast StringType 
to IntervalType for SQL interface
URL: https://github.com/apache/spark/pull/25189#issuecomment-514052850
 
 
   **[Test build #108033 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108033/testReport)**
 for PR 25189 at commit 
[`ca111a4`](https://github.com/apache/spark/commit/ca111a4150155900f70d36381432ca3191bca07f).


[GitHub] [spark] SparkQA commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
SparkQA commented on issue #25178: [SPARK-28421][ML] SparseVector.apply 
performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514052849
 
 
   **[Test build #108034 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108034/testReport)**
 for PR 25178 at commit 
[`99dfe7e`](https://github.com/apache/spark/commit/99dfe7e5639f30143fe8e164d80377c583cf4b33).


[GitHub] [spark] zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] 
SparseVector.apply performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514041986
 
 
   The expected cost without the range check is `E(cost(apply2)) = log(NNZ)`,
   while the expected cost with the range check is `E(cost(apply3)) = 2 + P(key in range) * log(NNZ)`.
   The difference is `E(cost(apply3) - cost(apply2)) = 2 - P(key out of range) * log(NNZ)`, so the benefit depends heavily on the key distribution and on `NNZ`.
   ~~The above suite assumes the input key is drawn from a uniform distribution, and shows that if `NNZ` is small, the range check adds about 10% extra cost; otherwise, the range check saves about 50% of the cost.~~
   
   
   The previous test suite used `val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.sorted.take(nnz)` to generate indices, which is biased.
   I just changed it to `val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.take(nnz).sorted`.
   Now the version without the range check is faster, since `P(key out of range)` should be close to 0% in most cases.


[GitHub] [spark] AmplabJenkins removed a comment on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25189: [SPARK-28435][SQL] Support 
cast StringType to IntervalType for SQL interface
URL: https://github.com/apache/spark/pull/25189#issuecomment-514052465
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25189: [SPARK-28435][SQL] Support 
cast StringType to IntervalType for SQL interface
URL: https://github.com/apache/spark/pull/25189#issuecomment-514052470
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13141/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25189: [SPARK-28435][SQL] Support cast 
StringType to IntervalType for SQL interface
URL: https://github.com/apache/spark/pull/25189#issuecomment-514052470
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13141/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25189: [SPARK-28435][SQL] Support cast 
StringType to IntervalType for SQL interface
URL: https://github.com/apache/spark/pull/25189#issuecomment-514052465
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support 
LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514051070
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108032/
   Test FAILed.


[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... 
ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514051063
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support 
LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514051063
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... 
ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514051070
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108032/
   Test FAILed.


[GitHub] [spark] SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE 
syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514051057
 
 
   **[Test build #108032 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108032/testReport)**
 for PR 25001 at commit 
[`420f1b9`](https://github.com/apache/spark/commit/420f1b9123a606868980b76f9aed1b7035f33e30).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] SparkQA removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
SparkQA removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... 
ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514050378
 
 
   **[Test build #108032 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108032/testReport)**
 for PR 25001 at commit 
[`420f1b9`](https://github.com/apache/spark/commit/420f1b9123a606868980b76f9aed1b7035f33e30).


[GitHub] [spark] zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] 
SparseVector.apply performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514041986
 
 
   The expected cost without the range check is `E(cost(apply2)) = log(NNZ)`,
   while the expected cost with the range check is `E(cost(apply3)) = 2 + P(key in range) * log(NNZ)`.
   The difference is `E(cost(apply3) - cost(apply2)) = 2 - P(key out of range) * log(NNZ)`, so the benefit of the optimization depends heavily on the key distribution and on `NNZ`.
   ~~The above suite assumes the input keys come from a uniform distribution, and shows that if `NNZ` is small, the range check adds about 10% extra cost; otherwise, the range check saves about 50% of the cost.~~
   
   
   The previous test suite used `val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.sorted.take(nnz)` to generate indices, which is biased: sorting before `take(nnz)` keeps only the `nnz` smallest distinct values, so the largest stored index ends up far below `size`.
   I changed it to `val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.take(nnz).sorted`.
   Now the version without the range check is faster, since in most cases `P(key in range)` is close to 100% (that is, `P(key out of range)` is close to 0).
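   The cost model and the sampling bias can be checked numerically with a small standalone sketch. It assumes one unit of cost per binary-search step (`log2(NNZ)`) and per bound comparison; `RangeCheckModel`, `pOutOfRange`, `biasedIndices` and `unbiasedIndices` are hypothetical names, and indices are drawn with `rng.nextInt(size)` rather than `rng.nextInt.abs % size` to avoid the negative value produced by `Int.MinValue.abs`.

   ```scala
   import scala.util.Random

   // Hypothetical helpers modelling the cost comparison; not part of Spark.
   object RangeCheckModel {
     def log2(x: Double): Double = math.log(x) / math.log(2.0)

     // E[cost(apply3)] - E[cost(apply2)] = 2 - P(key out of range) * log2(nnz)
     def expectedDiff(nnz: Int, pOut: Double): Double =
       2.0 - pOut * log2(nnz.toDouble)

     // Fraction of keys in [0, size) lying outside [indices.head, indices.last].
     // Assumes `indices` is sorted.
     def pOutOfRange(indices: Array[Int], size: Int): Double =
       if (indices.isEmpty) 1.0
       else (size - (indices.last - indices.head + 1)).toDouble / size

     // Old generator: sorting before take(nnz) keeps only the smallest values.
     def biasedIndices(rng: Random, nnz: Int, size: Int): Array[Int] =
       Array.fill(2 * nnz)(rng.nextInt(size)).distinct.sorted.take(nnz)

     // New generator: take(nnz) first, then sort, so indices stay spread over [0, size).
     def unbiasedIndices(rng: Random, nnz: Int, size: Int): Array[Int] =
       Array.fill(2 * nnz)(rng.nextInt(size)).distinct.take(nnz).sorted
   }

   object RangeCheckDemo {
     def main(args: Array[String]): Unit = {
       val size = 1000000
       val nnz = 100
       val biased = RangeCheckModel.biasedIndices(new Random(123), nnz, size)
       val unbiased = RangeCheckModel.unbiasedIndices(new Random(123), nnz, size)
       for ((name, idx) <- Seq("biased" -> biased, "unbiased" -> unbiased)) {
         val pOut = RangeCheckModel.pOutOfRange(idx, size)
         val diff = RangeCheckModel.expectedDiff(nnz, pOut)
         println(f"$name%-8s P(out of range) = $pOut%.3f, E[apply3 - apply2] = $diff%.3f")
       }
     }
   }
   ```

   With the biased generator, `P(out of range)` should come out roughly 0.5, so the expected difference is negative and the range check pays off; with the unbiased generator it should be close to 0, so the range check only adds its two comparisons.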


[GitHub] [spark] SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE 
syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514050378
 
 
   **[Test build #108032 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108032/testReport)**
 for PR 25001 at commit 
[`420f1b9`](https://github.com/apache/spark/commit/420f1b9123a606868980b76f9aed1b7035f33e30).


[GitHub] [spark] zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] 
SparseVector.apply performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514035619
 
 
   I tested the performance of the current implementation (`apply`), a direct binary search (`apply2`), and a binary search with an extra range check (`apply3`):
   ```
   // Experimental methods on SparseVector; `util` refers to `java.util` (assumed imported).
   def apply2(i: Int): Double = {
     if (i < 0 || i >= size) {
       throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
     }
     // Direct binary search over the sorted indices.
     val j = util.Arrays.binarySearch(indices, i)
     if (j < 0) 0.0 else values(j)
   }

   def apply3(i: Int): Double = {
     if (i < 0 || i >= size) {
       throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
     }
     // Skip the search when i lies outside [indices.head, indices.last].
     if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) {
       0.0
     } else {
       val j = util.Arrays.binarySearch(indices, i)
       if (j < 0) 0.0 else values(j)
     }
   }
   ```
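   For experimenting outside Spark, the two lookups can also be written as standalone functions over plain arrays. This is a sketch assuming `indices` is sorted in strictly increasing order, as in `SparseVector`; the object `SparseLookup` and its method names are hypothetical.

   ```scala
   import java.util.Arrays

   // Hypothetical standalone versions of apply2/apply3 over plain arrays.
   // Assumes `indices` is sorted in strictly increasing order.
   object SparseLookup {
     private def checkBounds(i: Int, size: Int): Unit =
       if (i < 0 || i >= size)
         throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")

     // apply2: always binary-search.
     def lookupDirect(size: Int, indices: Array[Int], values: Array[Double], i: Int): Double = {
       checkBounds(i, size)
       val j = Arrays.binarySearch(indices, i)
       if (j < 0) 0.0 else values(j)
     }

     // apply3: skip the search when i lies outside [indices.head, indices.last].
     def lookupWithRangeCheck(size: Int, indices: Array[Int], values: Array[Double], i: Int): Double = {
       checkBounds(i, size)
       if (indices.isEmpty || i < indices(0) || i > indices(indices.length - 1)) 0.0
       else {
         val j = Arrays.binarySearch(indices, i)
         if (j < 0) 0.0 else values(j)
       }
     }
   }
   ```

   For example, with `size = 10`, `indices = Array(2, 5, 7)` and `values = Array(1.0, 2.0, 3.0)`, both functions return `2.0` for `i = 5` and `0.0` for `i = 0` or `i = 9`; only the amount of work differs.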
   
   The test suite is similar to the one above:
   ```
   import scala.util.Random
   import org.apache.spark.ml.linalg._

   // apply2/apply3 are the experimental methods above, added to SparseVector.
   val size = 1000
   for (nnz <- Seq(100, 1, 100)) {
     val rng = new Random(123)
     val indices = Array.fill(nnz + nnz)(rng.nextInt.abs % size).distinct.take(nnz).sorted
     val values = Array.fill(nnz)(rng.nextDouble)
     val vec = Vectors.sparse(size, indices, values).toSparse

     // Current implementation (`apply`)
     val tic1 = System.currentTimeMillis
     (0 until 100).foreach { round => var i = 0; var sum = 0.0; while (i < size) { sum += vec(i); i += 1 } }
     val toc1 = System.currentTimeMillis

     // Direct binary search (`apply2`)
     val tic2 = System.currentTimeMillis
     (0 until 100).foreach { round => var i = 0; var sum = 0.0; while (i < size) { sum += vec.apply2(i); i += 1 } }
     val toc2 = System.currentTimeMillis

     // Binary search with range check (`apply3`)
     val tic3 = System.currentTimeMillis
     (0 until 100).foreach { round => var i = 0; var sum = 0.0; while (i < size) { sum += vec.apply3(i); i += 1 } }
     val toc3 = System.currentTimeMillis

     // (size, nnz, elapsed ms for apply, apply2, apply3)
     println((size, nnz, toc1 - tic1, toc2 - tic2, toc3 - tic3))
   }
   ```
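   One caveat with loops like these is that `sum` is never used, so the JIT could in principle discard part of the work, and `System.currentTimeMillis` has coarse resolution. A hedged alternative is a small helper that times with `System.nanoTime` and returns the accumulated sum so the result stays live; `LookupTimer.timeLookups` is a hypothetical name, and for careful measurements a harness such as JMH is more reliable.

   ```scala
   // Hypothetical timing helper: runs `rounds` passes of `lookup(i)` for i in [0, size)
   // and returns (elapsed milliseconds, accumulated sum). Returning the sum keeps the
   // loop's result observable by the caller.
   object LookupTimer {
     def timeLookups(size: Int, rounds: Int)(lookup: Int => Double): (Double, Double) = {
       var total = 0.0
       val start = System.nanoTime()
       var round = 0
       while (round < rounds) {
         var i = 0
         while (i < size) {
           total += lookup(i)
           i += 1
         }
         round += 1
       }
       val elapsedMs = (System.nanoTime() - start) / 1e6
       (elapsedMs, total)
     }
   }
   ```

   With the patched vector this might be used as `val (ms, checksum) = LookupTimer.timeLookups(size, 100)(vec.apply2)`, printing `checksum` alongside `ms`.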
   
   
   
   | size | nnz | apply (old), ms | apply2, ms | apply3, ms |
   |--|--|--|--|--|
   |1000|100|75294|12208|18682|
   |1000|1|75616|23132|32932|
   |1000|100|92949|42529|48821|
   
   So the version without the range check is faster; I will update the PR.
   


[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support 
LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514050061
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13140/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... 
ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514050061
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13140/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support 
LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514050057
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] maropu commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array

2019-07-22 Thread GitBox
maropu commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI 
SQL: OVERLAY function support byte array
URL: https://github.com/apache/spark/pull/25172#discussion_r306124653
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ##
 @@ -472,6 +472,19 @@ object Overlay {
 builder.append(input.substringSQL(pos + length, Int.MaxValue))
 builder.build()
   }
+
+  def calculate(input: Array[Byte], replace: Array[Byte], pos: Int, len: Int): 
Array[Byte] = {
+// If you specify length, it must be a positive whole number or zero.
+// Otherwise it will be ignored.
+// The default value for length is the length of replace.
+val length = if (len >= 0) {
+  len
+} else {
+  replace.length
+}
+ByteArray.concat(ByteArray.subStringSQL(input, 1, pos - 1),
+  replace, ByteArray.subStringSQL(input, pos + length, Int.MaxValue))
+  }
 
 Review comment:
   cc: @viirya @mgaido91
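   The byte-array `calculate` helper in the diff concatenates three pieces: the bytes before `pos`, the replacement, and the bytes from `pos + length` onward. A standalone sketch of that behaviour follows; `OverlaySketch.subStringSQL` is a simplified, hypothetical stand-in for Spark's `ByteArray.subStringSQL` that only models 1-based positions `>= 1`, and is not Spark's implementation.

   ```scala
   import java.nio.charset.StandardCharsets

   // Hypothetical standalone model of OVERLAY on byte arrays; not Spark's code.
   object OverlaySketch {
     // Simplified 1-based SQL substring: only pos >= 1 is modelled.
     // Long arithmetic avoids Int overflow when len is Int.MaxValue.
     def subStringSQL(bytes: Array[Byte], pos: Int, len: Int): Array[Byte] = {
       val start = math.max(pos - 1, 0)
       val end = math.min(start.toLong + len, bytes.length.toLong).toInt
       bytes.slice(start, end) // slice clamps out-of-range bounds to an empty result
     }

     // OVERLAY(input PLACING replace FROM pos [FOR len]); a negative len means
     // "not specified", in which case replace.length is used.
     def overlay(input: Array[Byte], replace: Array[Byte], pos: Int, len: Int): Array[Byte] = {
       val length = if (len >= 0) len else replace.length
       subStringSQL(input, 1, pos - 1) ++ replace ++
         subStringSQL(input, pos + length, Int.MaxValue)
     }

     // Convenience wrapper for trying the byte version with UTF-8 strings.
     def overlayStr(input: String, replace: String, pos: Int, len: Int = -1): String = {
       val out = overlay(input.getBytes(StandardCharsets.UTF_8),
         replace.getBytes(StandardCharsets.UTF_8), pos, len)
       new String(out, StandardCharsets.UTF_8)
     }
   }
   ```

   For example, overlaying `"_"` at position 6 of `"Spark SQL"` without a length gives `"Spark_SQL"`, and overlaying `"ANSI "` at position 7 with length 0 inserts without replacing, giving `"Spark ANSI SQL"`.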


[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... 
ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-514050057
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] beliefer commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array

2019-07-22 Thread GitBox
beliefer commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI 
SQL: OVERLAY function support byte array
URL: https://github.com/apache/spark/pull/25172#discussion_r306124403
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ##
 @@ -472,6 +472,19 @@ object Overlay {
 builder.append(input.substringSQL(pos + length, Int.MaxValue))
 builder.build()
   }
+
+  def calculate(input: Array[Byte], replace: Array[Byte], pos: Int, len: Int): 
Array[Byte] = {
+// If you specify length, it must be a positive whole number or zero.
+// Otherwise it will be ignored.
+// The default value for length is the length of replace.
+val length = if (len >= 0) {
+  len
+} else {
+  replace.length
+}
+ByteArray.concat(ByteArray.subStringSQL(input, 1, pos - 1),
+  replace, ByteArray.subStringSQL(input, pos + length, Int.MaxValue))
+  }
 
 Review comment:
   I am not very familiar with code generation. If this is a strong suggestion, I will try to use generated code, but I think this helper function is fine too.


[GitHub] [spark] zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] 
SparseVector.apply performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514041986
 
 
   The expected cost without the range check is `E(cost(apply2)) = log(NNZ)`,
   while the expected cost with the range check is `E(cost(apply3)) = 2 + P(key in range) * log(NNZ)`.
   The difference is `E(cost(apply3) - cost(apply2)) = 2 - P(key out of range) * log(NNZ)`, so the benefit of the optimization depends heavily on the key distribution and on `NNZ`.
   ~~The above suite assumes the input keys come from a uniform distribution, and shows that if `NNZ` is small, the range check adds about 10% extra cost; otherwise, the range check saves about 50% of the cost.~~
   


[GitHub] [spark] zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
zhengruifeng edited a comment on issue #25178: [SPARK-28421][ML] 
SparseVector.apply performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514041986
 
 
   The expected cost without the range check is `E(cost(apply2)) = log(NNZ)`,
   while the expected cost with the range check is `E(cost(apply3)) = 2 + P(key in range) * log(NNZ)`.
   The difference is `E(cost(apply3) - cost(apply2)) = 2 - P(key out of range) * log(NNZ)`, so the benefit of the optimization depends heavily on the key distribution and on `NNZ`.
   ***The above suite assumes the input keys come from a uniform distribution, and shows that if `NNZ` is small, the range check adds about 10% extra cost; otherwise, the range check saves about 50% of the cost.***
   


[GitHub] [spark] maropu commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array

2019-07-22 Thread GitBox
maropu commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI 
SQL: OVERLAY function support byte array
URL: https://github.com/apache/spark/pull/25172#discussion_r306124119
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ##
 @@ -496,19 +509,38 @@ case class Overlay(input: Expression, replace: 
Expression, pos: Expression, len:
 this(str, replace, pos, Literal.create(-1, IntegerType))
   }
 
-  override def dataType: DataType = StringType
+  override def dataType: DataType = input.dataType
 
-  override def inputTypes: Seq[AbstractDataType] =
-Seq(StringType, StringType, IntegerType, IntegerType)
+  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(StringType, BinaryType),
+TypeCollection(StringType, BinaryType), IntegerType, IntegerType)
 
   override def children: Seq[Expression] = input :: replace :: pos :: len :: 
Nil
 
+  override def checkInputDataTypes(): TypeCheckResult = {
 
 Review comment:
   I think this is an issue about implicit casts, not function arguments.
   In the example above, the left argument (text) is cast to binary, so `overlay(binary, binary)` is what actually gets called. So my question is: do we need to extend `ImplicitCastInputTypes` for overlay?


[GitHub] [spark] HeartSaVioR edited a comment on issue #25135: [SPARK-28367][SS] Use new KafkaConsumer.poll API in Kafka connector

2019-07-22 Thread GitBox
HeartSaVioR edited a comment on issue #25135: [SPARK-28367][SS] Use new 
KafkaConsumer.poll API in Kafka connector
URL: https://github.com/apache/spark/pull/25135#issuecomment-514048478
 
 
   Here is some of the test code Kafka uses with the new poll API.
   
   
https://github.com/apache/kafka/blob/f98e176746d663fadedbcd3c18312a7f476a20c8/core/src/test/scala/integration/kafka/api/PlaintextConsumerTest.scala#L1748-L1752
   
   ```
   private def awaitAssignment(consumer: Consumer[_, _], expectedAssignment: Set[TopicPartition]): Unit = {
     TestUtils.pollUntilTrue(consumer, () => consumer.assignment() == expectedAssignment.asJava,
       s"Timed out while awaiting expected assignment $expectedAssignment. " +
         s"The current assignment is ${consumer.assignment()}")
   }
   ```
   
   
https://github.com/apache/kafka/blob/f98e176746d663fadedbcd3c18312a7f476a20c8/core/src/test/scala/unit/kafka/utils/TestUtils.scala#L767-L775
   
   ```
   def pollUntilTrue(consumer: Consumer[_, _],
                     action: () => Boolean,
                     msg: => String,
                     waitTimeMs: Long = JTestUtils.DEFAULT_MAX_WAIT_MS): Unit = {
     waitUntilTrue(() => {
       consumer.poll(Duration.ofMillis(50))
       action()
     }, msg = msg, pause = 0L, waitTimeMs = waitTimeMs)
   }
   ```
   
   Some of Kafka's test code still relies on the deprecated `poll(0)`, so both `poll(Duration)` and `poll(long)` are in use. There may be no technical reason for this, but the fact that they still rely on the old flavor might indicate that `poll(0)` is still needed.
   
   Sometimes Kafka calls `updateAssignmentMetadataIfNeeded` directly; it handles the metadata-update part of `poll()` and, given a timer of `Long.MAX_VALUE`, effectively blocks. The method is meant for testing and is package private.
   
   ```
   consumer.updateAssignmentMetadataIfNeeded(time.timer(Long.MAX_VALUE));
   ```
   
   In many places where test code calls `poll(Duration.ZERO)`, `updateAssignmentMetadataIfNeeded` is called beforehand. In other places the verification code just seems to confirm that calling poll does nothing or returns records that were already fetched.
   
   I guess in our case we need either to leverage `updateAssignmentMetadataIfNeeded` to deal only with metadata (this may require a hack, and Kafka has made clear it is meant for testing, so relying on it is unsafe), or to `poll` with a small timeout (50ms) while tolerating the case where no records are available to pull (the timeout then adds to latency regardless of whether metadata is available).
   
   By the way, KIP-288 proposed a new public API, `waitForAssignment`, similar to `updateAssignmentMetadataIfNeeded`, but KIP-288 was discarded because KIP-266 superseded it, and KIP-266 did not end up adding that method. I'm not sure whether it was declined or simply missed.
   
https://cwiki.apache.org/confluence/display/KAFKA/KIP-288%3A+%5BDISCARDED%5D+Consumer.poll%28%29+timeout+semantic+change+and+new+waitForAssignment+method
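   The `pollUntilTrue` pattern quoted above boils down to: poll with a short timeout, re-check a condition, and give up after a deadline. A minimal standalone sketch follows; `PollUtil.pollUntil` is a hypothetical name, not a Kafka API, and its `poll` argument stands in for something like `() => consumer.poll(Duration.ofMillis(50))` so the sketch runs without a Kafka dependency.

   ```scala
   import java.time.Duration

   // Hypothetical helper modelled on Kafka's TestUtils.pollUntilTrue; not a Kafka API.
   object PollUtil {
     // Repeatedly calls `poll` and then `condition` until the condition holds
     // or `timeout` elapses. Returns true if the condition was met in time.
     def pollUntil(poll: () => Unit, condition: () => Boolean, timeout: Duration): Boolean = {
       val deadline = System.nanoTime() + timeout.toNanos
       var satisfied = false
       while (!satisfied && System.nanoTime() < deadline) {
         poll()
         satisfied = condition()
       }
       satisfied
     }
   }
   ```

   In a connector this might look like `PollUtil.pollUntil(() => consumer.poll(Duration.ofMillis(50)), () => !consumer.assignment().isEmpty, timeout)`; each iteration pays up to the poll timeout when no records arrive, which is the latency trade-off described above.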


[GitHub] [spark] HeartSaVioR commented on issue #25135: [SPARK-28367][SS] Use new KafkaConsumer.poll API in Kafka connector

2019-07-22 Thread GitBox
HeartSaVioR commented on issue #25135: [SPARK-28367][SS] Use new 
KafkaConsumer.poll API in Kafka connector
URL: https://github.com/apache/spark/pull/25135#issuecomment-514048478
 
 
   Here is some of the test code Kafka uses with the new poll API.
   
   
https://github.com/apache/kafka/blob/f98e176746d663fadedbcd3c18312a7f476a20c8/core/src/test/scala/integration/kafka/api/PlaintextConsumerTest.scala#L1748-L1752
   
   ```
   private def awaitAssignment(consumer: Consumer[_, _], expectedAssignment: Set[TopicPartition]): Unit = {
     TestUtils.pollUntilTrue(consumer, () => consumer.assignment() == expectedAssignment.asJava,
       s"Timed out while awaiting expected assignment $expectedAssignment. " +
         s"The current assignment is ${consumer.assignment()}")
   }
   ```
   
   
https://github.com/apache/kafka/blob/f98e176746d663fadedbcd3c18312a7f476a20c8/core/src/test/scala/unit/kafka/utils/TestUtils.scala#L767-L775
   
   ```
   def pollUntilTrue(consumer: Consumer[_, _],
                     action: () => Boolean,
                     msg: => String,
                     waitTimeMs: Long = JTestUtils.DEFAULT_MAX_WAIT_MS): Unit = {
     waitUntilTrue(() => {
       consumer.poll(Duration.ofMillis(50))
       action()
     }, msg = msg, pause = 0L, waitTimeMs = waitTimeMs)
   }
   ```
   
   Some of Kafka's test code still relies on the deprecated `poll(0)`, so both `poll(Duration)` and `poll(long)` are in use. There may be no technical reason for this, but the fact that they still rely on the old flavor might indicate that `poll(0)` is still needed.
   
   Sometimes Kafka calls `updateAssignmentMetadataIfNeeded` directly; it handles the metadata-update part of `poll()` and, given a timer of `Long.MAX_VALUE`, effectively blocks. The method is meant for testing and is package private.
   
   ```
   consumer.updateAssignmentMetadataIfNeeded(time.timer(Long.MAX_VALUE));
   ```
   
   In many places where test code calls `poll(Duration.ZERO)`, `updateAssignmentMetadataIfNeeded` is called beforehand. In other places the verification code just seems to confirm that calling poll does nothing or returns records that were already fetched.
   
   I guess in our case we need either to leverage `updateAssignmentMetadataIfNeeded` to deal only with metadata (this may require a hack, and Kafka has made clear it is meant for testing, so relying on it is unsafe), or to `poll` with a small timeout (50ms) while tolerating the case where no records are available to pull (the timeout then adds to latency regardless of whether metadata is available).
   
   By the way, KIP-288 proposed a new public API, `waitForAssignment`, similar to `updateAssignmentMetadataIfNeeded`, but KIP-288 was discarded because KIP-266 superseded it, and KIP-266 did not end up adding that method. I'm not sure whether it was declined or simply missed.
   
https://cwiki.apache.org/confluence/display/KAFKA/KIP-288%3A+%5BDISCARDED%5D+Consumer.poll%28%29+timeout+semantic+change+and+new+waitForAssignment+method


[GitHub] [spark] beliefer commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array

2019-07-22 Thread GitBox
beliefer commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI 
SQL: OVERLAY function support byte array
URL: https://github.com/apache/spark/pull/25172#discussion_r306122925
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ##
 @@ -496,19 +509,38 @@ case class Overlay(input: Expression, replace: 
Expression, pos: Expression, len:
 this(str, replace, pos, Literal.create(-1, IntegerType))
   }
 
-  override def dataType: DataType = StringType
+  override def dataType: DataType = input.dataType
 
-  override def inputTypes: Seq[AbstractDataType] =
-Seq(StringType, StringType, IntegerType, IntegerType)
+  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(StringType, BinaryType),
+TypeCollection(StringType, BinaryType), IntegerType, IntegerType)
 
   override def children: Seq[Expression] = input :: replace :: pos :: len :: 
Nil
 
+  override def checkInputDataTypes(): TypeCheckResult = {
 
 Review comment:
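   ```
   <character overlay function> ::=
     OVERLAY <left paren> <character value expression> PLACING <character value expression>
     FROM <start position> [ FOR <string length> ]
     [ USING <char length units> ] <right paren>
   ```
   and
   ```
   <binary overlay function> ::=
     OVERLAY <left paren> <binary value expression> PLACING <binary value expression>
     FROM <start position> [ FOR <string length> ] <right paren>
   ```
   I can't find any other case.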


[GitHub] [spark] beliefer commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array

2019-07-22 Thread GitBox
beliefer commented on a change in pull request #25172: [SPARK-28412][SQL] ANSI 
SQL: OVERLAY function support byte array
URL: https://github.com/apache/spark/pull/25172#discussion_r306122505
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ##
 @@ -496,19 +509,38 @@ case class Overlay(input: Expression, replace: 
Expression, pos: Expression, len:
 this(str, replace, pos, Literal.create(-1, IntegerType))
   }
 
-  override def dataType: DataType = StringType
+  override def dataType: DataType = input.dataType
 
-  override def inputTypes: Seq[AbstractDataType] =
-Seq(StringType, StringType, IntegerType, IntegerType)
+  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(StringType, BinaryType),
+TypeCollection(StringType, BinaryType), IntegerType, IntegerType)
 
   override def children: Seq[Expression] = input :: replace :: pos :: len :: 
Nil
 
+  override def checkInputDataTypes(): TypeCheckResult = {
 
 Review comment:
   According to ANSI SQL, only (binary, binary) or (string, string) is allowed.
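   The rule that the first two OVERLAY arguments must both be strings or both be binary can be modelled with a small standalone check. The `SqlType` hierarchy and `OverlayTypeCheck.checkOverlayTypes` are hypothetical stand-ins, not Spark's `DataType` or `TypeCheckResult` API.

   ```scala
   // Hypothetical stand-ins for SQL data types; not Spark's API.
   sealed trait SqlType
   case object StringT extends SqlType
   case object BinaryT extends SqlType
   case object IntT extends SqlType

   object OverlayTypeCheck {
     // Right(resultType) when (input, replace) is (string, string) or (binary, binary),
     // Left(errorMessage) otherwise. The result type follows the input type.
     def checkOverlayTypes(input: SqlType, replace: SqlType): Either[String, SqlType] =
       (input, replace) match {
         case (StringT, StringT) => Right(StringT)
         case (BinaryT, BinaryT) => Right(BinaryT)
         case (a, b) => Left(s"overlay requires (string, string) or (binary, binary), got ($a, $b)")
       }
   }
   ```

   As maropu notes above, if implicit casts turn a string argument into binary before this check runs, the check would see (binary, binary) and accept it.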


[GitHub] [spark] HyukjinKwon commented on a change in pull request #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone

2019-07-22 Thread GitBox
HyukjinKwon commented on a change in pull request #25047: 
[WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
URL: https://github.com/apache/spark/pull/25047#discussion_r306121633
 
 

 ##
 File path: python/pyspark/tests/test_context.py
 ##
 @@ -273,7 +273,8 @@ def setUp(self):
         self.tempFile.close()
         os.chmod(self.tempFile.name, stat.S_IRWXU | stat.S_IXGRP | stat.S_IRGRP |
                  stat.S_IROTH | stat.S_IXOTH)
-        conf = SparkConf().set("spark.driver.resource.gpu.amount", "1")
+        conf = SparkConf().set("spark.test.home", SPARK_HOME)
 
 Review comment:
   This is because the `spark.test.home` property is only set in testing mode in the JVM. I left a comment at
https://github.com/apache/spark/pull/25047/files/95111b0b9c0a4732041d19b5e221eecea4408147#r306121558





[GitHub] [spark] HyukjinKwon commented on a change in pull request #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone

2019-07-22 Thread GitBox
HyukjinKwon commented on a change in pull request #25047: 
[WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
URL: https://github.com/apache/spark/pull/25047#discussion_r306121558
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
 ##
 @@ -70,6 +93,276 @@ private[spark] object ResourceUtils extends Logging {
   // internally we currently only support addresses, so its just an integer count
   val AMOUNT = "amount"
 
+  /**
+   * Assign resources to workers/drivers from the same host to avoid address conflict.
+   *
+   * This function works in three steps. First, acquiring the lock on RESOURCES_LOCK_FILE
+   * to achieve synchronization among workers and drivers. Second, getting all allocated
+   * resources from ALLOCATED_RESOURCES_FILE and assigning isolated resources to the worker
+   * or driver after differentiating available resources in discovered resources from
+   * allocated resources. If available resources don't meet worker's or driver's requirement,
+   * try to update allocated resources by excluding the resource allocation if its related
+   * process has already terminated and do the assignment again. If still don't meet requirement,
+   * exception would be threw. Third, updating ALLOCATED_RESOURCES_FILE with new allocated
+   * resources along with pid for the worker or driver. Then, return allocated resources
+   * information after releasing the lock.
+   *
+   * @param conf SparkConf
+   * @param componentName spark.driver / spark.worker
+   * @param resources the resources found by worker/driver on the host
+   * @param resourceRequirements the resource requirements asked by the worker/driver
+   * @param pid the process id of worker/driver to acquire resources.
+   * @return allocated resources for the worker/driver or throws exception if can't
+   * meet worker/driver's requirement
+   */
+  def acquireResources(
+      conf: SparkConf,
+      componentName: String,
+      resources: Map[String, ResourceInformation],
+      resourceRequirements: Seq[ResourceRequirement],
+      pid: Int)
+    : Map[String, ResourceInformation] = {
+    if (resourceRequirements.isEmpty) {
+      return Map.empty
+    }
+    val lock = acquireLock(conf)
+    val resourcesFile = new File(getOrCreateResourcesDir(conf), ALLOCATED_RESOURCES_FILE)
+    // all allocated resources in ALLOCATED_RESOURCES_FILE, can be updated if any allocations'
+    // related processes detected to be terminated while checking pids below.
+    var origAllocation = Seq.empty[StandaloneResourceAllocation]
+    var allocated = {
+      if (resourcesFile.exists()) {
+        origAllocation = allocatedStandaloneResources(resourcesFile.getPath)
+        val allocations = origAllocation.map { resource =>
+          val resourceMap = {
+            resource.allocations.map { allocation =>
+              allocation.id.resourceName -> allocation.addresses.toArray
+            }.toMap
+          }
+          resource.pid -> resourceMap
+        }.toMap
+        allocations
+      } else {
+        Map.empty[Int, Map[String, Array[String]]]
+      }
+    }
+
+    // new allocated resources for worker or driver,
+    // map from resource name to its allocated addresses.
+    var newAssignments: Map[String, Array[String]] = null
+    // Whether we've checked process status and we'll only do the check at most once.
+    // Do the check iff the available resources can't meet the requirements at the first time.
+    var checked = false
+    // Whether we need to keep allocating for the worker/driver and we'll only go through
+    // the loop at most twice.
+    var keepAllocating = true
+    while (keepAllocating) {
+      keepAllocating = false
+      // store the pid whose related allocated resources conflict with
+      // discovered resources passed in.
+      val pidsToCheck = mutable.Set[Int]()
+      newAssignments = resourceRequirements.map { req =>
+        val rName = req.resourceName
+        val amount = req.amount
+        // initially, we must have available.length >= amount as we've done pre-check previously
+        var available = resources(rName).addresses
+        // gets available resource addresses by excluding all
+        // allocated resource addresses from discovered resources
+        allocated.foreach { a =>
+          val thePid = a._1
+          val resourceMap = a._2
+          val assigned = resourceMap.getOrElse(rName, Array.empty)
+          val retained = available.diff(assigned)
+          if (retained.length < available.length && !checked) {
+            pidsToCheck += thePid
+          }
+          if (retained.length >= amount) {
+            available = retained
+          } else if (checked) {
+            keepAllocating = false
+            releaseLock(lock)
+            throw new SparkException(s"No more resources available since they've already" +
+              s" assigned to other workers/driv
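The three steps described in the `acquireResources` doc comment can be sketched in plain Python as follows. This is a simplified model under stated assumptions, not the Spark code: `ResourceRegistry`, `acquire`, `_try_assign` and the `is_alive` callback are hypothetical names; a `threading.Lock` stands in for the lock on RESOURCES_LOCK_FILE, and an in-memory `allocated` dict (pid to allocation) stands in for ALLOCATED_RESOURCES_FILE. Unlike the diff, which only re-checks the pids whose allocations conflict with the discovered resources, this sketch re-checks every recorded pid.

```python
import threading
from typing import Callable, Dict, List, Optional

# resource name -> addresses assigned to one process
Allocation = Dict[str, List[str]]


class ResourceRegistry:
    """Hypothetical in-memory model of per-host resource bookkeeping."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stands in for the RESOURCES_LOCK_FILE lock
        self.allocated: Dict[int, Allocation] = {}  # stands in for ALLOCATED_RESOURCES_FILE

    def acquire(
        self,
        discovered: Dict[str, List[str]],
        requirements: Dict[str, int],
        pid: int,
        is_alive: Callable[[int], bool],
    ) -> Allocation:
        if not requirements:
            return {}
        # Step 1: serialize all workers/drivers on the host.
        with self._lock:  # released on every exit path, including the raise below
            checked = False
            while True:
                assignment = self._try_assign(discovered, requirements)
                if assignment is not None:
                    break
                if checked:
                    raise RuntimeError("not enough free resources on this host")
                # Drop allocations whose owning process has terminated, then retry once.
                self.allocated = {p: a for p, a in self.allocated.items() if is_alive(p)}
                checked = True
            # Step 3: record the new allocation under the requesting pid.
            self.allocated[pid] = assignment
            return assignment

    def _try_assign(
        self, discovered: Dict[str, List[str]], requirements: Dict[str, int]
    ) -> Optional[Allocation]:
        # Step 2: available = discovered addresses minus every address already allocated.
        assignment: Allocation = {}
        for name, amount in requirements.items():
            taken = {addr for alloc in self.allocated.values() for addr in alloc.get(name, [])}
            available = [addr for addr in discovered.get(name, []) if addr not in taken]
            if len(available) < amount:
                return None
            assignment[name] = available[:amount]
        return assignment
```

Using `with` means the lock is released even when the request cannot be satisfied; the code in the diff achieves the same by calling `releaseLock(lock)` explicitly before throwing.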

[GitHub] [spark] AmplabJenkins removed a comment on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #24958: [SPARK-28153][PYTHON] Use 
AtomicReference at InputFileBlockHolder (to support input_file_name with Python 
UDF)
URL: https://github.com/apache/spark/pull/24958#issuecomment-514045154
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #24958: [SPARK-28153][PYTHON] Use 
AtomicReference at InputFileBlockHolder (to support input_file_name with Python 
UDF)
URL: https://github.com/apache/spark/pull/24958#issuecomment-514045159
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108027/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #24958: [SPARK-28153][PYTHON] Use 
AtomicReference at InputFileBlockHolder (to support input_file_name with Python 
UDF)
URL: https://github.com/apache/spark/pull/24958#issuecomment-514045159
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108027/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #24958: [SPARK-28153][PYTHON] Use 
AtomicReference at InputFileBlockHolder (to support input_file_name with Python 
UDF)
URL: https://github.com/apache/spark/pull/24958#issuecomment-514045154
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)

2019-07-22 Thread GitBox
SparkQA removed a comment on issue #24958: [SPARK-28153][PYTHON] Use 
AtomicReference at InputFileBlockHolder (to support input_file_name with Python 
UDF)
URL: https://github.com/apache/spark/pull/24958#issuecomment-514014636
 
 
   **[Test build #108027 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108027/testReport)**
 for PR 24958 at commit 
[`cb4cfde`](https://github.com/apache/spark/commit/cb4cfdee34b02d3be7372eeb972c7908b63469d5).





[GitHub] [spark] SparkQA commented on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)

2019-07-22 Thread GitBox
SparkQA commented on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at 
InputFileBlockHolder (to support input_file_name with Python UDF)
URL: https://github.com/apache/spark/pull/24958#issuecomment-514044796
 
 
   **[Test build #108027 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108027/testReport)**
 for PR 24958 at commit 
[`cb4cfde`](https://github.com/apache/spark/commit/cb4cfdee34b02d3be7372eeb972c7908b63469d5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] HyukjinKwon commented on issue #25225: [SPARK-28469][SQL] Change CalendarIntervalType's readable string representation from calendarinterval to interval

2019-07-22 Thread GitBox
HyukjinKwon commented on issue #25225: [SPARK-28469][SQL] Change 
CalendarIntervalType's readable string representation from calendarinterval to 
interval
URL: https://github.com/apache/spark/pull/25225#issuecomment-514044823
 
 
   +1 too





[GitHub] [spark] dongjoon-hyun closed pull request #25225: [SPARK-28469][SQL] Change CalendarIntervalType's readable string representation from calendarinterval to interval

2019-07-22 Thread GitBox
dongjoon-hyun closed pull request #25225: [SPARK-28469][SQL] Change 
CalendarIntervalType's readable string representation from calendarinterval to 
interval
URL: https://github.com/apache/spark/pull/25225
 
 
   





[GitHub] [spark] dongjoon-hyun commented on issue #25189: [SPARK-28435][SQL] Support cast StringType to IntervalType for SQL interface

2019-07-22 Thread GitBox
dongjoon-hyun commented on issue #25189: [SPARK-28435][SQL] Support cast 
StringType to IntervalType for SQL interface
URL: https://github.com/apache/spark/pull/25189#issuecomment-514044307
 
 
   #25225 is merged now. Could you rebase this PR?





[GitHub] [spark] dongjoon-hyun commented on issue #25225: [SPARK-28469][SQL] Change CalendarIntervalType's readable string representation from calendarinterval to interval

2019-07-22 Thread GitBox
dongjoon-hyun commented on issue #25225: [SPARK-28469][SQL] Change 
CalendarIntervalType's readable string representation from calendarinterval to 
interval
URL: https://github.com/apache/spark/pull/25225#issuecomment-514043510
 
 
   Yep. I think it's okay by itself. Thank you for the pointer.





[GitHub] [spark] SparkQA commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base

2019-07-22 Thread GitBox
SparkQA commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert 
and port 'pgSQL/select_having.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25161#issuecomment-514042578
 
 
   **[Test build #108031 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108031/testReport)**
 for PR 25161 at commit 
[`8795d66`](https://github.com/apache/spark/commit/8795d66b189712f54a55a3b0663273fb26126a8e).





[GitHub] [spark] AmplabJenkins removed a comment on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25161: 
[SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' 
into UDF test base
URL: https://github.com/apache/spark/pull/25161#issuecomment-514042256
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13139/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base

2019-07-22 Thread GitBox
AmplabJenkins removed a comment on issue #25161: 
[SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' 
into UDF test base
URL: https://github.com/apache/spark/pull/25161#issuecomment-514042253
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] 
Convert and port 'pgSQL/select_having.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25161#issuecomment-514042253
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base

2019-07-22 Thread GitBox
AmplabJenkins commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] 
Convert and port 'pgSQL/select_having.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25161#issuecomment-514042256
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13139/
   Test PASSed.





[GitHub] [spark] zhengruifeng commented on issue #25178: [SPARK-28421][ML] SparseVector.apply performance optimization

2019-07-22 Thread GitBox
zhengruifeng commented on issue #25178: [SPARK-28421][ML] SparseVector.apply 
performance optimization
URL: https://github.com/apache/spark/pull/25178#issuecomment-514041986
 
 
   The expected cost without the range check is `E(cost(apply2)) = log(NNZ)`,
   while the expected cost with the range check is `E(cost(apply3)) = 2 + P(key in range) * log(NNZ)`.
   Since `P(key in range) = 1 - P(key out of range)`, the difference is `E(cost(apply3) - cost(apply2)) = 2 - P(key out of range) * log(NNZ)`, so the benefit of the optimization depends heavily on the key distribution and on `NNZ`.
   The suite above assumes the input keys are drawn from a uniform distribution, and shows that when `NNZ` is small the range check adds about 10% extra cost; otherwise, it saves about 50%.
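The two lookups being compared can be illustrated with a small Python sketch over sorted `indices`. It assumes the range check compares the key with the first and last stored index (the two comparisons behind the constant `2` in the cost formula) before falling back to binary search. `apply_binary_search` and `apply_with_range_check` are hypothetical names, not the MLlib code, and the bounds check against the vector size is omitted.

```python
from bisect import bisect_left
from typing import Sequence


def apply_binary_search(indices: Sequence[int], values: Sequence[float], i: int) -> float:
    """Look up element i of a sparse vector by binary search only (~log(NNZ) steps)."""
    j = bisect_left(indices, i)
    if j < len(indices) and indices[j] == i:
        return values[j]
    return 0.0  # i is not stored, so the entry is an implicit zero


def apply_with_range_check(indices: Sequence[int], values: Sequence[float], i: int) -> float:
    """Same lookup, but keys outside [indices[0], indices[-1]] return 0.0
    after two comparisons, skipping the binary search entirely."""
    if not indices or i < indices[0] or i > indices[-1]:
        return 0.0
    return apply_binary_search(indices, values, i)
```

Both functions return the same values; they differ only in how much work is spent on keys outside the stored index range, which is exactly the `P(key out of range) * log(NNZ)` term that the range check can save.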
   





[GitHub] [spark] shivusondur commented on a change in pull request #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base

2019-07-22 Thread GitBox
shivusondur commented on a change in pull request #25161: 
[SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' 
into UDF test base
URL: https://github.com/apache/spark/pull/25161#discussion_r306117675
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/pgSQL/udf-select_having.sql
 ##
 @@ -0,0 +1,58 @@
+--
+-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+--
+--
+-- SELECT_HAVING
+-- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/select_having.sql
+--
+-- This test file was converted from inputs/pgSQL/select_having.sql
 
 Review comment:
   Done




