[GitHub] [spark] SparkQA commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-28 Thread GitBox


SparkQA commented on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809088540


   **[Test build #136638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136638/testReport)** for PR 31680 at commit [`2afad3b`](https://github.com/apache/spark/commit/2afad3b82cef36abecd4d32d14cb8736d878d49d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-28 Thread GitBox


SparkQA commented on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809088302


   **[Test build #136637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136637/testReport)** for PR 31984 at commit [`80f00a0`](https://github.com/apache/spark/commit/80f00a0a0d2ec68766f4a8fdbeb09378ecd02a10).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31987: [WIP][SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809087002


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136627/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31989: [WIP][SPARK-34891][SS] Introduce state store manager for session window in streaming query

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31989:
URL: https://github.com/apache/spark/pull/31989#issuecomment-809086999


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136632/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31985: [SPARK-34860][ML] Multinomial Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31985:
URL: https://github.com/apache/spark/pull/31985#issuecomment-809087006


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41212/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809087005


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136623/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809087005


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136623/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31989: [WIP][SPARK-34891][SS] Introduce state store manager for session window in streaming query

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31989:
URL: https://github.com/apache/spark/pull/31989#issuecomment-809086999


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136632/
   





[GitHub] [spark] SparkQA commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-28 Thread GitBox


SparkQA commented on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809087041


   **[Test build #136636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136636/testReport)** for PR 31680 at commit [`f9024fe`](https://github.com/apache/spark/commit/f9024fecda1c75d631b1b8bd5b478c8ceae9de2f).





[GitHub] [spark] AmplabJenkins commented on pull request #31987: [WIP][SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809087002


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136627/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31985: [SPARK-34860][ML] Multinomial Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31985:
URL: https://github.com/apache/spark/pull/31985#issuecomment-809087006


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41212/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809083608


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41206/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809083608


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41206/
   





[GitHub] [spark] SparkQA commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-28 Thread GitBox


SparkQA commented on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809083569


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41206/
   





[GitHub] [spark] MaxGekk closed pull request #31979: [SPARK-34879][SQL] HiveInspector supports DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


MaxGekk closed pull request #31979:
URL: https://github.com/apache/spark/pull/31979


   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31901: [SPARK-34802][SQL] Move simplify expression rules before operator push down

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31901:
URL: https://github.com/apache/spark/pull/31901#issuecomment-809079486


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41208/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31987: [WIP][SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809079495


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41210/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809079487









[GitHub] [spark] AmplabJenkins removed a comment on pull request #31985: [SPARK-34860][ML] Multinomial Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31985:
URL: https://github.com/apache/spark/pull/31985#issuecomment-809079491


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136625/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31983: [SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31983:
URL: https://github.com/apache/spark/pull/31983#issuecomment-809079490


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41209/
   





[GitHub] [spark] MaxGekk commented on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


MaxGekk commented on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809080557


   +1, LGTM. Merging to master.
   Thank you @AngersZh .





[GitHub] [spark] SparkQA commented on pull request #31958: [SPARK-34862][SQL] Support nested column in ORC vectorized reader

2021-03-28 Thread GitBox


SparkQA commented on pull request #31958:
URL: https://github.com/apache/spark/pull/31958#issuecomment-809080274


   **[Test build #136635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136635/testReport)** for PR 31958 at commit [`9cd3bc5`](https://github.com/apache/spark/commit/9cd3bc573514a9e25f1e7364aacfb4c86c661552).





[GitHub] [spark] SparkQA commented on pull request #31983: [SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates

2021-03-28 Thread GitBox


SparkQA commented on pull request #31983:
URL: https://github.com/apache/spark/pull/31983#issuecomment-809080227


   **[Test build #136634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136634/testReport)** for PR 31983 at commit [`c727a0c`](https://github.com/apache/spark/commit/c727a0c1a14afaec6190c2950de0059a1930d749).





[GitHub] [spark] AmplabJenkins commented on pull request #31983: [SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31983:
URL: https://github.com/apache/spark/pull/31983#issuecomment-809079490


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41209/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31901: [SPARK-34802][SQL] Move simplify expression rules before operator push down

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31901:
URL: https://github.com/apache/spark/pull/31901#issuecomment-809079486


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41208/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809079487









[GitHub] [spark] AmplabJenkins commented on pull request #31987: [WIP][SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809079495


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41210/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31985: [SPARK-34860][ML] Multinomial Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31985:
URL: https://github.com/apache/spark/pull/31985#issuecomment-809079491


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136625/
   





[GitHub] [spark] MaxGekk commented on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


MaxGekk commented on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-80907


   > Apache Spark master branch doesn't have Hive 1.2 
   
   @dongjoon-hyun Thank you for the information. @AngersZh Sorry, I wasn't aware that it was removed.





[GitHub] [spark] maropu commented on a change in pull request #31982: [SPARK-34881][SQL] New SQL Function: TRY_CAST

2021-03-28 Thread GitBox


maropu commented on a change in pull request #31982:
URL: https://github.com/apache/spark/pull/31982#discussion_r603013688



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala
##
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.expressions.codegen._
+import org.apache.spark.sql.catalyst.expressions.codegen.Block._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A special version of [[AnsiCast]]. It performs the same operation (i.e. converts a value of
+ * one data type into another data type), but returns a NULL value instead of raising an error
+ * when the conversion can not be performed.
+ *
+ * When cast from/to timezone related types, we need timeZoneId, which will be resolved with
+ * session local timezone by an analyzer [[ResolveTimeZone]].
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`. " +
+    "This expression is identical to CAST with `spark.sql.ansi.enabled` as true, " +
+    "except it returns NULL instead of raising an error. " +
+    "This expression has one major difference from `cast` with `spark.sql.ansi.enabled` as true: " +
+    "when the source value can't be stored in the target integral(Byte/Short/Int/Long) type, " +
+    "`try_cast` returns null instead of returning the low order bytes of the source value.",

Review comment:
   nit: `try_cast` => `_FUNC_`?
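
The overflow behaviour described in the usage string above can be pictured outside Spark. What follows is a minimal sketch in plain Scala, not the `TryCast` implementation under review: `TryCastSketch`, `wrappingCast` and `tryCastToInt` are hypothetical names, and `Option` merely stands in for SQL NULL.

```scala
// Hypothetical illustration only; this is not Spark's TryCast code.
object TryCastSketch {
  // A plain narrowing conversion keeps only the low-order 32 bits.
  def wrappingCast(v: Long): Int = v.toInt

  // Returns None (standing in for SQL NULL) when the value does not fit in an Int.
  def tryCastToInt(v: Long): Option[Int] =
    if (v >= Int.MinValue && v <= Int.MaxValue) Some(v.toInt) else None

  // String input: None instead of an exception when parsing fails.
  def tryCastToInt(s: String): Option[Int] =
    scala.util.Try(s.toInt).toOption
}
```

Under these assumptions, a value just above the `Int` range wraps around with `wrappingCast` but yields `None` with `tryCastToInt`, which is the distinction the usage string draws.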

##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala
##
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.expressions.codegen._
+import org.apache.spark.sql.catalyst.expressions.codegen.Block._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A special version of [[AnsiCast]]. It performs the same operation (i.e. converts a value of
+ * one data type into another data type), but returns a NULL value instead of raising an error
+ * when the conversion can not be performed.
+ *
+ * When cast from/to timezone related types, we need timeZoneId, which will be resolved with
+ * session local timezone by an analyzer [[ResolveTimeZone]].
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`. " +
+    "This expression is identical to CAST with `spark.sql.ansi.enabled` as true, " +
+    "except it returns NULL instead of raising an error. " +
+    "This expression has one major difference from `cast` with `spark.sql.ansi.enabled` as true: " +

Review comment:
   ditto

##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -1610,6 +1610,17 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
 cast
   }
 
+  /**
+   * Create a [[TryCast]] expression.
+   */
+  override def visitTryCast(ctx: TryCastContext): Expression = withOrigin(ctx) {

Review comment:
   `visitCast` and `visitTryCast` are similar to each other, so how about merging their definitions in `SqlBase.sql`?
   ```
   | cast=(CAST | TRY_CAST) '(' expression 

[GitHub] [spark] sarutak commented on a change in pull request #31964: [SPARK-34872][SQL] quoteIfNeeded should quote a name which contains non-word characters

2021-03-28 Thread GitBox


sarutak commented on a change in pull request #31964:
URL: https://github.com/apache/spark/pull/31964#discussion_r603019469



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
##
@@ -148,10 +148,10 @@ package object util extends Logging {
   }
 
   def quoteIfNeeded(part: String): String = {

Review comment:
   I looked into the following call sites that use `quoteIfNeeded`:
   
   * ResolveSessionCatalog.apply
   * IdentifierHelper.quoted
   * MultipartIdentifierHelper.quoted
   * DatabaseInSessionCatalog$.unapply
   * NamespaceHelper.quoted
   * Alias.sql
   * AttributeReference.sql
   * UnresolvedAttribute.sql
   * IdentifierImpl.toString
   
   Finally, I think this change doesn't break existing behavior.
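
For readers following along, the rule named in the PR title (quote a name that contains non-word characters) can be sketched as below. This is a hedged illustration in plain Scala, not the code from the patch: `QuoteSketch` is a hypothetical wrapper, and the exact character class and the backtick-doubling escape are assumptions.

```scala
// Hypothetical sketch; the rule used by the actual patch may differ.
object QuoteSketch {
  // Leave a name unquoted only if it consists solely of word characters
  // (letters, digits, underscore). Otherwise wrap it in backticks and
  // double any backtick that appears inside the name.
  def quoteIfNeeded(part: String): String =
    if (part.matches("[a-zA-Z0-9_]+")) part
    else "`" + part.replace("`", "``") + "`"
}
```

With this sketch, `abc_1` is returned unchanged while `a.b` and `a-b` come back wrapped in backticks, so call sites that only ever pass plain word-character names would see no difference.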







[GitHub] [spark] SparkQA removed a comment on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


SparkQA removed a comment on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809025371


   **[Test build #136622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136622/testReport)** for PR 31979 at commit [`796f1f4`](https://github.com/apache/spark/commit/796f1f4177a6f6f852c220b8a9aa42d16e7518e8).





[GitHub] [spark] SparkQA commented on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


SparkQA commented on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809070189


   **[Test build #136622 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136622/testReport)**
 for PR 31979 at commit 
[`796f1f4`](https://github.com/apache/spark/commit/796f1f4177a6f6f852c220b8a9aa42d16e7518e8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] HeartSaVioR edited a comment on pull request #31989: [WIP][SPARK-34891][SS] Introduce state store manager for session window in streaming query

2021-03-28 Thread GitBox


HeartSaVioR edited a comment on pull request #31989:
URL: https://github.com/apache/spark/pull/31989#issuecomment-809067233


   Apart from the test suite, one more thing worth addressing here is write 
amplification; we "blindly" replace all start times and all sessions. This 
could cause unnecessary writes to "unmodified" existing sessions. In many cases 
we expect new inputs to be bound to, and to expand, the existing sessions, 
but with a very long watermark gap and old inputs with varied timestamps, 
this could still happen.
   
   EDIT: I realized the logic is bound to the physical plan. Still, it seems OK 
to move the logic here so that the logic for storing new session windows 
efficiently can be bound to the state format.
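
To make the write-amplification concern concrete, here is a minimal sketch that is not taken from the PR. It assumes sessions for one key can be modelled as a map from start time to value; `WritePlan`, `blindReplace`, and `diffReplace` are hypothetical names used only for illustration.

```scala
object WriteAmplificationSketch {
  // Hypothetical model: session start time -> session value, for a single key.
  type Sessions = Map[Long, String]

  // The state store operations a replacement would issue.
  final case class WritePlan(puts: Map[Long, String], removes: Set[Long])

  // "Blind" replacement: drop every existing session and rewrite every new one.
  def blindReplace(existing: Sessions, updated: Sessions): WritePlan =
    WritePlan(puts = updated, removes = existing.keySet)

  // Diff-based replacement: only touch sessions that actually changed.
  def diffReplace(existing: Sessions, updated: Sessions): WritePlan = {
    val puts = updated.filter { case (start, value) =>
      !existing.get(start).contains(value)
    }
    val removes = existing.keySet -- updated.keySet
    WritePlan(puts, removes)
  }
}
```

With one unmodified session, one modified session, and one new session, the blind plan issues writes for all of them, while the diff-based plan skips the unmodified one.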





[GitHub] [spark] SparkQA commented on pull request #31901: [SPARK-34802][SQL] Move simplify expression rules before operator push down

2021-03-28 Thread GitBox


SparkQA commented on pull request #31901:
URL: https://github.com/apache/spark/pull/31901#issuecomment-809069105


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41208/
   





[GitHub] [spark] HeartSaVioR commented on pull request #31989: [WIP][SPARK-34891][SS] Introduce state store manager for session window in streaming query

2021-03-28 Thread GitBox


HeartSaVioR commented on pull request #31989:
URL: https://github.com/apache/spark/pull/31989#issuecomment-809067233


   Apart from the test suite, one more thing worth addressing here is write 
amplification; we "blindly" replace all start times and all sessions. This 
could cause unnecessary writes to "unmodified" existing sessions. In many cases 
we expect new inputs to be bound to, and to expand, the existing sessions, 
but with a very long watermark gap and old inputs with varied timestamps, 
this could still happen.





[GitHub] [spark] SparkQA removed a comment on pull request #31985: [SPARK-34860][ML] Multinomial Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


SparkQA removed a comment on pull request #31985:
URL: https://github.com/apache/spark/pull/31985#issuecomment-809042739


   **[Test build #136625 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136625/testReport)**
 for PR 31985 at commit 
[`cdaafc2`](https://github.com/apache/spark/commit/cdaafc28f458d45a6f1a257b2cea381db7a09637).





[GitHub] [spark] SparkQA commented on pull request #31985: [SPARK-34860][ML] Multinomial Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


SparkQA commented on pull request #31985:
URL: https://github.com/apache/spark/pull/31985#issuecomment-809066739


   **[Test build #136625 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136625/testReport)**
 for PR 31985 at commit 
[`cdaafc2`](https://github.com/apache/spark/commit/cdaafc28f458d45a6f1a257b2cea381db7a09637).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] c21 commented on a change in pull request #31958: [SPARK-34862][SQL] Support nested column in ORC vectorized reader

2021-03-28 Thread GitBox


c21 commented on a change in pull request #31958:
URL: https://github.com/apache/spark/pull/31958#discussion_r603014404



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -838,6 +838,13 @@ object SQLConf {
 .intConf
 .createWithDefault(4096)
 
+  val ORC_VECTORIZED_READER_NESTED_COLUMN_ENABLED =
+buildConf("spark.sql.orc.enableNestedColumnVectorizedReader")
+  .doc("Enables vectorized orc decoding for nested column.")
+  .version("3.2.0")
+  .booleanConf
+  .createWithDefault(true)

Review comment:
   @dongjoon-hyun - makes sense to me. Updated. For all reviewers, 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136587/testReport
 shows the unit tests passing with the nested column vectorized reader enabled 
by default.

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
##
@@ -131,11 +131,27 @@ class OrcFileFormat
 }
   }
 
+  private def supportBatchForNestedColumn(
+  sparkSession: SparkSession,
+  schema: StructType): Boolean = {
+val hasNestedColumn = schema.map(_.dataType).exists {
+  case _: ArrayType | _: MapType | _: StructType => true
+  case _ => false
+}
+if (hasNestedColumn) {
+  sparkSession.sessionState.conf.orcVectorizedReaderNestedColumnEnabled
+} else {
+  true
+}
+  }
+
   override def supportBatch(sparkSession: SparkSession, schema: StructType): 
Boolean = {
 val conf = sparkSession.sessionState.conf
 conf.orcVectorizedReaderEnabled && conf.wholeStageEnabled &&
   schema.length <= conf.wholeStageMaxNumFields &&
-  schema.forall(_.dataType.isInstanceOf[AtomicType])
+  schema.forall(s => supportDataType(s.dataType) &&
+!s.dataType.isInstanceOf[UserDefinedType[_]]) &&
+  supportBatchForNestedColumn(sparkSession, schema)

Review comment:
   @dongjoon-hyun - do you mean implementing the Parquet vectorized reader for 
nested columns? I created https://issues.apache.org/jira/browse/SPARK-34863 and 
plan to do it after this one, thanks.
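
For readers without a Spark checkout, the gating logic in the diff above can be sketched in isolation. This is a simplified illustration, not the Spark code: the `DataType` hierarchy below is a hypothetical stand-in for Spark's types, and `nestedEnabled` stands in for the `spark.sql.orc.enableNestedColumnVectorizedReader` setting.

```scala
// Simplified, hypothetical stand-ins for Spark SQL data types.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType
final case class ArrayType(element: DataType) extends DataType
final case class MapType(key: DataType, value: DataType) extends DataType
final case class StructType(fields: Seq[(String, DataType)]) extends DataType

object NestedColumnGate {
  // True if any top-level column is an array, map, or struct.
  def hasNestedColumn(schema: StructType): Boolean =
    schema.fields.map(_._2).exists {
      case _: ArrayType | _: MapType | _: StructType => true
      case _ => false
    }

  // Flat schemas are always allowed; nested schemas only when the flag is on.
  def supportBatchForNestedColumn(schema: StructType, nestedEnabled: Boolean): Boolean =
    !hasNestedColumn(schema) || nestedEnabled
}
```

The point of the separate check is that the flag only affects schemas that contain nested columns, so disabling it leaves batch reads of flat schemas untouched.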

##
File path: project/MimaExcludes.scala
##
@@ -417,6 +417,21 @@ object MimaExcludes {
   case _ => true
 },
 
+    // [SPARK-34862][SQL] Support nested column in ORC vectorized reader
+    ProblemFilters.exclude[DirectAbstractMethodProblem]("org.apache.spark.sql.vectorized.ColumnVector.getBoolean"),
+    ProblemFilters.exclude[DirectAbstractMethodProblem]("org.apache.spark.sql.vectorized.ColumnVector.getByte"),
+    ProblemFilters.exclude[DirectAbstractMethodProblem]("org.apache.spark.sql.vectorized.ColumnVector.getShort"),
+    ProblemFilters.exclude[DirectAbstractMethodProblem]("org.apache.spark.sql.vectorized.ColumnVector.getInt"),
+    ProblemFilters.exclude[DirectAbstractMethodProblem]("org.apache.spark.sql.vectorized.ColumnVector.getLong"),
+    ProblemFilters.exclude[DirectAbstractMethodProblem]("org.apache.spark.sql.vectorized.ColumnVector.getFloat"),
+    ProblemFilters.exclude[DirectAbstractMethodProblem]("org.apache.spark.sql.vectorized.ColumnVector.getDouble"),
+    ProblemFilters.exclude[DirectAbstractMethodProblem]("org.apache.spark.sql.vectorized.ColumnVector.getDecimal"),
+    ProblemFilters.exclude[DirectAbstractMethodProblem]("org.apache.spark.sql.vectorized.ColumnVector.getUTF8String"),
+    ProblemFilters.exclude[DirectAbstractMethodProblem]("org.apache.spark.sql.vectorized.ColumnVector.getBinary"),
+    ProblemFilters.exclude[DirectAbstractMethodProblem]("org.apache.spark.sql.vectorized.ColumnVector.getArray"),
+    ProblemFilters.exclude[DirectAbstractMethodProblem]("org.apache.spark.sql.vectorized.ColumnVector.getMap"),
+    ProblemFilters.exclude[DirectAbstractMethodProblem]("org.apache.spark.sql.vectorized.ColumnVector.getChild"),

Review comment:
   @dongjoon-hyun - updated, thanks. Sorry I was not looking at this file 
very closely.







[GitHub] [spark] SparkQA commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-28 Thread GitBox


SparkQA commented on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809064991


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41206/
   





[GitHub] [spark] SparkQA commented on pull request #31983: [SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates

2021-03-28 Thread GitBox


SparkQA commented on pull request #31983:
URL: https://github.com/apache/spark/pull/31983#issuecomment-809064604


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41209/
   





[GitHub] [spark] SparkQA commented on pull request #31901: [SPARK-34802][SQL] Move simplify expression rules before operator push down

2021-03-28 Thread GitBox


SparkQA commented on pull request #31901:
URL: https://github.com/apache/spark/pull/31901#issuecomment-809064539


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41208/
   





[GitHub] [spark] SparkQA removed a comment on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


SparkQA removed a comment on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809024472


   **[Test build #136621 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136621/testReport)**
 for PR 31979 at commit 
[`4e88bdf`](https://github.com/apache/spark/commit/4e88bdf72919dd3c65f6ddb03e5424de4b689160).





[GitHub] [spark] SparkQA commented on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


SparkQA commented on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809063741


   **[Test build #136621 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136621/testReport)**
 for PR 31979 at commit 
[`4e88bdf`](https://github.com/apache/spark/commit/4e88bdf72919dd3c65f6ddb03e5424de4b689160).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31983: [SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31983:
URL: https://github.com/apache/spark/pull/31983#issuecomment-809062638


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136626/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809062636


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41207/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809062635


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136633/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31983: [SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31983:
URL: https://github.com/apache/spark/pull/31983#issuecomment-809062638


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136626/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809062636


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41207/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809062635


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136633/
   





[GitHub] [spark] SparkQA commented on pull request #31989: [WIP][SPARK-34891][SS] Introduce state store manager for session window in streaming query

2021-03-28 Thread GitBox


SparkQA commented on pull request #31989:
URL: https://github.com/apache/spark/pull/31989#issuecomment-809062214


   **[Test build #136632 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136632/testReport)**
 for PR 31989 at commit 
[`a7bd8a9`](https://github.com/apache/spark/commit/a7bd8a91c52970165e668b6a8d07ade2899c915e).





[GitHub] [spark] HeartSaVioR commented on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-28 Thread GitBox


HeartSaVioR commented on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-809062076


   I filed 5 JIRA issues covering all parts, and submitted 3 PRs that do not 
depend on the others. The remaining 2 parts depend on other PRs, and I'll deal 
with them once their dependencies are merged.





[GitHub] [spark] SparkQA commented on pull request #31985: [SPARK-34860][ML] Multinomial Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


SparkQA commented on pull request #31985:
URL: https://github.com/apache/spark/pull/31985#issuecomment-809061510


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41205/
   





[GitHub] [spark] HeartSaVioR opened a new pull request #31989: [WIP][SPARK-34891][SS] Introduce state store manager for session window in streaming query

2021-03-28 Thread GitBox


HeartSaVioR opened a new pull request #31989:
URL: https://github.com/apache/spark/pull/31989


   Introduction: this PR is a part of SPARK-10816 (`EventTime based 
sessionization (session window)`). Please refer to #31937 for the overall view 
of the code change. (Note that the code diff may have diverged a bit.)
   
   ### What changes were proposed in this pull request?
   
   This PR introduces a state store manager for session windows in streaming 
queries. Session windows in batch queries don't need a state store 
manager.
   
   This PR versions the state format of the state store manager, so that 
we can apply further optimizations after a Spark version is released. 
`StreamingSessionWindowStateManager` is a trait defining the methods available in 
the session window state store manager. `StreamingSessionWindowStateManagerBaseImpl` 
and its subclasses implement the trait with versioning.
   
   The format of version 1 leverages two state stores to represent the session 
windows:
   
   * key -> list of start times (in session window spec)
   * key + start time in session window -> value
   
   This structure is simpler than what we tried to implement previously, 
and also less sub-optimal, as it doesn't require all values to be rewritten when 
any session window is added, modified, or removed.
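
The two-store layout described above can be sketched with in-memory maps. This is only an illustration of the key shapes, not the PR's implementation: `TwoStoreSessionState` and its methods are hypothetical names, and plain mutable maps stand in for the actual state stores.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of the version 1 format:
//   store 1: key -> list of session start times
//   store 2: (key, start time) -> value
final class TwoStoreSessionState[K, V] {
  private val startTimesByKey = mutable.Map.empty[K, List[Long]]
  private val valueByKeyAndStart = mutable.Map.empty[(K, Long), V]

  // Adds or updates one session; other sessions of the key are not rewritten.
  def put(key: K, startTime: Long, value: V): Unit = {
    val starts = startTimesByKey.getOrElse(key, Nil)
    if (!starts.contains(startTime)) {
      startTimesByKey(key) = (startTime :: starts).sorted
    }
    valueByKeyAndStart((key, startTime)) = value
  }

  // Removes one session and drops the key entirely once it has no sessions left.
  def remove(key: K, startTime: Long): Unit = {
    val remaining = startTimesByKey.getOrElse(key, Nil).filterNot(_ == startTime)
    if (remaining.isEmpty) startTimesByKey.remove(key)
    else startTimesByKey(key) = remaining
    valueByKeyAndStart.remove((key, startTime))
  }

  // All sessions of a key, ordered by start time.
  def sessions(key: K): List[(Long, V)] =
    startTimesByKey.getOrElse(key, Nil).flatMap { start =>
      valueByKeyAndStart.get((key, start)).map(value => (start, value))
    }
}
```

In this model, changing one session touches a single entry in the second map plus, at most, the start-time list of that key, which is the property the description relies on.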
   
   ### Why are the changes needed?
   
   This is one of the parts required to implement SPARK-10816.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   WIP (new test suite is expected to be added, or can be skipped if we agree 
it can be skipped)





[GitHub] [spark] SparkQA commented on pull request #31988: [SPARK-34855][CORE] Avoid local lazy variable in SparkContext.getCallSite

2021-03-28 Thread GitBox


SparkQA commented on pull request #31988:
URL: https://github.com/apache/spark/pull/31988#issuecomment-809059717


   **[Test build #136631 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136631/testReport)**
 for PR 31988 at commit 
[`d2641d9`](https://github.com/apache/spark/commit/d2641d90e5a49a91312748fb655e2a7cd3790d3f).





[GitHub] [spark] viirya commented on pull request #31988: [SPARK-34855][CORE] Avoid local lazy variable in SparkContext.getCallSite

2021-03-28 Thread GitBox


viirya commented on pull request #31988:
URL: https://github.com/apache/spark/pull/31988#issuecomment-809059452


   cc @HyukjinKwon @srowen @lxian
   





[GitHub] [spark] viirya opened a new pull request #31988: [SPARK-34855][CORE] Avoid local lazy variable in SparkContext.getCallSite

2021-03-28 Thread GitBox


viirya opened a new pull request #31988:
URL: https://github.com/apache/spark/pull/31988


   
   
   ### What changes were proposed in this pull request?
   
   
   `SparkContext.getCallSite` uses a local lazy variable. In Scala 2.11, a local 
lazy val requires synchronization, so with a large number of job submissions in the 
same context it becomes a bottleneck. This change is only for branch-2.4, as we 
dropped Scala 2.11 support in SPARK-26132.
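
As an illustration of the pattern, and not the actual patch, the sketch below shows one way to avoid a local `lazy val` while still computing the expensive value only when it is needed. `CallSite` here is a simplified stand-in, and `computeCallSite` and the override parameters are hypothetical.

```scala
// Simplified stand-in for a call site description.
final case class CallSite(shortForm: String, longForm: String)

object CallSiteSketch {
  // Hypothetical placeholder for an expensive computation such as a stack walk.
  def computeCallSite(): CallSite = CallSite("short", "long")

  // Pattern described above: a local lazy val defers the computation.
  def withLocalLazyVal(shortOverride: Option[String], longOverride: Option[String]): CallSite = {
    lazy val callSite = computeCallSite()
    CallSite(
      shortOverride.getOrElse(callSite.shortForm),
      longOverride.getOrElse(callSite.longForm))
  }

  // Alternative without a local lazy val: compute only if an override is missing.
  def withoutLocalLazyVal(shortOverride: Option[String], longOverride: Option[String]): CallSite =
    (shortOverride, longOverride) match {
      case (Some(short), Some(long)) => CallSite(short, long)
      case _ =>
        val callSite = computeCallSite()
        CallSite(
          shortOverride.getOrElse(callSite.shortForm),
          longOverride.getOrElse(callSite.longForm))
    }
}
```

Both variants return the same result for the same inputs; the second simply replaces the lazily initialized local with an explicit branch.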
   
   ### Why are the changes needed?
   
   
   To avoid a possible bottleneck with a large number of job submissions in the 
same context.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Existing tests.





[GitHub] [spark] SparkQA commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-28 Thread GitBox


SparkQA commented on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809059073


   **[Test build #136630 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136630/testReport)**
 for PR 31680 at commit 
[`f73421a`](https://github.com/apache/spark/commit/f73421ae2df131faeb2509083099ecbbd645a7d0).





[GitHub] [spark] SparkQA commented on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements

2021-03-28 Thread GitBox


SparkQA commented on pull request #31986:
URL: https://github.com/apache/spark/pull/31986#issuecomment-809058962


   **[Test build #136628 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136628/testReport)**
 for PR 31986 at commit 
[`3e8dd5c`](https://github.com/apache/spark/commit/3e8dd5ccd8c2c136f5c3a4ff64269267edbdf81e).





[GitHub] [spark] SparkQA commented on pull request #31985: [SPARK-34860][ML] Multinomial Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


SparkQA commented on pull request #31985:
URL: https://github.com/apache/spark/pull/31985#issuecomment-809058974


   **[Test build #136629 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136629/testReport)**
 for PR 31985 at commit 
[`461d111`](https://github.com/apache/spark/commit/461d1110da71d504780f5a1f7db07fceaf597938).





[GitHub] [spark] SparkQA commented on pull request #31987: [WIP][SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-28 Thread GitBox


SparkQA commented on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809058939


   **[Test build #136627 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136627/testReport)**
 for PR 31987 at commit 
[`5020827`](https://github.com/apache/spark/commit/5020827a74b7bbe67951057ef64a09061c099d90).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809058108


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136620/
   





[GitHub] [spark] tanelk commented on pull request #31973: [SPARK-34876][SQL] Fill defaultResult of non-nullable aggregates

2021-03-28 Thread GitBox


tanelk commented on pull request #31973:
URL: https://github.com/apache/spark/pull/31973#issuecomment-809058487


   @HyukjinKwon, there is a failure on branch-2.4. I believe it is because 
`CountIf` has only existed since 3.0.





[GitHub] [spark] AmplabJenkins commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809058108


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136620/
   





[GitHub] [spark] HeartSaVioR opened a new pull request #31987: [WIP][SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-28 Thread GitBox


HeartSaVioR opened a new pull request #31987:
URL: https://github.com/apache/spark/pull/31987


   Introduction: this PR is a part of SPARK-10816 (`EventTime based 
sessionization (session window)`). Please refer to #31937 for an overall view 
of the code change. (Note that the code diff may have diverged a bit.)
   
   ### What changes were proposed in this pull request?
   
   This PR introduces MergingSessionsIterator, which directly merges elements 
that belong to the same session.
   
   MergingSessionsIterator is a variant of SortAggregateIterator which merges 
session windows, relying on the fact that input rows are sorted by "group keys + 
the start time of session window". When merging windows, 
MergingSessionsIterator also applies aggregations on the merged window, which 
eliminates the need to buffer inputs (which requires copying rows) and to 
update the session spec for each input.
   
   MergingSessionsIterator is more performant than 
UpdatingSessionsIterator, which is introduced by SPARK-34888. Note that 
MergingSessionsIterator only applies to cases where the aggregation can be 
applied altogether, so there is still room for UpdatingSessionsIterator to be used.
   
   This PR also introduces MergingSessionsExec, the physical node that 
leverages MergingSessionsIterator to sort the input rows and aggregate them 
according to the session windows.
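   
   The merge-while-aggregating idea described above can be sketched on plain 
Scala iterators. This is only a minimal illustration, not the code in this PR: 
the `Session` case class, the `MergingSessionsSketch` object, the `count` 
aggregate, and the rule that touching windows merge (`start <= end`) are all 
assumptions made for the example.

```scala
// Hypothetical, simplified session model; not Spark's internal row format.
final case class Session(key: String, start: Long, end: Long, count: Long)

object MergingSessionsSketch {
  // Assumes `rows` is sorted by (key, start); each input row is a one-event session.
  def merge(rows: Iterator[Session]): Iterator[Session] = new Iterator[Session] {
    private val input = rows.buffered

    def hasNext: Boolean = input.hasNext

    def next(): Session = {
      var current = input.next()
      // Absorb following rows while they share the key and overlap the current window.
      // Assumption: a row whose start equals the current end is also merged.
      while (input.hasNext &&
          input.head.key == current.key &&
          input.head.start <= current.end) {
        val r = input.next()
        // Aggregate while merging, so no input rows need to be buffered or copied.
        current = current.copy(
          end = math.max(current.end, r.end),
          count = current.count + r.count)
      }
      current
    }
  }
}
```

   Because the input is sorted, a single pass with one running session per key 
is enough, which is the property the description relies on.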
   
   ### Why are the changes needed?
   
   This is one of the parts required to implement SPARK-10816.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   WIP (a new test suite is expected to be added, unless we agree it can be 
skipped)





[GitHub] [spark] viirya commented on a change in pull request #31953: [SPARK-34855][CORE]spark context - avoid using local lazy val for callSite

2021-03-28 Thread GitBox


viirya commented on a change in pull request #31953:
URL: https://github.com/apache/spark/pull/31953#discussion_r603005164



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -2186,13 +2186,22 @@ class SparkContext(config: SparkConf) extends Logging {
* has overridden the call site using `setCallSite()`, this will return the user's version.
*/
   private[spark] def getCallSite(): CallSite = {
-lazy val callSite = Utils.getCallSite()
-CallSite(
-  Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
-  Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
-)
+if (getLocalProperty(CallSite.SHORT_FORM) == null

Review comment:
   This is the last issue for 2.4, I think. Okay, let me create a PR first. 
I can close it if the author opens one afterwards.







[GitHub] [spark] SparkQA removed a comment on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-28 Thread GitBox


SparkQA removed a comment on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809007618


   **[Test build #136620 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136620/testReport)**
 for PR 31984 at commit 
[`b510d7d`](https://github.com/apache/spark/commit/b510d7da21f8a92af69b1485b72aef6ad5901448).





[GitHub] [spark] SparkQA commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-28 Thread GitBox


SparkQA commented on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809053124


   **[Test build #136620 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136620/testReport)**
 for PR 31984 at commit 
[`b510d7d`](https://github.com/apache/spark/commit/b510d7da21f8a92af69b1485b72aef6ad5901448).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] HeartSaVioR opened a new pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements

2021-03-28 Thread GitBox


HeartSaVioR opened a new pull request #31986:
URL: https://github.com/apache/spark/pull/31986


   Introduction: this PR is a part of SPARK-10816 (`EventTime based 
sessionization (session window)`). Please refer to #31937 for an overall view 
of the code change. (Note that the code diff may have diverged a bit.)
   
   ### What changes were proposed in this pull request?
   
   This PR introduces UpdatingSessionsIterator, which analyzes neighboring 
elements and adjusts the session information on each element.
   
   UpdatingSessionsIterator calculates and updates the session window for each 
element in the given iterator, so that elements in the same session window 
have the same session spec. Downstream operators can then apply aggregation to 
finally merge the elements bound to the same session window.
   
   UpdatingSessionsIterator works on the precondition that the given iterator is 
sorted by "group keys + start time of session window", and the resulting 
iterator still retains that sort order.
   
   UpdatingSessionsIterator copies the elements so that it can safely update each 
element, and it also buffers elements which are bound to the same session 
window. Due to these overheads, MergingSessionsIterator, which will be introduced 
via SPARK-34889, should be used whenever possible.
   
   This PR also introduces UpdatingSessionsExec, the physical node that 
leverages UpdatingSessionsIterator to sort the input rows and update session 
information on them.
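   
   The buffering behaviour described above can be illustrated with a small 
sketch on plain Scala collections. It is not the implementation in this PR: 
the `Event` case class, the `UpdatingSessionsSketch` object, and the rule that 
touching windows belong to the same session (`start <= end`) are assumptions 
made for the example.

```scala
// Hypothetical, simplified element model; not Spark's internal row format.
final case class Event(key: String, start: Long, end: Long, value: Int)

object UpdatingSessionsSketch {
  // Assumes `events` is sorted by (key, start). Returns the same events in the
  // same order, with start/end rewritten to the bounds of their session.
  def update(events: Seq[Event]): Seq[Event] = {
    val out = Seq.newBuilder[Event]
    var buffer = Vector.empty[Event] // elements of the session being built
    var curStart = 0L
    var curEnd = 0L

    // Emit buffered elements as copies carrying the final session bounds.
    def flush(): Unit = {
      buffer.foreach(e => out += e.copy(start = curStart, end = curEnd))
      buffer = Vector.empty
    }

    events.foreach { e =>
      val sameSession =
        buffer.nonEmpty && buffer.head.key == e.key && e.start <= curEnd
      if (sameSession) {
        curEnd = math.max(curEnd, e.end) // extend the current session
      } else {
        flush() // session closed: its bounds are now known
        curStart = e.start
        curEnd = e.end
      }
      buffer :+= e
    }
    flush()
    out.result()
  }
}
```

   Elements have to wait in the buffer until their session closes, because the 
final end of the window is not known earlier; this is the overhead the 
description refers to.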
   
   ### Why are the changes needed?
   
   This is one of the parts required to implement SPARK-10816.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New test suite added.





[GitHub] [spark] AngersZhuuuu commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-28 Thread GitBox


AngersZhuuuu commented on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809051861


   retest this please





[GitHub] [spark] HyukjinKwon commented on a change in pull request #31953: [SPARK-34855][CORE]spark context - avoid using local lazy val for callSite

2021-03-28 Thread GitBox


HyukjinKwon commented on a change in pull request #31953:
URL: https://github.com/apache/spark/pull/31953#discussion_r603003296



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -2186,13 +2186,22 @@ class SparkContext(config: SparkConf) extends Logging {
* has overridden the call site using `setCallSite()`, this will return the user's version.
*/
   private[spark] def getCallSite(): CallSite = {
-lazy val callSite = Utils.getCallSite()
-CallSite(
-  Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
-  Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
-)
+if (getLocalProperty(CallSite.SHORT_FORM) == null

Review comment:
   @viirya I believe it's fine for you to just go ahead IMO .. the author 
has been inactive for 4 days and this is a blocker for 2.4 (I guess?).







[GitHub] [spark] viirya commented on a change in pull request #31953: [SPARK-34855][CORE]spark context - avoid using local lazy val for callSite

2021-03-28 Thread GitBox


viirya commented on a change in pull request #31953:
URL: https://github.com/apache/spark/pull/31953#discussion_r603000582



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -2186,13 +2186,22 @@ class SparkContext(config: SparkConf) extends Logging {
* has overridden the call site using `setCallSite()`, this will return the user's version.
*/
   private[spark] def getCallSite(): CallSite = {
-lazy val callSite = Utils.getCallSite()
-CallSite(
-  Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
-  Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
-)
+if (getLocalProperty(CallSite.SHORT_FORM) == null

Review comment:
   @lxian Can you create a PR for branch-2.4? If you are busy, would you 
mind if I create a PR for branch-2.4? Thanks.







[GitHub] [spark] viirya commented on a change in pull request #31966: [SPARK-34638][SQL] Single field nested column prune on generator output

2021-03-28 Thread GitBox


viirya commented on a change in pull request #31966:
URL: https://github.com/apache/spark/pull/31966#discussion_r602999807



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -231,6 +231,27 @@ object NestedColumnAliasing {
  * of it.
  */
 object GeneratorNestedColumnAliasing {
+  // Partitions `attrToAliases` based on whether the attribute is in Generator's output.
+  private def aliasesOnGeneratorOutput(
+  attrToAliases: Map[ExprId, Seq[Alias]],
+  generatorOutput: Seq[Attribute]) = {
+val generatorOutputExprId = generatorOutput.map(_.exprId)
+attrToAliases.partition { k =>
+  generatorOutputExprId.contains(k._1)
+}
+  }
+
+  // Partitions `nestedFieldToAlias` based on whether the attribute of nested field extractor
+  // is in Generator's output.
+  private def nestedFieldOnGeneratorOutput(
+  nestedFieldToAlias: Map[ExtractValue, Alias],
+  generatorOutput: Seq[Attribute]) = {
+val generatorOutputSet = AttributeSet(generatorOutput)
+nestedFieldToAlias.partition { pair =>
+  pair._1.references.subsetOf(generatorOutputSet)
+}
+  }

Review comment:
   Okay with me. I put these in functions not for reuse but to make the code 
look simpler.







[GitHub] [spark] viirya commented on a change in pull request #31966: [SPARK-34638][SQL] Single field nested column prune on generator output

2021-03-28 Thread GitBox


viirya commented on a change in pull request #31966:
URL: https://github.com/apache/spark/pull/31966#discussion_r602999635



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -241,12 +262,69 @@ object GeneratorNestedColumnAliasing {
   // On top on `Generate`, a `Project` that might have nested column accessors.
   // We try to get alias maps for both project list and generator's children expressions.
   val exprsToPrune = projectList ++ g.generator.children
-  NestedColumnAliasing.getAliasSubMap(exprsToPrune, g.qualifiedGeneratorOutput).map {
+  NestedColumnAliasing.getAliasSubMap(exprsToPrune).map {
 case (nestedFieldToAlias, attrToAliases) =>
   // Defer updating `Generate.unrequiredChildIndex` to next round of `ColumnPruning`.
-  val newChild =
-NestedColumnAliasing.replaceWithAliases(g, nestedFieldToAlias, attrToAliases)
-  Project(NestedColumnAliasing.getNewProjectList(projectList, nestedFieldToAlias), newChild)
+
+  val (nestedFieldsOnGenerator, nestedFieldsNotOnGenerator) =
+nestedFieldOnGeneratorOutput(nestedFieldToAlias, g.qualifiedGeneratorOutput)
+  val (attrToAliasesOnGenerator, attrToAliasesNotOnGenerator) =
+aliasesOnGeneratorOutput(attrToAliases, g.qualifiedGeneratorOutput)
+
+  // Push nested column accessors through `Generator`. We cannot prune on `Generator`'s
+  // output.
+  val newChild = NestedColumnAliasing.replaceWithAliases(g,
+nestedFieldsNotOnGenerator, attrToAliasesNotOnGenerator)
+  val pushedThrough = Project(NestedColumnAliasing
+.getNewProjectList(projectList, nestedFieldsNotOnGenerator), newChild)
+
+  // Pruning on `Generator`'s output. We only process single field case.
+  // For multiple field case, we cannot directly move field extractor into
+  // the generator expression. A workaround is to re-construct array of struct
+  // from multiple fields. But it will be more complicated and may not worth.
+  if (nestedFieldsOnGenerator.size == 1) {

Review comment:
   sure.







[GitHub] [spark] viirya commented on a change in pull request #31966: [SPARK-34638][SQL] Single field nested column prune on generator output

2021-03-28 Thread GitBox


viirya commented on a change in pull request #31966:
URL: https://github.com/apache/spark/pull/31966#discussion_r602999476



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -241,12 +262,69 @@ object GeneratorNestedColumnAliasing {
   // On top on `Generate`, a `Project` that might have nested column accessors.
   // We try to get alias maps for both project list and generator's children expressions.
   val exprsToPrune = projectList ++ g.generator.children
-  NestedColumnAliasing.getAliasSubMap(exprsToPrune, g.qualifiedGeneratorOutput).map {
+  NestedColumnAliasing.getAliasSubMap(exprsToPrune).map {
 case (nestedFieldToAlias, attrToAliases) =>
   // Defer updating `Generate.unrequiredChildIndex` to next round of `ColumnPruning`.
-  val newChild =
-NestedColumnAliasing.replaceWithAliases(g, nestedFieldToAlias, attrToAliases)
-  Project(NestedColumnAliasing.getNewProjectList(projectList, nestedFieldToAlias), newChild)
+
+  val (nestedFieldsOnGenerator, nestedFieldsNotOnGenerator) =
+nestedFieldOnGeneratorOutput(nestedFieldToAlias, g.qualifiedGeneratorOutput)
+  val (attrToAliasesOnGenerator, attrToAliasesNotOnGenerator) =
+aliasesOnGeneratorOutput(attrToAliases, g.qualifiedGeneratorOutput)
+
+  // Push nested column accessors through `Generator`. We cannot prune on `Generator`'s
+  // output.
+  val newChild = NestedColumnAliasing.replaceWithAliases(g,
+nestedFieldsNotOnGenerator, attrToAliasesNotOnGenerator)
+  val pushedThrough = Project(NestedColumnAliasing
+.getNewProjectList(projectList, nestedFieldsNotOnGenerator), newChild)
+
+  // Pruning on `Generator`'s output. We only process single field case.
+  // For multiple field case, we cannot directly move field extractor into
+  // the generator expression. A workaround is to re-construct array of struct
+  // from multiple fields. But it will be more complicated and may not worth.
+  if (nestedFieldsOnGenerator.size == 1) {
+// Only one nested column accessor.
+// E.g., df.select(explode($"items").as("item")).select($"item.a")
+pushedThrough match {
+  case p @ Project(_, newG: Generate) =>
+// Replace the child expression of `ExplodeBase` generator with
+// nested column accessor.
+// E.g., df.select(explode($"items").as("item")) =>
+//   df.select(explode($"items.a").as("item"))
+val rewrittenG = newG.transformExpressions {
+  case e: ExplodeBase =>
+val extractor = nestedFieldsOnGenerator.head._1.transformUp {
+  case _: Attribute =>
+e.child
+  case g: GetStructField =>
+ExtractValue(g.child, Literal(g.extractFieldName), SQLConf.get.resolver)
+}
+e.withNewChildren(Seq(extractor))
+}
+
+// As we change the child of the generator, its output data type must be updated.
+val updatedGeneratorOutput = rewrittenG.generatorOutput
+.zip(rewrittenG.generator.elementSchema.toAttributes)
+.map { case (oldAttr, newAttr) =>
+  newAttr.withExprId(oldAttr.exprId).withName(oldAttr.name)
+}
+assert(updatedGeneratorOutput.length == rewrittenG.generatorOutput.length,

Review comment:
   yea, i think this is the same.

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -241,12 +262,69 @@ object GeneratorNestedColumnAliasing {
   // On top on `Generate`, a `Project` that might have nested column accessors.
   // We try to get alias maps for both project list and generator's children expressions.
   val exprsToPrune = projectList ++ g.generator.children
-  NestedColumnAliasing.getAliasSubMap(exprsToPrune, g.qualifiedGeneratorOutput).map {
+  NestedColumnAliasing.getAliasSubMap(exprsToPrune).map {
 case (nestedFieldToAlias, attrToAliases) =>
   // Defer updating `Generate.unrequiredChildIndex` to next round of `ColumnPruning`.
-  val newChild =
-NestedColumnAliasing.replaceWithAliases(g, nestedFieldToAlias, attrToAliases)
-  Project(NestedColumnAliasing.getNewProjectList(projectList, nestedFieldToAlias), newChild)
+
+  val (nestedFieldsOnGenerator, nestedFieldsNotOnGenerator) =
+nestedFieldOnGeneratorOutput(nestedFieldToAlias, g.qualifiedGeneratorOutput)
+  val (attrToAliasesOnGenerator, attrToAliasesNotOnGenerator) =
+aliasesOnGeneratorOutput(attrToAliases, 

[GitHub] [spark] viirya commented on a change in pull request #31966: [SPARK-34638][SQL] Single field nested column prune on generator output

2021-03-28 Thread GitBox


viirya commented on a change in pull request #31966:
URL: https://github.com/apache/spark/pull/31966#discussion_r602998834



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -241,12 +262,69 @@ object GeneratorNestedColumnAliasing {
   // On top on `Generate`, a `Project` that might have nested column accessors.
   // We try to get alias maps for both project list and generator's children expressions.
   val exprsToPrune = projectList ++ g.generator.children
-  NestedColumnAliasing.getAliasSubMap(exprsToPrune, g.qualifiedGeneratorOutput).map {
+  NestedColumnAliasing.getAliasSubMap(exprsToPrune).map {
 case (nestedFieldToAlias, attrToAliases) =>
   // Defer updating `Generate.unrequiredChildIndex` to next round of `ColumnPruning`.
-  val newChild =
-NestedColumnAliasing.replaceWithAliases(g, nestedFieldToAlias, attrToAliases)
-  Project(NestedColumnAliasing.getNewProjectList(projectList, nestedFieldToAlias), newChild)
+
+  val (nestedFieldsOnGenerator, nestedFieldsNotOnGenerator) =
+nestedFieldOnGeneratorOutput(nestedFieldToAlias, g.qualifiedGeneratorOutput)
+  val (attrToAliasesOnGenerator, attrToAliasesNotOnGenerator) =
+aliasesOnGeneratorOutput(attrToAliases, g.qualifiedGeneratorOutput)
+
+  // Push nested column accessors through `Generator`. We cannot prune on `Generator`'s
+  // output.
+  val newChild = NestedColumnAliasing.replaceWithAliases(g,
+nestedFieldsNotOnGenerator, attrToAliasesNotOnGenerator)
+  val pushedThrough = Project(NestedColumnAliasing
+.getNewProjectList(projectList, nestedFieldsNotOnGenerator), newChild)
+
+  // Pruning on `Generator`'s output. We only process single field case.
+  // For multiple field case, we cannot directly move field extractor into
+  // the generator expression. A workaround is to re-construct array of struct
+  // from multiple fields. But it will be more complicated and may not worth.
+  if (nestedFieldsOnGenerator.size == 1) {
+// Only one nested column accessor.
+// E.g., df.select(explode($"items").as("item")).select($"item.a")
+pushedThrough match {
+  case p @ Project(_, newG: Generate) =>
+// Replace the child expression of `ExplodeBase` generator with
+// nested column accessor.
+// E.g., df.select(explode($"items").as("item")) =>
+//   df.select(explode($"items.a").as("item"))

Review comment:
   Oh, I missed the nested column accessor on top of it, so it does not look 
correct. I will update the comment.
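
   As a rough illustration of the rewrite the code comment is trying to 
describe, the sketch below uses plain Scala collections as a stand-in for 
DataFrames. The `Item` case class, the `explode` helper, and the 
`ExplodePruningSketch` object are hypothetical and not part of Spark; the 
point is only that reading field `a` after exploding gives the same rows as 
exploding an array that already contains only `a`.

```scala
// Hypothetical struct type with one wanted field (a) and one unwanted field (b).
final case class Item(a: Int, b: String)

object ExplodePruningSketch {
  // Plain-collection stand-in for explode: one output row per array element.
  def explode[T](rows: Seq[Seq[T]]): Seq[T] = rows.flatten

  // Original plan shape: explode whole structs, then read field `a` on top.
  def explodeThenExtract(rows: Seq[Seq[Item]]): Seq[Int] =
    explode(rows).map(_.a)

  // Rewritten plan shape: extract `a` inside each array first, then explode.
  // The accessor that used to sit on top is no longer applied afterwards.
  def extractThenExplode(rows: Seq[Seq[Item]]): Seq[Int] =
    explode(rows.map(_.map(_.a)))
}
```

   In the second form the unwanted field `b` never reaches the exploded output, 
which is the pruning benefit for the single-field case.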







[GitHub] [spark] viirya commented on a change in pull request #31966: [SPARK-34638][SQL] Single field nested column prune on generator output

2021-03-28 Thread GitBox


viirya commented on a change in pull request #31966:
URL: https://github.com/apache/spark/pull/31966#discussion_r602997773



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -241,12 +262,69 @@ object GeneratorNestedColumnAliasing {
   // On top on `Generate`, a `Project` that might have nested column accessors.
   // We try to get alias maps for both project list and generator's children expressions.
   val exprsToPrune = projectList ++ g.generator.children
-  NestedColumnAliasing.getAliasSubMap(exprsToPrune, g.qualifiedGeneratorOutput).map {
+  NestedColumnAliasing.getAliasSubMap(exprsToPrune).map {
 case (nestedFieldToAlias, attrToAliases) =>
   // Defer updating `Generate.unrequiredChildIndex` to next round of `ColumnPruning`.
-  val newChild =
-NestedColumnAliasing.replaceWithAliases(g, nestedFieldToAlias, attrToAliases)
-  Project(NestedColumnAliasing.getNewProjectList(projectList, nestedFieldToAlias), newChild)
+
+  val (nestedFieldsOnGenerator, nestedFieldsNotOnGenerator) =
+nestedFieldOnGeneratorOutput(nestedFieldToAlias, g.qualifiedGeneratorOutput)
+  val (attrToAliasesOnGenerator, attrToAliasesNotOnGenerator) =
+aliasesOnGeneratorOutput(attrToAliases, g.qualifiedGeneratorOutput)
+
+  // Push nested column accessors through `Generator`. We cannot prune on `Generator`'s
+  // output.
+  val newChild = NestedColumnAliasing.replaceWithAliases(g,
+nestedFieldsNotOnGenerator, attrToAliasesNotOnGenerator)
+  val pushedThrough = Project(NestedColumnAliasing
+.getNewProjectList(projectList, nestedFieldsNotOnGenerator), newChild)
+
+  // Pruning on `Generator`'s output. We only process single field case.
+  // For multiple field case, we cannot directly move field extractor into
+  // the generator expression. A workaround is to re-construct array of struct
+  // from multiple fields. But it will be more complicated and may not worth.
+  if (nestedFieldsOnGenerator.size == 1) {
+// Only one nested column accessor.
+// E.g., df.select(explode($"items").as("item")).select($"item.a")
+pushedThrough match {
+  case p @ Project(_, newG: Generate) =>
+// Replace the child expression of `ExplodeBase` generator with
+// nested column accessor.
+// E.g., df.select(explode($"items").as("item")) =>
+//   df.select(explode($"items.a").as("item"))
+val rewrittenG = newG.transformExpressions {
+  case e: ExplodeBase =>
+val extractor = nestedFieldsOnGenerator.head._1.transformUp {
+  case _: Attribute =>
+e.child
+  case g: GetStructField =>
+ExtractValue(g.child, Literal(g.extractFieldName), SQLConf.get.resolver)

Review comment:
   let me add one.







[GitHub] [spark] zhengruifeng commented on pull request #31985: [SPARK-34860][ML] Multinomial Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


zhengruifeng commented on pull request #31985:
URL: https://github.com/apache/spark/pull/31985#issuecomment-809044289


   @srowen @WeichenXu123 
   This is the last PR for centering support in LR.





[GitHub] [spark] zhengruifeng commented on a change in pull request #31985: [SPARK-34860][ML] Multinomial Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


zhengruifeng commented on a change in pull request #31985:
URL: https://github.com/apache/spark/pull/31985#discussion_r602997003



##
File path: 
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
##
@@ -1863,21 +1899,125 @@ class LogisticRegressionSuite extends MLTest with DefaultReadWriteTest {
   0.0, 0.0, 0.0, 0.09064661,
   -0.1144333, 0.3204703, -0.1621061, -0.2308192,
   0.0, -0.4832131, 0.0, 0.0), isTransposed = true)
-val interceptsRStd = Vectors.dense(-0.72638218, -0.01737265, 0.74375484)
+val interceptsRStd = Vectors.dense(-0.69265374, -0.2260274, 0.9186811)
 val coefficientsR = new DenseMatrix(3, 4, Array(
   0.0, 0.0, 0.01641412, 0.03570376,
   -0.05110822, 0.0, -0.21595670, -0.16162836,
   0.0, 0.0, 0.0, 0.0), isTransposed = true)
 val interceptsR = Vectors.dense(-0.44707756, 0.75180900, -0.3047314)
 
-assert(model1.coefficientMatrix ~== coefficientsRStd absTol 0.05)
-assert(model1.interceptVector ~== interceptsRStd relTol 0.1)
+assert(model1.coefficientMatrix ~== coefficientsRStd absTol 1e-3)
+assert(model1.interceptVector ~== interceptsRStd relTol 1e-3)
 assert(model1.interceptVector.toArray.sum ~== 0.0 absTol eps)
-assert(model2.coefficientMatrix ~== coefficientsR absTol 0.02)
-assert(model2.interceptVector ~== interceptsR relTol 0.1)
+assert(model2.coefficientMatrix ~== coefficientsR absTol 1e-3)
+assert(model2.interceptVector ~== interceptsR relTol 1e-3)
 assert(model2.interceptVector.toArray.sum ~== 0.0 absTol eps)
   }
 
+  test("SPARK-34860: multinomial logistic regression with intercept, with small var") {

Review comment:
   master does not pass this newly added test suite:
   
   ```
   // scalastyle:off println
   println("R")
   println(interceptsR)
   println(coefficientsR)
   
   println()
   println("model1")
   println(model1.interceptVector)
   println(model1.coefficientMatrix)
   
   println()
   println("model2")
   println(model2.interceptVector)
   println(model2.coefficientMatrix)
   
   println()
   println("R2")
   println(interceptsR2)
   println(coefficientsR2)
   
   println()
   println("model3")
   println(model3.interceptVector)
   println(model3.coefficientMatrix)
   // scalastyle:on println
   ```
   
   
   
   
   this PR:
   ```
   R
   [2.91748298,-17.510746,14.59326301]
   0.21755977  0.01647541   0.16507778  -0.1401668
   -0.24436    0.7564655    -0.2955698  1.3262009
   0.02680026  -0.77294095  0.13049206  -1.18603411
   
   model1
   [2.933958199942738,-17.543164024163175,14.609205824220437]
   0.21812136899052606   0.015486127035160564  0.16560717317181253  -0.14189621394905397
   -0.2454895541210769   0.7584152697648037    -0.2966285999752721  1.3296192946128171
   0.027368185130550855  -0.7739013967999642   0.13102142680345957  -1.187723080663763
   
   model2
   [2.933958199942738,-17.543164024163175,14.609205824220437]
   0.21812136899052606   0.015486127035160564  0.16560717317181253  -0.14189621394905397
   -0.2454895541210769   0.7584152697648037    -0.2966285999752721  1.3296192946128171
   0.027368185130550855  -0.7739013967999642   0.13102142680345957  -1.187723080663763
   
   R2
   [1.751626027,-3.9297124987,2.178086472]
   0.019970169   0.079611293   0.003959452   0.110024399
   -4.788494E-4  0.0010097453  -5.832701E-4  0.0
   -0.01936999   -0.080851149  -0.003319687  -0.112435972
   
   model3
   [1.7516587309368687,-3.9297178332916585,2.1780591023547897]
   0.019968543900064605    0.07960456424549685    0.0039592584764418055   0.11002491382872195
   -4.7805989516075794E-4  0.0010124410611496804  -5.830912612961964E-4   0.0
   -0.01936890596857533    -0.08084716280475213   -0.0033195486718121834  -0.1124344396230352
   ```
   
   
   
   
   
   master:
   ```
   R
   [2.91748298,-17.510746,14.59326301]
   0.21755977  0.01647541   0.16507778  -0.1401668
   -0.24436    0.7564655    -0.2955698  1.3262009
   0.02680026  -0.77294095  0.13049206  -1.18603411
   model1
   [3.2289115796175536,-3.8874667667006286,0.6585551870830749]
   0.21614280080869921   0.010853354751576538  0.16526956599746928  -0.16826299113708829
   -0.24226138413980347  0.766137782321547     -0.2961105375461299  -0.01353727702893284
   0.02611858333110428   -0.7769911370731234   0.13084097154866067  0.18180026816602116
   model2
   [3.2289115795385817,-3.8874667667014213,0.65855518716284]
   0.216142800347921     0.01085335149421333  0.1652695665789533   -0.16826299025797364
   -0.24226138429694594  0.7661377826486023   -0.2961105377075671  -0.013537276769415511
   0.026118583949024932  -0.7769911341428156  0.13084097112861381  0.18180026702738916
   
   
   R2
   [1.751626027,-3.9297124987,2.178086472]
   0.019970169   0.079611293   0.003959452   0.110024399  

[GitHub] [spark] zhengruifeng commented on a change in pull request #31985: [SPARK-34860][ML] Multinomial Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


zhengruifeng commented on a change in pull request #31985:
URL: https://github.com/apache/spark/pull/31985#discussion_r602996562



##
File path: 
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
##
@@ -1863,21 +1899,125 @@ class LogisticRegressionSuite extends MLTest with DefaultReadWriteTest {
   0.0, 0.0, 0.0, 0.09064661,
   -0.1144333, 0.3204703, -0.1621061, -0.2308192,
   0.0, -0.4832131, 0.0, 0.0), isTransposed = true)
-val interceptsRStd = Vectors.dense(-0.72638218, -0.01737265, 0.74375484)
+val interceptsRStd = Vectors.dense(-0.69265374, -0.2260274, 0.9186811)

Review comment:
   The old `interceptsRStd` did not match GLMNET's result, [-0.69265374, 
-0.2260274, 0.9186811], so I think this is a good change.
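
   To illustrate why the old expected values matter, here is a minimal sketch 
that compares the old `interceptsRStd` with the GLMNET values quoted above, 
element by element. The `withinRelTol` helper is a simplified, hypothetical 
stand-in written for this note; it is not Spark's `~==` test utility and may 
differ from it in edge cases.

   ```scala
   object ToleranceCheck {
     // Simplified relative-tolerance check (assumption: the difference is
     // measured against the smaller magnitude of the two values).
     def withinRelTol(x: Double, y: Double, relTol: Double): Boolean = {
       val diff = math.abs(x - y)
       diff <= relTol * math.min(math.abs(x), math.abs(y))
     }

     // True only if both sequences have the same length and every pair passes.
     def allWithinRelTol(xs: Seq[Double], ys: Seq[Double], relTol: Double): Boolean =
       xs.length == ys.length &&
         xs.zip(ys).forall { case (x, y) => withinRelTol(x, y, relTol) }

     // Values copied from the diff and the review comment above.
     val oldIntercepts: Seq[Double] = Seq(-0.72638218, -0.01737265, 0.74375484)
     val glmnetIntercepts: Seq[Double] = Seq(-0.69265374, -0.2260274, 0.9186811)
   }
   ```

   Under this helper, the second element (-0.01737265 versus -0.2260274) is far 
outside a relative tolerance of 0.1, so the old expected vector cannot be 
treated as agreeing with GLMNET even at the loose tolerance.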




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31985: [SPARK-34860][ML] Multinomial Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


SparkQA commented on pull request #31985:
URL: https://github.com/apache/spark/pull/31985#issuecomment-809042739


   **[Test build #136625 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136625/testReport)**
 for PR 31985 at commit 
[`cdaafc2`](https://github.com/apache/spark/commit/cdaafc28f458d45a6f1a257b2cea381db7a09637).





[GitHub] [spark] SparkQA commented on pull request #31983: [SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates

2021-03-28 Thread GitBox


SparkQA commented on pull request #31983:
URL: https://github.com/apache/spark/pull/31983#issuecomment-809042768


   **[Test build #136626 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136626/testReport)**
 for PR 31983 at commit 
[`1b94589`](https://github.com/apache/spark/commit/1b94589fa35d39ccae7e5e16aee3fd7fe8cc81dd).





[GitHub] [spark] zhengruifeng opened a new pull request #31985: [SPARK-34860][ML] Multinomial Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


zhengruifeng opened a new pull request #31985:
URL: https://github.com/apache/spark/pull/31985


   ### What changes were proposed in this pull request?
   1, use the new `MultinomialLogisticBlockAggregator`, which supports virtual 
centering
   2, remove the unused `BlockLogisticAggregator`
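
   As a rough illustration of what "virtual centering" can mean, the sketch 
below computes a margin on mean-centered features without materializing the 
centered vector, so a sparse input stays sparse. This is an assumption about 
the general technique, shown for a single class only; it is not the code of 
`MultinomialLogisticBlockAggregator`, and the `Sparse` type and all function 
names are hypothetical.

   ```scala
   object VirtualCentering {
     // Hypothetical sparse vector: (index, value) pairs for non-zero entries.
     type Sparse = Seq[(Int, Double)]

     def sparseDot(coef: Array[Double], x: Sparse): Double =
       x.foldLeft(0.0) { case (acc, (i, v)) => acc + coef(i) * v }

     // Naive approach: build the dense centered vector (x - mean), then dot it.
     def marginDense(coef: Array[Double], intercept: Double,
                     x: Sparse, mean: Array[Double]): Double = {
       val centered = mean.map(m => -m)            // start from -mean
       x.foreach { case (i, v) => centered(i) += v } // add the non-zero values
       intercept + coef.indices.map(i => coef(i) * centered(i)).sum
     }

     // Virtual approach: coef . (x - mean) = coef . x - coef . mean, so the
     // mean term is folded into an offset computed once per coefficient vector.
     def marginOffset(coef: Array[Double], intercept: Double,
                      mean: Array[Double]): Double =
       intercept - coef.indices.map(i => coef(i) * mean(i)).sum

     def marginVirtual(coef: Array[Double], offset: Double, x: Sparse): Double =
       offset + sparseDot(coef, x)
   }
   ```

   Both functions return the same margin; the virtual form only touches the 
non-zero entries of each instance.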
   
   
   ### Why are the changes needed?
   1, for better convergence;
   2, its solution is much closer to GLMNET's;
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   updated and new test suites
   





[GitHub] [spark] SparkQA commented on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


SparkQA commented on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809041822


   **[Test build #136624 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136624/testReport)**
 for PR 31979 at commit 
[`b368584`](https://github.com/apache/spark/commit/b368584c123dbbaf1fc3a1d6ca6902c097728192).





[GitHub] [spark] SparkQA commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-28 Thread GitBox


SparkQA commented on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809041800


   **[Test build #136623 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136623/testReport)**
 for PR 31984 at commit 
[`68ddc7a`](https://github.com/apache/spark/commit/68ddc7a3a328705ff266a301966db4efef3d7528).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


AmplabJenkins removed a comment on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809041420









[GitHub] [spark] AmplabJenkins commented on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


AmplabJenkins commented on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809041420









[GitHub] [spark] HyukjinKwon commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-28 Thread GitBox


HyukjinKwon commented on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809040667


   cc @maryannxue FYI





[GitHub] [spark] SparkQA commented on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


SparkQA commented on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809039750


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41203/
   





[GitHub] [spark] HyukjinKwon commented on pull request #31859: [SPARK-34769][SQL]AnsiTypeCoercion: return closest convertible type among TypeCollection

2021-03-28 Thread GitBox


HyukjinKwon commented on pull request #31859:
URL: https://github.com/apache/spark/pull/31859#issuecomment-809038253


   I just found out that I had mistakenly assigned it to myself. I have 
unassigned it now.





[GitHub] [spark] SparkQA commented on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


SparkQA commented on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809037801


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41204/
   





[GitHub] [spark] SparkQA commented on pull request #31979: [SPARK-34879][SQL] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

2021-03-28 Thread GitBox


SparkQA commented on pull request #31979:
URL: https://github.com/apache/spark/pull/31979#issuecomment-809037424


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41203/
   





[GitHub] [spark] Ngone51 commented on pull request #31942: [SPARK-34834][NETWORK] Fix a potential Netty memory leak in TransportResponseHandler.

2021-03-28 Thread GitBox


Ngone51 commented on pull request #31942:
URL: https://github.com/apache/spark/pull/31942#issuecomment-809035511


   I'm also confused by this part. I don't even see a place where 
`resp.body()` (a.k.a. `ManagedBuffer`) is referenced before the 
`TransportResponseHandler` handles the `ResponseMessage`.
   
   And in the case of `ChunkFetchSuccess`, I wonder whether we may be releasing 
the buffer here too early, since `listener.onSuccess(...)` is executed asynchronously:
   
   
https://github.com/apache/spark/blob/4b9e94c44412f399ba19e0ea90525d346942bf71/common/network-common/src/main/java/org/apache/spark/network/client/TransportResponseHandler.java#L162-L173
   
   Another possible issue is that the buffer returned by `ChunkFetchSuccess` is 
supposed to be released after the data has been consumed:
   
https://github.com/apache/spark/blob/2356cdd420f600f38d0e786dc50c15f2603b7ff2/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala#L257-L259
   
   but it seems that we now only release the buffer when an exception is thrown 
during buffer reading. For a normally consumed buffer, we seem to forget to 
release it.
   
   cc @mridulm @tgravescs @attilapiros Do you have any idea?
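
   To make the two concerns concrete, here is a minimal, self-contained sketch 
of the lifetime problem. Everything in it is hypothetical: `RefCountedBuffer` 
is a toy stand-in (not `ManagedBuffer` or a Netty buffer), and 
`DeferredExecutor` merely simulates a callback that runs later. It only shows 
the general pattern of retaining a buffer for an asynchronous consumer and 
releasing it once consumption finishes.

   ```scala
   import scala.collection.mutable

   object BufferLifetime {
     // Toy reference-counted buffer; starts with one reference held by the creator.
     final class RefCountedBuffer(private val data: Array[Byte]) {
       private var refCnt = 1
       def retain(): this.type = { require(refCnt > 0, "already freed"); refCnt += 1; this }
       def release(): Unit = { require(refCnt > 0, "already freed"); refCnt -= 1 }
       def isFreed: Boolean = refCnt == 0
       def read(): Array[Byte] = { require(refCnt > 0, "read after free"); data }
     }

     // Simulates asynchronous execution: tasks run only when runAll() is called.
     final class DeferredExecutor {
       private val pending = mutable.Queue.empty[() => Unit]
       def submit(task: () => Unit): Unit = pending.enqueue(task)
       def runAll(): Unit = while (pending.nonEmpty) pending.dequeue()()
     }

     // Problematic: the handler drops its reference before the callback has run.
     def handleUnsafe(buf: RefCountedBuffer, exec: DeferredExecutor,
                      onSuccess: RefCountedBuffer => Unit): Unit = {
       exec.submit(() => onSuccess(buf))
       buf.release()
     }

     // Safer: take an extra reference for the callback and release it after the
     // callback finishes, whether it succeeds or throws.
     def handleSafe(buf: RefCountedBuffer, exec: DeferredExecutor,
                    onSuccess: RefCountedBuffer => Unit): Unit = {
       buf.retain()
       exec.submit(() => try onSuccess(buf) finally buf.release())
       buf.release()
     }
   }
   ```

   In `handleSafe` the `finally` clause covers both the exceptional path and 
the normally consumed path, which is the case the comment above suspects is 
currently missed.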
   





[GitHub] [spark] zhengruifeng commented on pull request #31693: [SPARK-34858][SPARK-34448][ML] Binary Logistic Regression with intercept support centering

2021-03-28 Thread GitBox


zhengruifeng commented on pull request #31693:
URL: https://github.com/apache/spark/pull/31693#issuecomment-809034717


   @srowen Thanks for reviewing and merging! I will send another PR for 
multinomial LR.





[GitHub] [spark] HyukjinKwon commented on pull request #31976: [SPARK-34814][SQL] LikeSimplification should handle NULL

2021-03-28 Thread GitBox


HyukjinKwon commented on pull request #31976:
URL: https://github.com/apache/spark/pull/31976#issuecomment-809033091


   cc @beliefer too FYI





[GitHub] [spark] HyukjinKwon closed pull request #31976: [SPARK-34814][SQL] LikeSimplification should handle NULL

2021-03-28 Thread GitBox


HyukjinKwon closed pull request #31976:
URL: https://github.com/apache/spark/pull/31976


   





[GitHub] [spark] HyukjinKwon commented on pull request #31976: [SPARK-34814][SQL] LikeSimplification should handle NULL

2021-03-28 Thread GitBox


HyukjinKwon commented on pull request #31976:
URL: https://github.com/apache/spark/pull/31976#issuecomment-809032964


   Merged to master and branch-3.1.





[GitHub] [spark] HyukjinKwon closed pull request #31973: [SPARK-34876][SQL] Fill defaultResult of non-nullable aggregates

2021-03-28 Thread GitBox


HyukjinKwon closed pull request #31973:
URL: https://github.com/apache/spark/pull/31973


   





[GitHub] [spark] AngersZhuuuu commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2021-03-28 Thread GitBox


AngersZhuuuu commented on pull request #30212:
URL: https://github.com/apache/spark/pull/30212#issuecomment-809028142


   Gentle ping @cloud-fan 





[GitHub] [spark] HyukjinKwon commented on pull request #31973: [SPARK-34876][SQL] Fill defaultResult of non-nullable aggregates

2021-03-28 Thread GitBox


HyukjinKwon commented on pull request #31973:
URL: https://github.com/apache/spark/pull/31973#issuecomment-809028090


   Merged to master, branch-3.1, branch-3.0 and branch-2.4
   
   cc @cloud-fan, @maryannxue, @viirya FYI




