date:20200131

[GitHub] [spark] zhengruifeng closed pull request #27427: [SPARK-30700][ML] NaiveBayesModel predict optimization

2020-01-31 Thread GitBox

zhengruifeng closed pull request #27427: [SPARK-30700][ML] NaiveBayesModel 
predict optimization
URL: https://github.com/apache/spark/pull/27427
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27426: [SPARK-30698][BUILD] Bumps 
checkstyle from 8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-581002857
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117710/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27426: [SPARK-30698][BUILD] Bumps 
checkstyle from 8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-581002848
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle 
from 8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-581002857
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117710/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle 
from 8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-581002848
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

SparkQA removed a comment on issue #27426: [SPARK-30698][BUILD] Bumps 
checkstyle from 8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-580991397
 
 
   **[Test build #117710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117710/testReport)** for PR 27426 at commit [`a54d262`](https://github.com/apache/spark/commit/a54d2629bccea6c8cc18006fcdd2142c820603b9).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27425: [SPARK-29543][SS][FOLLOWUP] Move `spark.sql.streaming.ui.*` configs to StaticSQLConf

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27425: [SPARK-29543][SS][FOLLOWUP] 
Move `spark.sql.streaming.ui.*` configs to StaticSQLConf
URL: https://github.com/apache/spark/pull/27425#issuecomment-581002668
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117709/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

SparkQA commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 
8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-581002709
 
 
   **[Test build #117710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117710/testReport)** for PR 27426 at commit [`a54d262`](https://github.com/apache/spark/commit/a54d2629bccea6c8cc18006fcdd2142c820603b9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27425: [SPARK-29543][SS][FOLLOWUP] Move `spark.sql.streaming.ui.*` configs to StaticSQLConf

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27425: [SPARK-29543][SS][FOLLOWUP] 
Move `spark.sql.streaming.ui.*` configs to StaticSQLConf
URL: https://github.com/apache/spark/pull/27425#issuecomment-581002667
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27425: [SPARK-29543][SS][FOLLOWUP] Move `spark.sql.streaming.ui.*` configs to StaticSQLConf

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27425: [SPARK-29543][SS][FOLLOWUP] Move 
`spark.sql.streaming.ui.*` configs to StaticSQLConf
URL: https://github.com/apache/spark/pull/27425#issuecomment-581002668
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117709/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27425: [SPARK-29543][SS][FOLLOWUP] Move `spark.sql.streaming.ui.*` configs to StaticSQLConf

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27425: [SPARK-29543][SS][FOLLOWUP] Move 
`spark.sql.streaming.ui.*` configs to StaticSQLConf
URL: https://github.com/apache/spark/pull/27425#issuecomment-581002667
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27425: [SPARK-29543][SS][FOLLOWUP] Move `spark.sql.streaming.ui.*` configs to StaticSQLConf

2020-01-31 Thread GitBox

SparkQA removed a comment on issue #27425: [SPARK-29543][SS][FOLLOWUP] Move 
`spark.sql.streaming.ui.*` configs to StaticSQLConf
URL: https://github.com/apache/spark/pull/27425#issuecomment-580981804
 
 
   **[Test build #117709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117709/testReport)** for PR 27425 at commit [`e4c3b38`](https://github.com/apache/spark/commit/e4c3b388f8be051f3ef619f08a636975d1156d0f).



[GitHub] [spark] SparkQA commented on issue #27425: [SPARK-29543][SS][FOLLOWUP] Move `spark.sql.streaming.ui.*` configs to StaticSQLConf

2020-01-31 Thread GitBox

SparkQA commented on issue #27425: [SPARK-29543][SS][FOLLOWUP] Move 
`spark.sql.streaming.ui.*` configs to StaticSQLConf
URL: https://github.com/apache/spark/pull/27425#issuecomment-581002533
 
 
   **[Test build #117709 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117709/testReport)** for PR 27425 at commit [`e4c3b38`](https://github.com/apache/spark/commit/e4c3b388f8be051f3ef619f08a636975d1156d0f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27427: [SPARK-30700][ML] NaiveBayesModel predict optimization

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27427: [SPARK-30700][ML] 
NaiveBayesModel predict optimization
URL: https://github.com/apache/spark/pull/27427#issuecomment-581001696
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117711/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27427: [SPARK-30700][ML] NaiveBayesModel predict optimization

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27427: [SPARK-30700][ML] 
NaiveBayesModel predict optimization
URL: https://github.com/apache/spark/pull/27427#issuecomment-581001695
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27427: [SPARK-30700][ML] NaiveBayesModel predict optimization

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27427: [SPARK-30700][ML] NaiveBayesModel 
predict optimization
URL: https://github.com/apache/spark/pull/27427#issuecomment-581001695
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27427: [SPARK-30700][ML] NaiveBayesModel predict optimization

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27427: [SPARK-30700][ML] NaiveBayesModel 
predict optimization
URL: https://github.com/apache/spark/pull/27427#issuecomment-581001696
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117711/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27427: [SPARK-30700][ML] NaiveBayesModel predict optimization

2020-01-31 Thread GitBox

SparkQA removed a comment on issue #27427: [SPARK-30700][ML] NaiveBayesModel 
predict optimization
URL: https://github.com/apache/spark/pull/27427#issuecomment-580997146
 
 
   **[Test build #117711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117711/testReport)** for PR 27427 at commit [`7bee0b0`](https://github.com/apache/spark/commit/7bee0b03f030f108f4db1b2b54daa7b4238e027e).



[GitHub] [spark] SparkQA commented on issue #27427: [SPARK-30700][ML] NaiveBayesModel predict optimization

2020-01-31 Thread GitBox

SparkQA commented on issue #27427: [SPARK-30700][ML] NaiveBayesModel predict 
optimization
URL: https://github.com/apache/spark/pull/27427#issuecomment-581001642
 
 
   **[Test build #117711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117711/testReport)** for PR 27427 at commit [`7bee0b0`](https://github.com/apache/spark/commit/7bee0b03f030f108f4db1b2b54daa7b4238e027e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] dongjoon-hyun commented on issue #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-01-31 Thread GitBox

dongjoon-hyun commented on issue #27366: [SPARK-30648][SQL] Support filters 
pushdown in JSON datasource
URL: https://github.com/apache/spark/pull/27366#issuecomment-581001372
 
 
   Okay. Then, let's talk later for this PR. Thanks.



[GitHub] [spark] dongjoon-hyun removed a comment on issue #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-01-31 Thread GitBox

dongjoon-hyun removed a comment on issue #27366: [SPARK-30648][SQL] Support 
filters pushdown in JSON datasource
URL: https://github.com/apache/spark/pull/27366#issuecomment-580969052
 
 
   Hi, @MaxGekk . Could you update once more when you have a chance?



[GitHub] [spark] zhengruifeng commented on issue #27322: [SPARK-26111][ML][WIP] Support F-value between label/feature for continuous distribution feature selection

2020-01-31 Thread GitBox

zhengruifeng commented on issue #27322: [SPARK-26111][ML][WIP] Support F-value 
between label/feature for continuous distribution feature selection
URL: https://github.com/apache/spark/pull/27322#issuecomment-581001211
 
 
   > Currently, this WIP PR only has FValueRegressionSelector implemented. FValueClassificationSelector is very similar. The calculation for classification f value is a little more complicated.
   
   I think `f_classif` is different enough for another PR.
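
For context on the distinction drawn here, the following is a minimal sketch of a univariate regression F-value, assuming the usual formulation (as in scikit-learn's `f_regression`) in which the statistic is derived from the Pearson correlation r between one feature and the label: F = r^2 / (1 - r^2) * (n - 2). The names `FValueSketch`, `pearson`, and `fValue` are hypothetical and are not taken from the PR. The classification case (`f_classif`) is a one-way ANOVA across label groups and is not covered by this sketch.

```scala
object FValueSketch {

  /** Pearson correlation between two equally sized samples. */
  def pearson(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length && x.length > 2, "need matching samples with n > 2")
    val n = x.length
    val meanX = x.sum / n
    val meanY = y.sum / n
    var sxy = 0.0
    var sxx = 0.0
    var syy = 0.0
    var i = 0
    while (i < n) {
      val dx = x(i) - meanX
      val dy = y(i) - meanY
      sxy += dx * dy
      sxx += dx * dx
      syy += dy * dy
      i += 1
    }
    // NaN if either sample is constant (zero variance).
    sxy / math.sqrt(sxx * syy)
  }

  /**
   * F statistic for one continuous feature against a continuous label,
   * with n - 2 degrees of freedom. Assumption: this follows the
   * correlation-based formulation, not necessarily the PR's implementation.
   */
  def fValue(feature: Array[Double], label: Array[Double]): Double = {
    val r = pearson(feature, label)
    val degreesOfFreedom = feature.length - 2
    // Infinity when the feature and label are perfectly correlated.
    r * r / (1.0 - r * r) * degreesOfFreedom
  }
}
```

A p-value would then come from the upper tail of an F distribution with (1, n - 2) degrees of freedom; that step is omitted here to keep the sketch free of external dependencies.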



[GitHub] [spark] dongjoon-hyun commented on issue #27424: [SPARK-29138][PYTHON][TEST] Increase timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy

2020-01-31 Thread GitBox

dongjoon-hyun commented on issue #27424: [SPARK-29138][PYTHON][TEST] Increase 
timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy
URL: https://github.com/apache/spark/pull/27424#issuecomment-581001148
 
 
   Thank you, @HyukjinKwon !



[GitHub] [spark] zhengruifeng commented on a change in pull request #27322: [SPARK-26111][ML][WIP] Support F-value between label/feature for continuous distribution feature selection

2020-01-31 Thread GitBox

zhengruifeng commented on a change in pull request #27322: 
[SPARK-26111][ML][WIP] Support F-value between label/feature for continuous 
distribution feature selection
URL: https://github.com/apache/spark/pull/27322#discussion_r373762412
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/FRegressionSelector.scala
 ##
 @@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import scala.collection.mutable.ArrayBuilder
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml._
+import org.apache.spark.ml.attribute.{AttributeGroup, _}
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.stat.FRegressionTest
+import org.apache.spark.ml.util._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql._
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
+
+
+/**
+ * Params for [[FRegressionSelector]] and [[FRegressionSelectorModel]].
+ * TODO: put all these params in shared.scala
+ * TODO: Not include fdr and fwe for now. Need to check if these two are applicable!!!
+ */
+private[feature] trait FRegressionSelectorParams extends Params
+  with HasFeaturesCol with HasOutputCol with HasLabelCol {
+
+  /**
+   * Number of features that selector will select, ordered by ascending p-value. If the
+   * number of features is less than numTopFeatures, then this will select all features.
+   * Only applicable when selectorType = "numTopFeatures".
+   * The default value of numTopFeatures is 50.
+   *
+   * @group param
+   */
+  @Since("3.1.0")
+  final val numTopFeatures = new IntParam(this, "numTopFeatures",
+    "Number of features that selector will select, ordered by ascending p-value. If the" +
+      " number of features is < numTopFeatures, then this will select all features.",
+    ParamValidators.gtEq(1))
+  setDefault(numTopFeatures -> 50)
+
+  /** @group getParam */
+  @Since("3.1.0")
+  def getNumTopFeatures: Int = $(numTopFeatures)
+
+  /**
+   * Percentile of features that selector will select, ordered by statistics value descending.
+   * Only applicable when selectorType = "percentile".
+   * Default value is 0.1.
+   * @group param
+   */
+  @Since("3.1.0")
+  final val percentile = new DoubleParam(this, "percentile",
+    "Percentile of features that selector will select, ordered by ascending p-value.",
+    ParamValidators.inRange(0, 1))
+  setDefault(percentile -> 0.1)
+
+  /** @group getParam */
+  @Since("3.1.0")
+  def getPercentile: Double = $(percentile)
+
+  /**
+   * The highest p-value for features to be kept.
+   * Only applicable when selectorType = "fpr".
+   * Default value is 0.05.
+   * @group param
+   */
+  @Since("3.1.0")
+  final val fpr = new DoubleParam(this, "fpr", "The highest p-value for features to be kept.",
+    ParamValidators.inRange(0, 1))
+  setDefault(fpr -> 0.05)
+
+  /** @group getParam */
+  @Since("3.1.0")
+  def getFpr: Double = $(fpr)
+
+  /**
+   * The selector type of the FRegressionSelector.
+   * Supported options: "numTopFeatures" (default), "percentile", "fpr".
+   * @group param
+   */
+  @Since("3.1.0")
+  final val selectorType = new Param[String](this, "selectorType",
+    "The selector type of the FRegressionSelector. " +
+      "Supported options: numTopFeatures, percentile, fpr")
+
+  /** @group getParam */
+  @Since("3.1.0")
+  def getSelectorType: String = $(selectorType)
+}
+
+/**
+ * Regression F-value Selector
+ * This feature selector is for regressions where features are continuous and labels are continuous.
+ * ANOVA F-value Classification Selector is for when features are continuous and labels are
+ * categorical.
+ * Currently, Chi-Squared is for categorical features and categorical labels
+ * The selector supports different selection methods: `numTopFeatures`, `percentile`, `fpr`
+ *  - `numTopFeatures` chooses a fixed number of top features according to a fRegression test.
+ *  - `percentile` is similar but chooses a fraction of all features instead of a fixed number.
+ *  - `fpr` c

[GitHub] [spark] zhengruifeng commented on a change in pull request #27322: [SPARK-26111][ML][WIP] Support F-value between label/feature for continuous distribution feature selection

2020-01-31 Thread GitBox

zhengruifeng commented on a change in pull request #27322: 
[SPARK-26111][ML][WIP] Support F-value between label/feature for continuous 
distribution feature selection
URL: https://github.com/apache/spark/pull/27322#discussion_r373762346
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/stat/FRegressionTest.scala
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.stat
+
+import org.apache.commons.math3.distribution.FDistribution
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml.feature.LabeledPoint
+import org.apache.spark.ml.linalg.{Vector, VectorUDT}
+import org.apache.spark.ml.util.SchemaUtils
+import org.apache.spark.mllib.stat.{Statistics => OldStatistics}
+import org.apache.spark.sql.Dataset
+import org.apache.spark.sql.functions.col
+
+
+/**
+ * F-Regression Test
+ */
+@Since("3.1.0")
+object FRegressionTest {
+
+  case class FRegressionTestResult(
+      pValue: Double,
+      degreesOfFreedom: Int,
+      fValue: Double)
+
+  /**
+   * @param dataset  DataFrame of continuous labels and continuous features.
+   * @param featuresCol  Name of features column in dataset, of type `Vector` (`VectorUDT`)
+   * @param labelCol  Name of label column in dataset, of any numerical type
+   * @return Array containing the FRegressionTestResult for every feature against the label.
+   */
+  @Since("3.1.0")
+  def test_regression(dataset: Dataset[_], featuresCol: String, labelCol: String):
 
 Review comment:
   The method name `test_regression` should follow camelCase (e.g. `testRegression`).



[GitHub] [spark] zhengruifeng commented on a change in pull request #27322: [SPARK-26111][ML][WIP] Support F-value between label/feature for continuous distribution feature selection

2020-01-31 Thread GitBox

zhengruifeng commented on a change in pull request #27322: 
[SPARK-26111][ML][WIP] Support F-value between label/feature for continuous 
distribution feature selection
URL: https://github.com/apache/spark/pull/27322#discussion_r373761706
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/FRegressionSelector.scala
 ##
 @@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import scala.collection.mutable.ArrayBuilder
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml._
+import org.apache.spark.ml.attribute.{AttributeGroup, _}
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.stat.FRegressionTest
+import org.apache.spark.ml.util._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql._
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
+
+
+/**
+ * Params for [[FRegressionSelector]] and [[FRegressionSelectorModel]].
+ * TODO: put all these params in shared.scala
+ * TODO: Not include fdr and fwe for now. Need to check if these two are applicable!!!
+ */
+private[feature] trait FRegressionSelectorParams extends Params
+  with HasFeaturesCol with HasOutputCol with HasLabelCol {
+
+  /**
+   * Number of features that selector will select, ordered by ascending p-value. If the
+   * number of features is less than numTopFeatures, then this will select all features.
+   * Only applicable when selectorType = "numTopFeatures".
+   * The default value of numTopFeatures is 50.
+   *
+   * @group param
+   */
+  @Since("3.1.0")
+  final val numTopFeatures = new IntParam(this, "numTopFeatures",
+    "Number of features that selector will select, ordered by ascending p-value. If the" +
+      " number of features is < numTopFeatures, then this will select all features.",
+    ParamValidators.gtEq(1))
+  setDefault(numTopFeatures -> 50)
+
+  /** @group getParam */
+  @Since("3.1.0")
+  def getNumTopFeatures: Int = $(numTopFeatures)
+
+  /**
+   * Percentile of features that selector will select, ordered by statistics value descending.
+   * Only applicable when selectorType = "percentile".
+   * Default value is 0.1.
+   * @group param
+   */
+  @Since("3.1.0")
+  final val percentile = new DoubleParam(this, "percentile",
+    "Percentile of features that selector will select, ordered by ascending p-value.",
+    ParamValidators.inRange(0, 1))
+  setDefault(percentile -> 0.1)
+
+  /** @group getParam */
+  @Since("3.1.0")
+  def getPercentile: Double = $(percentile)
+
+  /**
+   * The highest p-value for features to be kept.
+   * Only applicable when selectorType = "fpr".
+   * Default value is 0.05.
+   * @group param
+   */
+  @Since("3.1.0")
+  final val fpr = new DoubleParam(this, "fpr", "The highest p-value for features to be kept.",
+    ParamValidators.inRange(0, 1))
+  setDefault(fpr -> 0.05)
+
+  /** @group getParam */
+  @Since("3.1.0")
+  def getFpr: Double = $(fpr)
+
+  /**
+   * The selector type of the FRegressionSelector.
+   * Supported options: "numTopFeatures" (default), "percentile", "fpr".
+   * @group param
+   */
+  @Since("3.1.0")
+  final val selectorType = new Param[String](this, "selectorType",
+    "The selector type of the FRegressionSelector. " +
+      "Supported options: numTopFeatures, percentile, fpr")
+
+  /** @group getParam */
+  @Since("3.1.0")
+  def getSelectorType: String = $(selectorType)
+}
+
+/**
+ * Regression F-value Selector
+ * This feature selector is for regressions where features are continuous and labels are continuous.
+ * ANOVA F-value Classification Selector is for when features are continuous and labels are
+ * categorical.
+ * Currently, Chi-Squared is for categorical features and categorical labels
+ * The selector supports different selection methods: `numTopFeatures`, `percentile`, `fpr`
+ *  - `numTopFeatures` chooses a fixed number of top features according to a fRegression test.
+ *  - `percentile` is similar but chooses a fraction of all features instead of a fixed number.
+ *  - `fpr` c

[GitHub] [spark] zhengruifeng commented on a change in pull request #27322: [SPARK-26111][ML][WIP] Support F-value between label/feature for continuous distribution feature selection

2020-01-31 Thread GitBox

zhengruifeng commented on a change in pull request #27322: 
[SPARK-26111][ML][WIP] Support F-value between label/feature for continuous 
distribution feature selection
URL: https://github.com/apache/spark/pull/27322#discussion_r373762504
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/ml/feature/FRegressionSelector.scala
 ##
 @@ -0,0 +1,357 @@

[GitHub] [spark] zhengruifeng commented on a change in pull request #27322: [SPARK-26111][ML][WIP] Support F-value between label/feature for continuous distribution feature selection

2020-01-31 Thread GitBox

zhengruifeng commented on a change in pull request #27322: 
[SPARK-26111][ML][WIP] Support F-value between label/feature for continuous 
distribution feature selection
URL: https://github.com/apache/spark/pull/27322#discussion_r373762290
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/stat/FRegressionTest.scala
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.stat
+
+import org.apache.commons.math3.distribution.FDistribution
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml.feature.LabeledPoint
+import org.apache.spark.ml.linalg.{Vector, VectorUDT}
+import org.apache.spark.ml.util.SchemaUtils
+import org.apache.spark.mllib.stat.{Statistics => OldStatistics}
+import org.apache.spark.sql.Dataset
+import org.apache.spark.sql.functions.col
+
+
+/**
+ * F-Regression Test
+ */
+@Since("3.1.0")
+object FRegressionTest {
+
+  case class FRegressionTestResult(
+      pValue: Double,
+      degreesOfFreedom: Int,
+      fValue: Double)
+
+  /**
+   * @param dataset  DataFrame of continuous labels and continuous features.
+   * @param featuresCol  Name of features column in dataset, of type `Vector` (`VectorUDT`)
+   * @param labelCol  Name of label column in dataset, of any numerical type
+   * @return Array containing the FRegressionTestResult for every feature against the label.
+   */
+  @Since("3.1.0")
+  def test_regression(dataset: Dataset[_], featuresCol: String, labelCol: String):
+    Array[FRegressionTestResult] = {
+
+    val spark = dataset.sparkSession
+    import spark.implicits._
+
+    SchemaUtils.checkColumnType(dataset.schema, featuresCol, new VectorUDT)
+    SchemaUtils.checkNumericType(dataset.schema, labelCol)
+    val rdd = dataset.select(col(labelCol).cast("double"), col(featuresCol)).as[(Double, Vector)]
+      .rdd.map { case (label, features) => LabeledPoint(label, features) }
+
+    val numOfFeatures = rdd.first().features.size
+    val numOfSamples = rdd.count()
+    val degreeOfFreedom = numOfSamples.toInt - 2
+
+    var fTestResultArray = new Array[FRegressionTestResult](numOfFeatures)
+    val labels = rdd.map(d => d.label)
+    for (i <- 0 until numOfFeatures) {
 
 Review comment:
   Compute all columns at once?
   Looping over the features like this is inefficient; I think a single pass over the data is enough.
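
   One way to read the single-pass suggestion, as a rough sketch in plain Scala rather than Spark code: accumulate per-feature sums in one traversal of the rows, then derive each feature's Pearson correlation r with the label and the F-value F = r^2 / (1 - r^2) * (n - 2), using the same n - 2 degrees of freedom as the diff above. The names `SinglePassFValue`, `Sums`, `accumulate`, and `fValues` are hypothetical and do not come from the PR. P-values are left out; they would presumably come from an F distribution with (1, n - 2) degrees of freedom.

```scala
object SinglePassFValue {
  // Running sums needed for the Pearson correlation of every feature with the label.
  final case class Sums(
      n: Long,
      sumY: Double,
      sumYY: Double,
      sumX: Array[Double],
      sumXX: Array[Double],
      sumXY: Array[Double])

  // One traversal of the rows; each row is (label, features).
  def accumulate(rows: Iterator[(Double, Array[Double])], numFeatures: Int): Sums = {
    var n = 0L
    var sumY = 0.0
    var sumYY = 0.0
    val sumX = new Array[Double](numFeatures)
    val sumXX = new Array[Double](numFeatures)
    val sumXY = new Array[Double](numFeatures)
    rows.foreach { case (y, x) =>
      n += 1
      sumY += y
      sumYY += y * y
      var j = 0
      while (j < numFeatures) {
        sumX(j) += x(j)
        sumXX(j) += x(j) * x(j)
        sumXY(j) += x(j) * y
        j += 1
      }
    }
    Sums(n, sumY, sumYY, sumX, sumXX, sumXY)
  }

  // F = r^2 / (1 - r^2) * (n - 2), where r is the Pearson correlation.
  def fValues(s: Sums): Array[Double] = {
    val n = s.n.toDouble
    val varY = s.sumYY - s.sumY * s.sumY / n
    s.sumX.indices.map { j =>
      val cov = s.sumXY(j) - s.sumX(j) * s.sumY / n
      val varX = s.sumXX(j) - s.sumX(j) * s.sumX(j) / n
      val r2 = cov * cov / (varX * varY)
      r2 / (1.0 - r2) * (n - 2.0)
    }.toArray
  }
}
```

   In Spark the same accumulation could be expressed as an aggregation over the RDD, so every feature is handled in one job instead of one job per feature. Raw sums of squares can lose precision on large or poorly scaled data, so a real implementation would likely want a more stable update.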


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] zhengruifeng commented on a change in pull request #27322: [SPARK-26111][ML][WIP] Support F-value between label/feature for continuous distribution feature selection

2020-01-31 Thread GitBox

zhengruifeng commented on a change in pull request #27322: 
[SPARK-26111][ML][WIP] Support F-value between label/feature for continuous 
distribution feature selection
URL: https://github.com/apache/spark/pull/27322#discussion_r373761805
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/ml/feature/FRegressionSelector.scala
 ##
 @@ -0,0 +1,357 @@

[GitHub] [spark] zhengruifeng commented on a change in pull request #27322: [SPARK-26111][ML][WIP] Support F-value between label/feature for continuous distribution feature selection

2020-01-31 Thread GitBox

zhengruifeng commented on a change in pull request #27322: 
[SPARK-26111][ML][WIP] Support F-value between label/feature for continuous 
distribution feature selection
URL: https://github.com/apache/spark/pull/27322#discussion_r373762537
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/ml/feature/FRegressionSelector.scala
 ##
 @@ -0,0 +1,357 @@

[GitHub] [spark] zhengruifeng commented on a change in pull request #27322: [SPARK-26111][ML][WIP] Support F-value between label/feature for continuous distribution feature selection

2020-01-31 Thread GitBox

zhengruifeng commented on a change in pull request #27322: 
[SPARK-26111][ML][WIP] Support F-value between label/feature for continuous 
distribution feature selection
URL: https://github.com/apache/spark/pull/27322#discussion_r373762117
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/ml/feature/FRegressionSelector.scala
 ##
 @@ -0,0 +1,357 @@

[GitHub] [spark] zhengruifeng commented on a change in pull request #27322: [SPARK-26111][ML][WIP] Support F-value between label/feature for continuous distribution feature selection

2020-01-31 Thread GitBox

zhengruifeng commented on a change in pull request #27322: 
[SPARK-26111][ML][WIP] Support F-value between label/feature for continuous 
distribution feature selection
URL: https://github.com/apache/spark/pull/27322#discussion_r373761981
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/ml/feature/FRegressionSelector.scala
 ##
 @@ -0,0 +1,357 @@

[GitHub] [spark] nikunjb removed a comment on issue #22423: [SPARK-25302][STREAMING] Checkpoint the reducedStream in ReducedWindo…

2020-01-31 Thread GitBox

nikunjb removed a comment on issue #22423: [SPARK-25302][STREAMING] Checkpoint 
the reducedStream in ReducedWindo…
URL: https://github.com/apache/spark/pull/22423#issuecomment-581000427
 
 
   Please review this PR and the related one for SPARK-25303 too. I am 
reopening both.



[GitHub] [spark] HyukjinKwon closed pull request #27424: [SPARK-29138][PYTHON][TEST] Increase timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy

2020-01-31 Thread GitBox

HyukjinKwon closed pull request #27424: [SPARK-29138][PYTHON][TEST] Increase 
timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy
URL: https://github.com/apache/spark/pull/27424
 
 
   



[GitHub] [spark] HyukjinKwon commented on issue #27424: [SPARK-29138][PYTHON][TEST] Increase timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy

2020-01-31 Thread GitBox

HyukjinKwon commented on issue #27424: [SPARK-29138][PYTHON][TEST] Increase 
timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy
URL: https://github.com/apache/spark/pull/27424#issuecomment-581000488
 
 
   Merged to master.



[GitHub] [spark] HyukjinKwon removed a comment on issue #27424: [SPARK-29138][PYTHON][TEST] Increase timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy

2020-01-31 Thread GitBox

HyukjinKwon removed a comment on issue #27424: [SPARK-29138][PYTHON][TEST] 
Increase timeout of 
StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy
URL: https://github.com/apache/spark/pull/27424#issuecomment-581000476
 
 
   Yeah, I think it's fine to increase the timeout and see if it actually fixes
the issue. I think it does.



[GitHub] [spark] HyukjinKwon commented on issue #27424: [SPARK-29138][PYTHON][TEST] Increase timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy

2020-01-31 Thread GitBox

HyukjinKwon commented on issue #27424: [SPARK-29138][PYTHON][TEST] Increase 
timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy
URL: https://github.com/apache/spark/pull/27424#issuecomment-581000468
 
 
   Yeah, I think it's fine to increase the timeout and see if it actually fixes
the issue. I think it does.



[GitHub] [spark] HyukjinKwon commented on issue #27424: [SPARK-29138][PYTHON][TEST] Increase timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy

2020-01-31 Thread GitBox

HyukjinKwon commented on issue #27424: [SPARK-29138][PYTHON][TEST] Increase 
timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy
URL: https://github.com/apache/spark/pull/27424#issuecomment-581000476
 
 
   Yeah, I think it's fine to increase the timeout and see if it actually fixes
the issue. I think it does.



[GitHub] [spark] nikunjb commented on issue #22423: [SPARK-25302][STREAMING] Checkpoint the reducedStream in ReducedWindo…

2020-01-31 Thread GitBox

nikunjb commented on issue #22423: [SPARK-25302][STREAMING] Checkpoint the 
reducedStream in ReducedWindo…
URL: https://github.com/apache/spark/pull/22423#issuecomment-581000427
 
 
   Please review this PR and the related one for SPARK-25303 too. I am 
reopening both.



[GitHub] [spark] MaxGekk commented on a change in pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-01-31 Thread GitBox

MaxGekk commented on a change in pull request #27366: [SPARK-30648][SQL] 
Support filters pushdown in JSON datasource
URL: https://github.com/apache/spark/pull/27366#discussion_r373761854
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/StructFilters.scala
 ##
 @@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst
+
+import scala.util.Try
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.sources
+import org.apache.spark.sql.types.{BooleanType, StructType}
+
+/**
+ * The class provides API for applying pushed down filters to partially or
+ * fully set internal rows that have the struct schema.
+ *
+ * @param filters The pushed down source filters. The filters should refer to
+ *the fields of the provided schema.
+ * @param schema The required schema of records from datasource files.
+ */
+abstract class StructFilters(filters: Seq[sources.Filter], schema: StructType) {
+
+  assert(filters.forall(StructFilters.checkFilterRefs(_, schema)),
+"A pushed down filter refers to a non-existing schema field.")
+
+  /**
+   * Applies pushed down source filters to the given row assuming that
+   * value at `index` has been already set.
+   *
+   * @param row The row with fully or partially set values.
+   * @param index The index of already set value.
+   * @return true if currently processed row can be skipped otherwise false.
+   */
+  def skipRow(row: InternalRow, index: Int): Boolean
+
+  /**
+   * Resets states of pushed down filters. The method must be called before
+   * processing any new row; otherwise skipRow() may return a wrong result.
+   */
+  def reset(): Unit
+
+  /**
+   * Compiles source filters to a predicate.
+   */
+  def toPredicate(filters: Seq[sources.Filter]): BasePredicate = {
+val reducedExpr = filters
+  .sortBy(_.references.size)
 
 Review comment:
   Why is `length` better than `size`?
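
For readers following the quoted `toPredicate`, the idea can be sketched without Spark: order the filters so that those touching the fewest fields come first, then AND them together. Everything below (`Row`, `SimpleFilter`, `compile`) is a hypothetical stand-in and not the Catalyst API. The sketch uses `reduceOption`, an assumption that is not part of the PR, so that an empty filter list yields an always-true predicate; a plain `reduce` throws on an empty collection. On an `Array`, `length` and `size` return the same number, so the resulting order does not depend on which one is called.

```scala
object PredicateSketch {
  // Hypothetical stand-ins for a row and a pushed-down filter; not Spark classes.
  type Row = Map[String, Any]
  final case class SimpleFilter(references: Array[String], test: Row => Boolean)

  // Sort filters by how many fields they reference, then AND them together.
  // An empty filter list compiles to a predicate that accepts every row.
  def compile(filters: Seq[SimpleFilter]): Row => Boolean = {
    filters
      .sortBy(_.references.length)
      .map(_.test)
      .reduceOption[Row => Boolean]((l, r) => row => l(row) && r(row))
      .getOrElse((_: Row) => true)
  }

  def main(args: Array[String]): Unit = {
    val byAge = SimpleFilter(Array("age"), row => row("age").asInstanceOf[Int] > 18)
    val byBoth = SimpleFilter(Array("age", "name"), row => row("name") != "")
    val predicate = compile(Seq(byBoth, byAge))
    println(predicate(Map("age" -> 30, "name" -> "a")))
    println(predicate(Map("age" -> 10, "name" -> "a")))
  }
}
```

Because `&&` short-circuits, the single-field filter rejects the second row before the two-field filter is evaluated.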



[GitHub] [spark] AmplabJenkins removed a comment on issue #27381: [MINOR][SQL] Improve readability for some code comments

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27381: [MINOR][SQL] Improve 
readability for some code comments
URL: https://github.com/apache/spark/pull/27381#issuecomment-580999519
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27381: [MINOR][SQL] Improve readability for some code comments

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27381: [MINOR][SQL] Improve 
readability for some code comments
URL: https://github.com/apache/spark/pull/27381#issuecomment-580999521
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22472/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27381: [MINOR][SQL] Improve readability for some code comments

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27381: [MINOR][SQL] Improve readability for 
some code comments
URL: https://github.com/apache/spark/pull/27381#issuecomment-580999519
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27381: [MINOR][SQL] Improve readability for some code comments

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27381: [MINOR][SQL] Improve readability for 
some code comments
URL: https://github.com/apache/spark/pull/27381#issuecomment-580999521
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22472/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27381: [MINOR][SQL] Improve readability for some code comments

2020-01-31 Thread GitBox

SparkQA commented on issue #27381: [MINOR][SQL] Improve readability for some 
code comments
URL: https://github.com/apache/spark/pull/27381#issuecomment-580999428
 
 
   **[Test build #117712 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117712/testReport)**
 for PR 27381 at commit 
[`2e403bf`](https://github.com/apache/spark/commit/2e403bfa8edff38e961ffb6f4c9576fbe38d541d).



[GitHub] [spark] MaxGekk commented on a change in pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-01-31 Thread GitBox

MaxGekk commented on a change in pull request #27366: [SPARK-30648][SQL] 
Support filters pushdown in JSON datasource
URL: https://github.com/apache/spark/pull/27366#discussion_r373761100
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/StructFilters.scala
 ##
 @@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst
+
+import scala.util.Try
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.sources
+import org.apache.spark.sql.types.{BooleanType, StructType}
+
+/**
+ * The class provides API for applying pushed down filters to partially or
+ * fully set internal rows that have the struct schema.
+ *
+ * @param filters The pushed down source filters. The filters should refer to
+ *the fields of the provided schema.
+ * @param schema The required schema of records from datasource files.
+ */
+abstract class StructFilters(filters: Seq[sources.Filter], schema: StructType) {
+
+  assert(filters.forall(StructFilters.checkFilterRefs(_, schema)),
+"A pushed down filter refers to a non-existing schema field.")
+
+  /**
+   * Applies pushed down source filters to the given row assuming that
+   * value at `index` has been already set.
+   *
+   * @param row The row with fully or partially set values.
+   * @param index The index of already set value.
+   * @return true if currently processed row can be skipped otherwise false.
+   */
+  def skipRow(row: InternalRow, index: Int): Boolean
+
+  /**
+   * Resets states of pushed down filters. The method must be called before
+   * processing any new row; otherwise skipRow() may return a wrong result.
+   */
+  def reset(): Unit
+
+  /**
+   * Compiles source filters to a predicate.
+   */
+  def toPredicate(filters: Seq[sources.Filter]): BasePredicate = {
+val reducedExpr = filters
+  .sortBy(_.references.size)
+  .flatMap(StructFilters.filterToExpression(_, toRef))
+  .reduce(And)
+Predicate.create(reducedExpr)
+  }
+
+  // Finds a filter attribute in the schema and converts it to a `BoundReference`
+  def toRef(attr: String): Option[BoundReference] = {
+schema.getFieldIndex(attr).map { index =>
+  val field = schema(index)
+  BoundReference(schema.fieldIndex(attr), field.dataType, field.nullable)
+}
+  }
+}
+
+object StructFilters {
+  private def checkFilterRefs(filter: sources.Filter, schema: StructType): Boolean = {
+val fieldNames = schema.fields.map(_.name).toSet
+filter.references.forall(fieldNames.contains(_))
+  }
+
+  /**
+   * Returns the filters currently supported by the datasource.
+   * @param filters The filters pushed down to the datasource.
+   * @param schema data schema of datasource files.
+   * @return a sub-set of `filters` that can be handled by the datasource.
+   */
+  def pushedFilters(filters: Array[sources.Filter], schema: StructType): Array[sources.Filter] = {
+filters.filter(checkFilterRefs(_, schema))
+  }
+
+  private def zip[A, B](a: Option[A], b: Option[B]): Option[(A, B)] = {
 
 Review comment:
   Semantically, this function does what `zip` should do. The problem is that the
standard `zip` on an `Option` returns `Iterable[(A, B)]` instead of
`Option[(A, B)]`. I don't agree that the name is misleading.
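
The body of the PR's `zip` is cut off in the quoted diff, so the following is only one possible implementation of the signature shown above, written as a standalone sketch. The object name `OptionZipSketch` is hypothetical. It pairs the two values when both are defined and returns `None` otherwise. Scala 2.13's `Option` has its own `zip` that returns an `Option`, so a helper like this mainly matters on Scala 2.12, where `zip` is reached through the implicit conversion to `Iterable`.

```scala
object OptionZipSketch {
  // Pairs two optional values; the result is defined only when both inputs are.
  def zip[A, B](a: Option[A], b: Option[B]): Option[(A, B)] =
    for {
      x <- a
      y <- b
    } yield (x, y)

  def main(args: Array[String]): Unit = {
    println(zip(Some(1), Some("a")))
    println(zip(Some(1), Option.empty[String]))
    println(zip(Option.empty[Int], Some("a")))
  }
}
```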



[GitHub] [spark] AmplabJenkins removed a comment on issue #27427: [SPARK-30700][ML] NaiveBayesModel predict optimization

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27427: [SPARK-30700][ML] 
NaiveBayesModel predict optimization
URL: https://github.com/apache/spark/pull/27427#issuecomment-580997249
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22471/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27427: [SPARK-30700][ML] NaiveBayesModel predict optimization

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27427: [SPARK-30700][ML] 
NaiveBayesModel predict optimization
URL: https://github.com/apache/spark/pull/27427#issuecomment-580997246
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27427: [SPARK-30700][ML] NaiveBayesModel predict optimization

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27427: [SPARK-30700][ML] NaiveBayesModel 
predict optimization
URL: https://github.com/apache/spark/pull/27427#issuecomment-580997246
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27427: [SPARK-30700][ML] NaiveBayesModel predict optimization

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27427: [SPARK-30700][ML] NaiveBayesModel 
predict optimization
URL: https://github.com/apache/spark/pull/27427#issuecomment-580997249
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22471/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27427: [SPARK-30700][ML] NaiveBayesModel predict optimization

2020-01-31 Thread GitBox

SparkQA commented on issue #27427: [SPARK-30700][ML] NaiveBayesModel predict 
optimization
URL: https://github.com/apache/spark/pull/27427#issuecomment-580997146
 
 
   **[Test build #117711 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117711/testReport)**
 for PR 27427 at commit 
[`7bee0b0`](https://github.com/apache/spark/commit/7bee0b03f030f108f4db1b2b54daa7b4238e027e).



[GitHub] [spark] zhengruifeng opened a new pull request #27427: [SPARK-30700][ML] NaiveBayesModel predict optimization

2020-01-31 Thread GitBox

zhengruifeng opened a new pull request #27427: [SPARK-30700][ML] 
NaiveBayesModel predict optimization
URL: https://github.com/apache/spark/pull/27427
 
 
   ### What changes were proposed in this pull request?
   The variable `negThetaSum` is always used together with `pi`, so the two can be
added once up front.
   
   
   ### Why are the changes needed?
   Prediction then needs only one variable, `piMinusThetaSum`, instead of both `pi` and `negThetaSum`.
   
   
   ### Does this PR introduce any user-facing change?
   No
   
   
   ### How was this patch tested?
   existing testsuites
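
The change described above can be illustrated with a small standalone sketch. It assumes the raw per-class score has the form `weights(c) · x + pi(c) + negThetaSum(c)`; the object name, the `weights` matrix, and all numeric values are hypothetical and are not taken from the PR. Only the idea of folding `pi` and `negThetaSum` into a single precomputed `piMinusThetaSum` comes from the description.

```scala
object NaiveBayesSumSketch {
  // Hypothetical per-class terms; the values are made up for illustration.
  val pi: Array[Double] = Array(-0.7, -0.6)
  val negThetaSum: Array[Double] = Array(-1.2, -0.9)
  val weights: Array[Array[Double]] = Array(Array(0.5, -0.25), Array(-0.5, 0.75))

  // Holds pi + negThetaSum; computed once instead of on every prediction.
  val piMinusThetaSum: Array[Double] =
    pi.zip(negThetaSum).map { case (p, s) => p + s }

  private def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (a, b) => a * b }.sum

  // Before: two additions per class on every call.
  def rawBefore(x: Array[Double]): Array[Double] =
    weights.indices.map(c => dot(weights(c), x) + pi(c) + negThetaSum(c)).toArray

  // After: one addition per class on every call.
  def rawAfter(x: Array[Double]): Array[Double] =
    weights.indices.map(c => dot(weights(c), x) + piMinusThetaSum(c)).toArray

  def main(args: Array[String]): Unit = {
    val x = Array(1.0, 0.0)
    println(rawBefore(x).mkString(", "))
    println(rawAfter(x).mkString(", "))
  }
}
```

The two variants add the same terms in a different order, so they agree up to floating-point rounding.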
   



[GitHub] [spark] AmplabJenkins removed a comment on issue #27423: [SPARK-30697][SQL] Handle database and namespace exceptions in catalog.isView

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27423: [SPARK-30697][SQL] Handle 
database and namespace exceptions in catalog.isView
URL: https://github.com/apache/spark/pull/27423#issuecomment-580996683
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27423: [SPARK-30697][SQL] Handle database and namespace exceptions in catalog.isView

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27423: [SPARK-30697][SQL] Handle 
database and namespace exceptions in catalog.isView
URL: https://github.com/apache/spark/pull/27423#issuecomment-580996684
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117698/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27423: [SPARK-30697][SQL] Handle database and namespace exceptions in catalog.isView

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27423: [SPARK-30697][SQL] Handle database and 
namespace exceptions in catalog.isView
URL: https://github.com/apache/spark/pull/27423#issuecomment-580996683
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27213: [SPARK-30516][SQL] statistic estimation of FileScan should take partitionFilters into account

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27213: [SPARK-30516][SQL] statistic 
estimation of FileScan should take partitionFilters into account
URL: https://github.com/apache/spark/pull/27213#issuecomment-580996544
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27213: [SPARK-30516][SQL] statistic estimation of FileScan should take partitionFilters into account

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27213: [SPARK-30516][SQL] statistic 
estimation of FileScan should take partitionFilters into account
URL: https://github.com/apache/spark/pull/27213#issuecomment-580996547
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117702/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27423: [SPARK-30697][SQL] Handle database and namespace exceptions in catalog.isView

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27423: [SPARK-30697][SQL] Handle database and 
namespace exceptions in catalog.isView
URL: https://github.com/apache/spark/pull/27423#issuecomment-580996684
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117698/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27213: [SPARK-30516][SQL] statistic estimation of FileScan should take partitionFilters into account

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27213: [SPARK-30516][SQL] statistic 
estimation of FileScan should take partitionFilters into account
URL: https://github.com/apache/spark/pull/27213#issuecomment-580996547
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117702/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27213: [SPARK-30516][SQL] statistic estimation of FileScan should take partitionFilters into account

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27213: [SPARK-30516][SQL] statistic 
estimation of FileScan should take partitionFilters into account
URL: https://github.com/apache/spark/pull/27213#issuecomment-580996544
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27423: [SPARK-30697][SQL] Handle database and namespace exceptions in catalog.isView

2020-01-31 Thread GitBox

SparkQA commented on issue #27423: [SPARK-30697][SQL] Handle database and 
namespace exceptions in catalog.isView
URL: https://github.com/apache/spark/pull/27423#issuecomment-580996481
 
 
   **[Test build #117698 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117698/testReport)**
 for PR 27423 at commit 
[`b9005ea`](https://github.com/apache/spark/commit/b9005ea9b6ad7c26b03b59cbaa932a2d59e14529).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #27423: [SPARK-30697][SQL] Handle database and namespace exceptions in catalog.isView

2020-01-31 Thread GitBox

SparkQA removed a comment on issue #27423: [SPARK-30697][SQL] Handle database 
and namespace exceptions in catalog.isView
URL: https://github.com/apache/spark/pull/27423#issuecomment-580968521
 
 
   **[Test build #117698 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117698/testReport)**
 for PR 27423 at commit 
[`b9005ea`](https://github.com/apache/spark/commit/b9005ea9b6ad7c26b03b59cbaa932a2d59e14529).



[GitHub] [spark] SparkQA removed a comment on issue #27213: [SPARK-30516][SQL] statistic estimation of FileScan should take partitionFilters into account

2020-01-31 Thread GitBox

SparkQA removed a comment on issue #27213: [SPARK-30516][SQL] statistic 
estimation of FileScan should take partitionFilters into account
URL: https://github.com/apache/spark/pull/27213#issuecomment-580972034
 
 
   **[Test build #117702 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117702/testReport)**
 for PR 27213 at commit 
[`e948650`](https://github.com/apache/spark/commit/e948650ba47fa5f24af2881b2b2278d3897a431f).



[GitHub] [spark] SparkQA commented on issue #27213: [SPARK-30516][SQL] statistic estimation of FileScan should take partitionFilters into account

2020-01-31 Thread GitBox

SparkQA commented on issue #27213: [SPARK-30516][SQL] statistic estimation of 
FileScan should take partitionFilters into account
URL: https://github.com/apache/spark/pull/27213#issuecomment-580996331
 
 
   **[Test build #117702 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117702/testReport)**
 for PR 27213 at commit 
[`e948650`](https://github.com/apache/spark/commit/e948650ba47fa5f24af2881b2b2278d3897a431f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27423: [SPARK-30697][SQL] Handle database and namespace exceptions in catalog.isView

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27423: [SPARK-30697][SQL] Handle 
database and namespace exceptions in catalog.isView
URL: https://github.com/apache/spark/pull/27423#issuecomment-580995842
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27423: [SPARK-30697][SQL] Handle database and namespace exceptions in catalog.isView

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27423: [SPARK-30697][SQL] Handle 
database and namespace exceptions in catalog.isView
URL: https://github.com/apache/spark/pull/27423#issuecomment-580995845
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117701/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27423: [SPARK-30697][SQL] Handle database and namespace exceptions in catalog.isView

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27423: [SPARK-30697][SQL] Handle database and 
namespace exceptions in catalog.isView
URL: https://github.com/apache/spark/pull/27423#issuecomment-580995845
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117701/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27423: [SPARK-30697][SQL] Handle database and namespace exceptions in catalog.isView

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27423: [SPARK-30697][SQL] Handle database and 
namespace exceptions in catalog.isView
URL: https://github.com/apache/spark/pull/27423#issuecomment-580995842
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27423: [SPARK-30697][SQL] Handle database and namespace exceptions in catalog.isView

2020-01-31 Thread GitBox

SparkQA removed a comment on issue #27423: [SPARK-30697][SQL] Handle database 
and namespace exceptions in catalog.isView
URL: https://github.com/apache/spark/pull/27423#issuecomment-580970883
 
 
   **[Test build #117701 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117701/testReport)**
 for PR 27423 at commit 
[`b6379c0`](https://github.com/apache/spark/commit/b6379c08e9f1060f1c94791667a749469a1d60bb).



[GitHub] [spark] AmplabJenkins removed a comment on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #24938: [SPARK-27946][SQL] Hive DDL to 
Spark DDL conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#issuecomment-580995626
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #24938: [SPARK-27946][SQL] Hive DDL to 
Spark DDL conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#issuecomment-580995630
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117697/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark 
DDL conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#issuecomment-580995630
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117697/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark 
DDL conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#issuecomment-580995626
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27423: [SPARK-30697][SQL] Handle database and namespace exceptions in catalog.isView

2020-01-31 Thread GitBox

SparkQA commented on issue #27423: [SPARK-30697][SQL] Handle database and 
namespace exceptions in catalog.isView
URL: https://github.com/apache/spark/pull/27423#issuecomment-580995644
 
 
   **[Test build #117701 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117701/testReport)**
 for PR 27423 at commit 
[`b6379c0`](https://github.com/apache/spark/commit/b6379c08e9f1060f1c94791667a749469a1d60bb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread GitBox

SparkQA removed a comment on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark 
DDL conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#issuecomment-580965775
 
 
   **[Test build #117697 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117697/testReport)**
 for PR 24938 at commit 
[`27c76b3`](https://github.com/apache/spark/commit/27c76b3b5e9106f0fe7de1ccb2c8576064e625da).



[GitHub] [spark] SparkQA commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread GitBox

SparkQA commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL 
conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#issuecomment-580995453
 
 
   **[Test build #117697 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117697/testReport)**
 for PR 24938 at commit 
[`27c76b3`](https://github.com/apache/spark/commit/27c76b3b5e9106f0fe7de1ccb2c8576064e625da).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] beliefer commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

beliefer commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 
8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-580994373
 
 
   @dongjoon-hyun Thanks for your help! I got it.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27232: [SPARK-30525][SQL]HiveTableScanExec do not need to prune partitions again after pushing down to SessionCatalog for partition pruning

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27232: 
[SPARK-30525][SQL]HiveTableScanExec do not need to prune partitions again after 
pushing down to SessionCatalog for partition pruning
URL: https://github.com/apache/spark/pull/27232#issuecomment-580994289
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27232: [SPARK-30525][SQL]HiveTableScanExec do not need to prune partitions again after pushing down to SessionCatalog for partition pruning

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27232: [SPARK-30525][SQL]HiveTableScanExec do 
not need to prune partitions again after pushing down to SessionCatalog for 
partition pruning
URL: https://github.com/apache/spark/pull/27232#issuecomment-580994290
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117708/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27232: [SPARK-30525][SQL]HiveTableScanExec do not need to prune partitions again after pushing down to SessionCatalog for partition pruning

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #27232: [SPARK-30525][SQL]HiveTableScanExec do 
not need to prune partitions again after pushing down to SessionCatalog for 
partition pruning
URL: https://github.com/apache/spark/pull/27232#issuecomment-580994289
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27232: [SPARK-30525][SQL]HiveTableScanExec do not need to prune partitions again after pushing down to SessionCatalog for partition pruning

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27232: 
[SPARK-30525][SQL]HiveTableScanExec do not need to prune partitions again after 
pushing down to SessionCatalog for partition pruning
URL: https://github.com/apache/spark/pull/27232#issuecomment-580994290
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117708/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27232: [SPARK-30525][SQL]HiveTableScanExec do not need to prune partitions again after pushing down to SessionCatalog for partition pruning

2020-01-31 Thread GitBox

SparkQA removed a comment on issue #27232: [SPARK-30525][SQL]HiveTableScanExec 
do not need to prune partitions again after pushing down to SessionCatalog for 
partition pruning
URL: https://github.com/apache/spark/pull/27232#issuecomment-580980219
 
 
   **[Test build #117708 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117708/testReport)**
 for PR 27232 at commit 
[`057a594`](https://github.com/apache/spark/commit/057a59454df6404710cfad6723e9003a2dbfd82f).



[GitHub] [spark] SparkQA commented on issue #27232: [SPARK-30525][SQL]HiveTableScanExec do not need to prune partitions again after pushing down to SessionCatalog for partition pruning

2020-01-31 Thread GitBox

SparkQA commented on issue #27232: [SPARK-30525][SQL]HiveTableScanExec do not 
need to prune partitions again after pushing down to SessionCatalog for 
partition pruning
URL: https://github.com/apache/spark/pull/27232#issuecomment-580994098
 
 
   **[Test build #117708 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117708/testReport)**
 for PR 27232 at commit 
[`057a594`](https://github.com/apache/spark/commit/057a59454df6404710cfad6723e9003a2dbfd82f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] dongjoon-hyun commented on issue #27417: backport [SPARK-27747][SPARK-27816][SPARK-28344]

2020-01-31 Thread GitBox

dongjoon-hyun commented on issue #27417: backport 
[SPARK-27747][SPARK-27816][SPARK-28344]
URL: https://github.com/apache/spark/pull/27417#issuecomment-580993973
 
 
   Thank you all for your opinions. In particular, thank you for making this PR, 
@cloud-fan. I'll remove `Target Version: 2.4.5` for now. If we need this in 
`branch-2.4`, `Target Version` will be `2.4.6`.



[GitHub] [spark] dongjoon-hyun closed pull request #27417: backport [SPARK-27747][SPARK-27816][SPARK-28344]

2020-01-31 Thread GitBox

dongjoon-hyun closed pull request #27417: backport 
[SPARK-27747][SPARK-27816][SPARK-28344]
URL: https://github.com/apache/spark/pull/27417
 
 
   



[GitHub] [spark] dongjoon-hyun closed pull request #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

dongjoon-hyun closed pull request #27426: [SPARK-30698][BUILD] Bumps checkstyle 
from 8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426
 
 
   



[GitHub] [spark] dongjoon-hyun commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

dongjoon-hyun commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle 
from 8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-580992670
 
 
   It's nothing. Since you are one of the active contributors, I hope you can do 
more in the community.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #24938: [SPARK-27946][SQL] Hive DDL to 
Spark DDL conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#issuecomment-580992552
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #24938: [SPARK-27946][SQL] Hive DDL to 
Spark DDL conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#issuecomment-580992554
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117694/
   Test PASSed.



[GitHub] [spark] beliefer commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

beliefer commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 
8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-580992527
 
 
   @dongjoon-hyun Thanks for your help!



[GitHub] [spark] AmplabJenkins commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark 
DDL conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#issuecomment-580992552
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread GitBox

AmplabJenkins commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark 
DDL conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#issuecomment-580992554
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117694/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread GitBox

SparkQA removed a comment on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark 
DDL conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#issuecomment-580962096
 
 
   **[Test build #117694 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117694/testReport)**
 for PR 24938 at commit 
[`9882932`](https://github.com/apache/spark/commit/9882932bb275d287dfc3cea3d52bb7903e25e73f).



[GitHub] [spark] SparkQA commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-31 Thread GitBox

SparkQA commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL 
conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#issuecomment-580992435
 
 
   **[Test build #117694 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117694/testReport)**
 for PR 24938 at commit 
[`9882932`](https://github.com/apache/spark/commit/9882932bb275d287dfc3cea3d52bb7903e25e73f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] dongjoon-hyun edited a comment on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

dongjoon-hyun edited a comment on issue #27426: [SPARK-30698][BUILD] Bumps 
checkstyle from 8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-580992304
 
 
   Usually, you can take a look at the release notes and mention a few notable 
bug fixes listed there.



[GitHub] [spark] beliefer edited a comment on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

beliefer edited a comment on issue #27426: [SPARK-30698][BUILD] Bumps 
checkstyle from 8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-580992189
 
 
   @dongjoon-hyun I'm sorry! What should I write here? :)
   I have written some description here.



[GitHub] [spark] dongjoon-hyun commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

dongjoon-hyun commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle 
from 8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-580992304
 
 
   Usually, you can take a look at the release notes and mention a few notable 
bug fixes from them.



[GitHub] [spark] beliefer commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

beliefer commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 
8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-580992189
 
 
   @dongjoon-hyun I'm sorry! What should I write here? :)



[GitHub] [spark] dongjoon-hyun commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

dongjoon-hyun commented on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle 
from 8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-580991804
 
 
   Er, @beliefer. If you say `No`, there is no way to merge this. :)
   ```
   ### Why are the changes needed?
   
   No
   ```



[GitHub] [spark] AmplabJenkins removed a comment on issue #27426: [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29.

2020-01-31 Thread GitBox

AmplabJenkins removed a comment on issue #27426: [SPARK-30698][BUILD] Bumps 
checkstyle from 8.25 to 8.29.
URL: https://github.com/apache/spark/pull/27426#issuecomment-580991498
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22470/
   Test PASSed.


