[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-24 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r183643445 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala --- @@ -39,21 +46,28 @@ class MulticlassMetrics @Since

[GitHub] spark pull request #21129: [SPARK-7132][ML] Add fit with validation set to s...

2018-04-23 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21129 [SPARK-7132][ML] Add fit with validation set to spark.ml GBT ## What changes were proposed in this pull request? Add fit with validation set to spark.ml GBT ## How

[GitHub] spark issue #20446: [SPARK-23254][ML] Add user guide entry and example for D...

2018-04-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20446 @MLnick @srowen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21097: [SPARK-14682][ML] Provide evaluateEachIteration m...

2018-04-19 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21097#discussion_r182668410 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -365,6 +365,20 @@ class GBTClassifierSuite extends

[GitHub] spark issue #21081: [SPARK-23975][ML]Allow Clustering to take Arrays of Doub...

2018-04-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21081 So why not design generic vector class ? and then implement Vector[Double] and Vector[Float] via generic specification ? So it can support everything, no matter sparse and dense

[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...

2018-04-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21078 @jkbradley Updated. I would like to split `RandomForest` and `GradientBoostedTrees` modification into another PR because it will change many methods in them

[GitHub] spark issue #21081: [SPARK-23975][ML]Allow Clustering to take Arrays of Doub...

2018-04-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21081 @jkbradley Will this be applied to other algos besides clustering algos ? and how to support sparse float features

[GitHub] spark pull request #21097: [SPARK-14682][ML] Provide evaluateEachIteration m...

2018-04-18 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21097 [SPARK-14682][ML] Provide evaluateEachIteration method or equivalent for spark.ml GBTs ## What changes were proposed in this pull request? Provide evaluateEachIteration method

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-18 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r182367186 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala --- @@ -27,10 +27,11 @@ import

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r182004759 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala --- @@ -27,10 +27,11 @@ import

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r182002432 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -67,6 +68,10 @@ class

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r182003965 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -75,11 +80,16 @@ class

[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-04-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19381 @dbtsai Good idea! Is there a related JIRA or could you open one for it ? cc @jkbradley --- - To unsubscribe, e-mail

[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...

2018-04-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21078 @MrBago @jkbradley Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #21078: [SPARK-23990][ML] Instruments logging improvement...

2018-04-16 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21078 [SPARK-23990][ML] Instruments logging improvements - ML regression package ## What changes were proposed in this pull request? Instruments logging improvements - ML regression package

[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21044#discussion_r181286908 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -195,15 +206,32 @@ final class OneVsRestModel private[ml

[GitHub] spark pull request #21044: [SPARK-9312][ML] Add RawPrediction, numClasses, a...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21044#discussion_r181287383 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -195,15 +206,32 @@ final class OneVsRestModel private[ml

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r181270142 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -81,32 +81,37 @@ object KolmogorovSmirnovTest

[GitHub] spark pull request #21051: [SPARK-23751][FOLLOW-UP] fix build for scala-2.12

2018-04-12 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21051 [SPARK-23751][FOLLOW-UP] fix build for scala-2.12 ## What changes were proposed in this pull request? fix build for scala-2.12 ## How was this patch tested? Manual

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r181018223 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -81,32 +81,37 @@ object KolmogorovSmirnovTest

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r181015525 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -81,32 +81,37 @@ object KolmogorovSmirnovTest

[GitHub] spark pull request #19381: [SPARK-10884][ML] Support prediction on single in...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19381#discussion_r181015190 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala --- @@ -192,12 +192,12 @@ abstract class ClassificationModel

[GitHub] spark pull request #17092: [SPARK-18450][ML] Scala API Change for LSH AND-am...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17092#discussion_r180999421 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala --- @@ -119,6 +118,9 @@ class MinHashLSH(override val uid: String) extends

[GitHub] spark pull request #17092: [SPARK-18450][ML] Scala API Change for LSH AND-am...

2018-04-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17092#discussion_r180998595 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSH.scala --- @@ -137,6 +136,9 @@ class

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2018-04-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15770 @wangmiao1981 If you're busy I can help take over this. -:) --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSp...

2018-04-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19627 @MrBago @yogeshg @jkbradley Updated and ready for review now! --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValid...

2018-04-10 Thread WeichenXu123
GitHub user WeichenXu123 reopened a pull request: https://github.com/apache/spark/pull/19627 [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSplit support collect all models when fitting: Python API ## What changes were proposed in this pull request? CrossValidator

[GitHub] spark pull request #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValid...

2018-04-10 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/19627 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSp...

2018-04-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19627 Because of codebase changing, I will create new PR to replace this one. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #20964: [SPARK-22883] ML test for StructuredStreaming: spark.ml....

2018-04-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20964 LGTM. 👍 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Pyth...

2018-04-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20904 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #20235: [Spark-22887][ML][TESTS][WIP] ML test for Structu...

2018-04-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20235#discussion_r180027926 --- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala --- @@ -34,86 +35,122 @@ class FPGrowthSuite extends SparkFunSuite

[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-04-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20319 @smurakozi Thanks for the PR! Could you resolve conflicts first? and then I will make a review. If you're busy I can also take over

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #20994: [SPARK-21898][ML][FOLLOWUP] Fix Scala 2.12 build.

2018-04-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20994 LGTM. Thanks! cc @jkbradley --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #20982: [SPARK-23859][ML] Initial PR for Instrumentation ...

2018-04-05 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20982 [SPARK-23859][ML] Initial PR for Instrumentation improvements: UUID and logging levels ## What changes were proposed in this pull request? Initial PR for Instrumentation improvements

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179311446 --- Diff: python/pyspark/ml/stat.py --- @@ -134,6 +134,63 @@ def corr(dataset, column, method="pearson"): return _

[GitHub] spark issue #20837: [SPARK-23686][ML][WIP] Better instrumentation

2018-04-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20837 No problem. I will take over this. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #20810: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/20810 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20810: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20810 According to @jkbradley 's opinion. I create a new PR which only use a static method. --- - To unsubscribe, e-mail

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-03 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20973 [SPARK-20114][ML] spark.ml parity for sequential pattern mining - PrefixSpan ## What changes were proposed in this pull request? PrefixSpan API for spark.ml. New implementation

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 @jkbradley Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178784053 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala --- @@ -48,8 +46,8 @@ class MinMaxScalerSuite extends SparkFunSuite

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178778285 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/ImputerSuite.scala --- @@ -76,6 +75,28 @@ class ImputerSuite extends SparkFunSuite

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178780101 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MaxAbsScalerSuite.scala --- @@ -45,9 +44,9 @@ class MaxAbsScalerSuite extends SparkFunSuite

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178783980 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MinHashLSHSuite.scala --- @@ -167,4 +166,20 @@ class MinHashLSHSuite extends SparkFunSuite

[GitHub] spark pull request #20964: [SPARK-22883] ML test for StructuredStreaming: sp...

2018-04-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20964#discussion_r178784391 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/NGramSuite.scala --- @@ -84,7 +84,7 @@ class NGramSuite extends MLTest

[GitHub] spark pull request #20313: [SPARK-22974][ML] Attach attributes to output col...

2018-04-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20313#discussion_r178517391 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -264,7 +265,9 @@ class CountVectorizerModel

[GitHub] spark issue #20934: [SPARK-23818][SQL][WIP] an official UDF interface for Sp...

2018-03-31 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20934 Will be open again when interface decision made for this. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #20934: [SPARK-23818][SQL][WIP] an official UDF interface...

2018-03-31 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/20934 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20934: [SPARK-23818][SQL][WIP] an official UDF interface...

2018-03-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20934#discussion_r178446367 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/JavaUDF.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-03-30 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #20934: [SPARK-23818][SQL][WIP] an official UDF interface...

2018-03-29 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20934#discussion_r178217425 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -217,6 +217,27 @@ class UDFRegistration private[sql

[GitHub] spark pull request #20934: [SPARK-23818][SQL][WIP] an official UDF interface...

2018-03-29 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20934 [SPARK-23818][SQL][WIP] an official UDF interface for Spark SQL ## What changes were proposed in this pull request? API: (to be discussed), use 2-args as example

[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-03-26 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20904 [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Python API in pyspark.ml ## What changes were proposed in this pull request? Kolmogorov-Smirnoff test Python API in `pyspark.ml

[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-22 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20858#discussion_r176631255 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -699,3 +699,88 @@ abstract class

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-03-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r176299569 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala --- @@ -175,6 +175,8 @@ private[sql] class HiveSessionCatalog

[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20795 Yea, I understand the reason to split built-in and external because you only want to cache external function name. But cache all used function names in a query do not cost too much so

[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20795 And I don't think it need to split into builtin and external function exist check in this case. Just following code works fine: ``` object LookupFunctions extends Rule[LogicalPlan

[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r176039540 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala --- @@ -175,6 +175,8 @@ private[sql] class HiveSessionCatalog

[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r176039913 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -1076,6 +1076,16 @@ class SessionCatalog

[GitHub] spark pull request #20695: [SPARK-21741][ML][PySpark] Python API for DataFra...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20695#discussion_r176009765 --- Diff: python/pyspark/ml/stat.py --- @@ -132,6 +134,172 @@ def corr(dataset, column, method="pearson"): return _

[GitHub] spark issue #20786: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-03-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20786 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-03-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19381 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #20786: [SPARK-14681][ML] Provide label/impurity stats fo...

2018-03-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20786#discussion_r175970711 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala --- @@ -84,35 +86,73 @@ private[ml] object Node { /** * Create a new

[GitHub] spark pull request #20852: [SPARK-23728][BRANCH-2.3] Fix ML tests with expec...

2018-03-19 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20852#discussion_r175380424 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala --- @@ -119,9 +119,15 @@ trait MLTest extends StreamTest with TempDirectory

[GitHub] spark issue #20806: [SPARK-23661][SQL] Implement treeAggregate on Dataset AP...

2018-03-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20806 @viirya ok. but there're already a class in ML use `TypedImperativeAggregator`, see `Summarizer`. And do you benchmark and compare this PR and `df.rdd.treeAggregate`? Seems they're

[GitHub] spark pull request #20837: [SPARK-23686][ML][WIP] Better instrumentation

2018-03-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20837#discussion_r174995768 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -55,6 +55,11 @@ private[spark] class Instrumentation[E

[GitHub] spark pull request #20837: [SPARK-23686][ML][WIP] Better instrumentation

2018-03-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20837#discussion_r174995625 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -55,6 +55,11 @@ private[spark] class Instrumentation[E

[GitHub] spark pull request #20837: [SPARK-23686][ML][WIP] Better instrumentation

2018-03-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20837#discussion_r174996170 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -517,6 +517,9 @@ class LogisticRegression @Since

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r174990264 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -49,32 +51,65 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r174993897 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -85,18 +120,34 @@ class VectorAssembler @Since("

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r174991898 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -49,32 +51,65 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r174990323 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -49,32 +51,65 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r174994214 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -49,32 +51,65 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r174990221 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -49,32 +51,65 @@ class VectorAssembler @Since("1.4.0"

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r174980520 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -234,7 +234,7 @@ class StringIndexerModel ( val

[GitHub] spark pull request #20829: [SPARK-23690][ML] Add handleinvalid to VectorAsse...

2018-03-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20829#discussion_r174675028 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -234,7 +234,7 @@ class StringIndexerModel ( val

[GitHub] spark issue #20539: [SPARK-22700][ML] Bucketizer.transform incorrectly drops...

2018-03-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20539 So why this PR still open ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #20806: [SPARK-23661][SQL] Implement treeAggregate on Dataset AP...

2018-03-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20806 But I haven't benchmark. Maybe it do not worth to do codegen for treeAggregate. --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #20806: [SPARK-23661][SQL] Implement treeAggregate on Dataset AP...

2018-03-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20806 @viirya Yes. `treeAggregate` should only apply to global aggregate. But in this PR the API have to use `seqOp`/`combOp`. What I expect is that the dataframe version treeAggregate can

[GitHub] spark issue #20806: [SPARK-23661][SQL] Implement treeAggregate on Dataset AP...

2018-03-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20806 The API seems not dataframe style. What I expect is something like: ``` dataset.groupBy().setAggregateLevel(2).agg(Map("age" -> "max&q

[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r174655509 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSlicerSuite.scala --- @@ -84,26 +84,29 @@ class VectorSlicerSuite extends

[GitHub] spark pull request #20810: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-03-13 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20810 [SPARK-20114][ML] spark.ml parity for sequential pattern mining - PrefixSpan ## What changes were proposed in this pull request? PrefixSpan API for spark.ml. Note: Currently

[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r173999125 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala --- @@ -58,14 +57,16 @@ class VectorAssemblerSuite

[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r173998034 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -299,18 +310,17 @@ class StringIndexerSuite

[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r173995919 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -299,18 +310,17 @@ class StringIndexerSuite

[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-03-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20695 gentle ping @MrBago @yogeshg Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #20758: [SPARK-14681][ML] Provide label/impurity stats fo...

2018-03-09 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/20758 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20758: [SPARK-14681][ML] Provide label/impurity stats for spark...

2018-03-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20758 According to @jkbradley 's suggestion, I create a new PR #20786 instead of this. --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #20786: [SPARK-14681][ML] Provide label/impurity stats fo...

2018-03-09 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20786 [SPARK-14681][ML] Provide label/impurity stats for spark.ml decision tree nodes ## What changes were proposed in this pull request? API: ``` trait ClassificationNode extends

[GitHub] spark pull request #20758: [SPARK-14681][ML] Provide label/impurity stats fo...

2018-03-07 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20758 [SPARK-14681][ML] Provide label/impurity stats for spark.ml decision tree nodes ## What changes were proposed in this pull request? Provide label/impurity stats for spark.ml decision

[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

2018-03-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20633 LGTM. Thanks! @jkbradley --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-03-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19381 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r172415192 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/NormalizerSuite.scala --- @@ -17,94 +17,72 @@ package

[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r172408255 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala --- @@ -313,13 +306,14 @@ class RFormulaSuite extends MLTest

[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r172408009 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala --- @@ -324,19 +352,46 @@ class QuantileDiscretizerSuite

<    1   2   3   4   5   6   7   8   9   10   >