[GitHub] spark issue #20088: [SPARK-22905][ML][MLLIB][CORE] Fix ChiSqSelectorModel sa...

2017-12-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20088 Currently I cannot construct a failed test for this issue, but the future PR (changing `RoundRobinPartitioning`) by @jiangxb1987 will trigger this bug

[GitHub] spark pull request #20088: [SPARK-22905][ML][MLLIB][CORE] Fix ChiSqSelectorM...

2017-12-26 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20088 [SPARK-22905][ML][MLLIB][CORE] Fix ChiSqSelectorModel save implementation ## What changes were proposed in this pull request? Currently, in `ChiSqSelectorModel`, save

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST] Make ML testsuite support...

2017-12-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r158692700 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20077: [SPARK-22899][ML][STREAM] Fix OneVsRestModel tran...

2017-12-25 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20077 [SPARK-22899][ML][STREAM] Fix OneVsRestModel transform on streaming data failed. ## What changes were proposed in this pull request? Fix OneVsRestModel transform on streaming data

[GitHub] spark issue #19994: [SPARK-22810][ML][PySpark] Expose Python API for LinearR...

2017-12-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19994 LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19979: [SPARK-22644][ML][TEST][FOLLOW-UP] ML regression package...

2017-12-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19979 @MrBago @jkbradley --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19950 @cloud-fan Does it works like: If A and B are any class which is registered, then Type Tuple2[A, B] will be automatically registered for kyro

[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19950 And, these items added cannot cover the case in `MultilayerPeceptron`. Look at `FeedForwardTrainer.train`, the persisted stacked `trainData`, the format is `RDD[(Double, mllib.Vector

[GitHub] spark pull request #19950: [SPARK-22450][Core][MLLib][FollowUp] safely regis...

2017-12-19 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19950#discussion_r157922929 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -187,14 +187,18 @@ class KryoSerializer(conf: SparkConf

[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...

2017-12-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19156 Jenkins retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #19988: [Spark-22795] [ML] Raise error when line search in First...

2017-12-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19988 I think we can discuss the following cases: - When gradient non-zero, line-search failed, will the model always be meaning-less ? - When gradient nearly zero, and line-search failed. I

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r157668450 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,195 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #19988: [Spark-22795] [ML] Raise error when line search in First...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19988 @srowen Wait... @jkbradley seems to have more thoughts about this: Question: When line search failed, does it mean the model is always meaning-less ? Maybe we need more discussion

[GitHub] spark issue #19350: [SPARK-22126][ML][WIP] Fix model-specific optimization s...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19350 Design changed. I will create new PR for this later. New design is here https://docs.google.com/document/d/1xw5M4sp1e0eQie75yIt-r6-GTuD5vpFf_I6v-AFBM3M/edit?usp=sharing

[GitHub] spark pull request #19350: [SPARK-22126][ML][WIP] Fix model-specific optimiz...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/19350 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19857: [SPARK-22667][ML][WIP] Fix model-specific optimization s...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19857 The design of this issue changed. @MrBago will take this over. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #19857: [SPARK-22667][ML][WIP] Fix model-specific optimiz...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/19857 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @felixcheung Another failed testcase, spark.mlp in sparkR, it also use `RFormula` and it will also generate indeterministic result, see class `MultilayerPerceptronClassifierWrapper` line 78

[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19988#discussion_r157441600 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -244,9 +244,9 @@ class LinearSVC @Since("

[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...

2017-12-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r157393913 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -140,10 +140,10 @@ final class Bucketizer @Since("1.4.0"

[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...

2017-12-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19988#discussion_r157393049 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -244,9 +244,9 @@ class LinearSVC @Since("

[GitHub] spark pull request #19994: [SPARK-22810][ML][PySpark] Expose Python API for ...

2017-12-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19994#discussion_r157391859 --- Diff: python/pyspark/ml/regression.py --- @@ -155,6 +183,14 @@ def intercept(self): """ return self._call_

[GitHub] spark pull request #19994: [SPARK-22810][ML][PySpark] Expose Python API for ...

2017-12-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19994#discussion_r157391801 --- Diff: python/pyspark/ml/tests.py --- @@ -1725,6 +1725,27 @@ def test_offset(self): self.assertTrue(np.isclose(model.intercept

[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...

2017-12-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19156 Jenkins retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #19621: [SPARK-11215][ML] Add multiple columns support to...

2017-12-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19621#discussion_r157169491 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -79,20 +80,49 @@ private[feature] trait StringIndexerBase

[GitHub] spark pull request #19350: [SPARK-22126][ML][WIP] Fix model-specific optimiz...

2017-12-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19350#discussion_r157118663 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala --- @@ -82,5 +86,49 @@ abstract class Estimator[M <: Model[M]] exte

[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19904 I discussed with @MrBago offline, I make a summary for what I thought now: I give 3 approaches which we can compare, after discussion I realized none of them is ideal, we have to make

[GitHub] spark pull request #19979: [SPARK-22644][ML][TEST][FOLLOW-UP] ML regression ...

2017-12-14 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19979 [SPARK-22644][ML][TEST][FOLLOW-UP] ML regression testsuite add StructuredStreaming test ## What changes were proposed in this pull request? ML regression testsuite add

[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19904 @MrBago Your code https://github.com/apache/spark/pull/19904#discussion_r156751569 also works fine, I think. Although it is more complicated. @BryanCutler >the unpers

[GitHub] spark pull request #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interfa...

2017-12-13 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19156#discussion_r156629043 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -197,14 +240,14 @@ private[ml] object SummaryBuilderImpl extends

[GitHub] spark pull request #19350: [SPARK-22126][ML][WIP] Fix model-specific optimiz...

2017-12-13 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19350#discussion_r156603289 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala --- @@ -82,5 +86,49 @@ abstract class Estimator[M <: Model[M]] exte

[GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator memory ...

2017-12-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19904#discussion_r156550777 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -146,25 +147,18 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator memory ...

2017-12-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19904#discussion_r156375242 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -146,25 +147,18 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark pull request #18390: [SPARK-21178][ML] Add support for label specific ...

2017-12-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18390#discussion_r156292467 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -38,17 +38,39 @@ class

[GitHub] spark pull request #18390: [SPARK-21178][ML] Add support for label specific ...

2017-12-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18390#discussion_r156295173 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala --- @@ -80,17 +102,42 @@ class

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @felixcheung "iris" is a built-in dataset in R, used in many algo testing, so is it proper

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156268308 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156257397 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19927: [SPARK-22737][ML] OVR transform optimization

2017-12-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19927#discussion_r156249173 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -156,54 +153,22 @@ final class OneVsRestModel private[ml

[GitHub] spark pull request #19715: [SPARK-22397][ML]add multiple columns support to ...

2017-12-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19715#discussion_r156010806 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -129,34 +156,102 @@ final class QuantileDiscretizer @Since

[GitHub] spark pull request #19927: [SPARK-22737][ML] OVR transform optimization

2017-12-10 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19927#discussion_r155996272 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -156,54 +153,22 @@ final class OneVsRestModel private[ml

[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19904 @sethah To verify the memory issue, you can add one line test code against current master at here: ``` val modelFutures = ... // Unpersist training data only when

[GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator memory ...

2017-12-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19904#discussion_r155715665 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -146,25 +147,18 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator memory ...

2017-12-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19904#discussion_r155695280 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -146,31 +146,34 @@ class CrossValidator @Since("1.2.0"

[GitHub] spark issue #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite support S...

2017-12-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19843 add UT for MLTest and change to use PipelineModel. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...

2017-12-06 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r155403743 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @felixcheung Yes, the spark.mlp test result changed because of indexer order changed. That's because, StringIndexer when item frequency equal, there's no definite rule for index order

[GitHub] spark issue #19889: [SPARK-22690][ML] Imputer inherit HasOutputCols

2017-12-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19889 LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r155141609 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -184,4 +184,54 @@ class LibSVMRelationSuite extends

[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r155141551 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -184,4 +184,54 @@ class LibSVMRelationSuite extends

[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator fitting memory...

2017-12-05 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19904 @BryanCutler @MLnick @MrBago @hhbyyh --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator fitting...

2017-12-05 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19904 [SPARK-22707][ML] Optimize CrossValidator fitting memory occupation by models ## What changes were proposed in this pull request? Via some test I found CrossValidator still exists

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-05 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @felixcheung There is no breaking change. But, we meet some trouble thing about indeterministic behavior. When frequency equal, the indexer result is indeterministic. I already fix those

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r154820556 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r154594735 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -184,4 +184,54 @@ class LibSVMRelationSuite extends

[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r154594381 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -184,4 +184,54 @@ class LibSVMRelationSuite extends

[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-12-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r154598540 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala --- @@ -184,4 +184,54 @@ class LibSVMRelationSuite extends

[GitHub] spark issue #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSp...

2017-12-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19627 @jkbradley I think it is better to review #19857 (fix python model specific optimization) and merge it first and then I rebase & update thi

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 Any one can provide some suggestion ? for fixing sparkR glm test failure here. --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #19758: [SPARK-3162][MLlib] Local Tree Training Pt 1: Ref...

2017-12-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19758#discussion_r154302624 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeSplitUtilsSuite.scala --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19857: [SPARK-22667][ML] Fix model-specific optimization suppor...

2017-11-30 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19857 @MrBago @jkbradley I think this PR need to be reviewed and merged first, before reviewing #19627 Because this PR change some critical code path

[GitHub] spark pull request #19857: [SPARK-22667][ML] Fix model-specific optimization...

2017-11-30 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19857 [SPARK-22667][ML] Fix model-specific optimization support for ML tuning: Python API ## What changes were proposed in this pull request? Python CrossValidator/TrainValidationSplit

[GitHub] spark issue #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite support S...

2017-11-29 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19843 @MrBago Thanks! I update code, now new action class `CheckAnswerRowsByFunc` is added. I do not add common trait as both of them are simple and I don't want to break old code

[GitHub] spark issue #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite support S...

2017-11-29 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19843 @MrBago @jkbradley --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite support S...

2017-11-29 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19843 Jenkins retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...

2017-11-28 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19843 [SPARK-22644][ML][TEST][WIP] Make ML testsuite support StructuredStreaming test ## What changes were proposed in this pull request? We need to add some helper code to make testing ML

[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-11-28 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19746 What about supporting multiple columns ? VectorAssembler require multiple input columns, they all need VectorSizeHint to transform first. There's no need to use multiple VectorSizeHint

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-11-28 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r153420330 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-11-28 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r153420429 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-24 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 I checked the failed tests in sparkR. There's some trouble in the failed `glm` sparkR tests. These tests compare sparkR glm and R-lib glm results on test data "iris", b

[GitHub] spark pull request #19758: [SPARK-3162][MLlib] Local Tree Training Pt 1: Ref...

2017-11-23 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19758#discussion_r152749515 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeSplitUtilsSuite.scala --- @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-23 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @viirya @MLnick Code updated. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @MLnick Ah, I don't express it exactly, the first case, what I mean is, sort by frequency, but if the case frequency equal, sort by alphabet

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @MLnick How about this way: The case "fequencyAsc/Desc", sort first by frequency and then by alphabet, The case "alphabetAsc/Desc", sort by alphabet (and if

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @MLnick Will RDD "count by value" aggregation be deterministic ? e.g., 2 RDD with the same elements, but with different element order and different partition number, will `rdd.co

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 Jenkins retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r152454669 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml

[GitHub] spark issue #19753: [SPARK-22521][ML] VectorIndexerModel support handle unse...

2017-11-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19753 Thanks @holdenk --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2017-11-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19627 @holdenk Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #19621: [SPARK-11215][ML] Add multiple columns support to...

2017-11-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19621#discussion_r152252491 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -130,21 +160,49 @@ class StringIndexer @Since("

[GitHub] spark pull request #19621: [SPARK-11215][ML] Add multiple columns support to...

2017-11-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19621#discussion_r152252553 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -217,69 +289,94 @@ class StringIndexerModel ( @Since

[GitHub] spark pull request #19621: [SPARK-11215][ML] Add multiple columns support to...

2017-11-21 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19621#discussion_r152252129 --- Diff: project/MimaExcludes.scala --- @@ -82,7 +82,15 @@ object MimaExcludes { // [SPARK-21087] CrossValidator, TrainValidationSplit

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-11-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r152166365 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-11-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r152158938 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-11-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r151958073 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-11-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r151956715 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-11-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r151956112 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-11-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r151954037 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-11-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r151955287 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-11-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r151954590 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2017-11-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19627 @holdenk Find the reason. There is an empty file in the directory. :) --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSp...

2017-11-16 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19627 My local test passed. This test failure looks like test system issue. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #19753: [SPARK-22521][ML] VectorIndexerModel support hand...

2017-11-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19753#discussion_r151311832 --- Diff: python/pyspark/ml/feature.py --- @@ -2565,22 +2575,28 @@ class VectorIndexer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadable, Ja

[GitHub] spark pull request #19753: [SPARK-22521][ML] VectorIndexerModel support hand...

2017-11-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19753#discussion_r151311569 --- Diff: python/pyspark/ml/feature.py --- @@ -2565,22 +2575,28 @@ class VectorIndexer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadable, Ja

[GitHub] spark issue #19753: [SPARK-22521][ML] VectorIndexerModel support handle unse...

2017-11-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19753 @smurching The getter/setter is included in the super class `HasHandleInvalid`. I can add test

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 I want to ask, for option `StringIndexer.frequencyDesc`, in the case existing two labels which have the same frequency, which of them will be put in the front ? If this is not specified

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r151055221 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +346,39 @@ class VectorIndexerModel private[ml

[GitHub] spark pull request #19753: [SPARK-22521][ML] VectorIndexerModel support hand...

2017-11-14 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19753 [SPARK-22521][ML] VectorIndexerModel support handle unseen categories via handleInvalid: Python API ## What changes were proposed in this pull request? Add python api

[GitHub] spark issue #19588: [SPARK-12375][ML] VectorIndexerModel support handle unse...

2017-11-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19588 Python API jira created here: https://issues.apache.org/jira/browse/SPARK-22521 --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @viirya @MLnick Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

<    1   2   3   4   5   6   7   8   9   10   >