[GitHub] spark issue #19979: [SPARK-22881][ML][TEST] ML regression package testsuite ...

2017-12-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19979 Actually, going further than what Bago said: All of the places which use globalCheckFunction assume that Dataset.collect() returns the Rows in a fixed order. We should really fix those unit
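The ordering concern above can be illustrated with a minimal sketch. It assumes rows are modelled as plain hashable tuples rather than Spark `Row` objects, and the helper name `assert_same_rows` is hypothetical, not part of the Spark test utilities. The idea is to compare collected rows as multisets so that a check does not depend on the order in which `collect()` happens to return them.

```python
from collections import Counter


def assert_same_rows(actual, expected):
    """Compare two collections of rows as multisets, ignoring row order.

    Hypothetical helper for illustration; rows must be hashable (e.g. tuples).
    """
    # Counter keeps duplicate counts, so [a, a, b] is not equal to [a, b].
    if Counter(actual) != Counter(expected):
        raise AssertionError(f"row mismatch: {actual!r} vs {expected!r}")


# Stand-in for the result of a collect() call that returned rows out of order.
collected = [(2, "b"), (1, "a"), (3, "c")]
expected = [(1, "a"), (2, "b"), (3, "c")]

# Passes even though the two lists are ordered differently.
assert_same_rows(collected, expected)
```

A multiset comparison is one option; sorting both sides by a key column before comparing would be another, when the rows have a natural ordering.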

spark git commit: [SPARK-22899][ML][STREAMING] Fix OneVsRestModel transform on streaming data failed.

2017-12-27 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 774715d5c -> 753793bc8 [SPARK-22899][ML][STREAMING] Fix OneVsRestModel transform on streaming data failed. ## What changes were proposed in this pull request? Fix OneVsRestModel transform on streaming data failed. ## How was this patch

[GitHub] spark issue #20077: [SPARK-22899][ML][Streaming] Fix OneVsRestModel transfor...

2017-12-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20077 LGTM Merging with master Let's just add a test as part of https://issues.apache.org/jira/browse/SPARK-22882

[GitHub] spark issue #20088: [SPARK-22905][ML][MLLIB][CORE] Fix ChiSqSelectorModel sa...

2017-12-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20088 This is hopefully a rare case of an error in assuming an ordering of rows; in general, we shouldn't (and don't, AFAIK) assume an order for rows in saving/loading models. LGTM, but before

[GitHub] spark pull request #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the...

2017-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r158882354 --- Diff: python/pyspark/ml/tuning.py --- @@ -31,6 +31,17 @@ 'TrainValidationSplitModel'] +def parallelFitTasks(est, train

[GitHub] spark pull request #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the...

2017-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r158881392 --- Diff: python/pyspark/ml/base.py --- @@ -18,13 +18,40 @@ from abc import ABCMeta, abstractmethod import copy +import threading

[GitHub] spark pull request #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the...

2017-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r158882154 --- Diff: python/pyspark/ml/base.py --- @@ -47,6 +74,24 @@ def _fit(self, dataset): """ raise NotI

[GitHub] spark pull request #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the...

2017-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r158881199 --- Diff: python/pyspark/ml/base.py --- @@ -18,13 +18,40 @@ from abc import ABCMeta, abstractmethod import copy +import threading

[GitHub] spark pull request #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the...

2017-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r158880831 --- Diff: python/pyspark/ml/tests.py --- @@ -2359,6 +2359,21 @@ def test_unary_transformer_transform(self): self.assertEqual(res.input

[GitHub] spark pull request #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the...

2017-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r158882626 --- Diff: python/pyspark/ml/base.py --- @@ -47,6 +74,24 @@ def _fit(self, dataset): """ raise NotI

[GitHub] spark pull request #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the...

2017-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r158881984 --- Diff: python/pyspark/ml/base.py --- @@ -47,6 +74,24 @@ def _fit(self, dataset): """ raise NotI

[GitHub] spark issue #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the fit-mu...

2017-12-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20058 reviewing now --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158861023 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,519 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158879746 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,519 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158879381 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,519 @@ +/* + * Licensed to the Apache

spark git commit: [SPARK-22707][ML] Optimize CrossValidator memory occupation by models in fitting

2017-12-24 Thread jkbradley
PR to fix it. ## Discussion I give 3 approaches which we can compare. After discussion, I realized that none of them is ideal; we have to make a trade-off. **After discussion with jkbradley, approach 3 was chosen.** ### Approach 1 ~~The approach proposed by MrBago at~~ https://github.com/apache/spark/p

[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-24 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19904 LGTM Sorry for the delay & thanks for the PR! Merging with master

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158618991 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,479 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158610281 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,479 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158594885 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,479 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158595342 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,456 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158578138 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,479 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158578038 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,479 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158578163 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,479 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158594929 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,479 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158595329 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,479 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158595136 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,479 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158595526 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,479 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158595902 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,479 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158595153 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,479 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r158578024 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,479 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19979: [SPARK-22644][ML][TEST][FOLLOW-UP] ML regression package...

2017-12-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19979 Thanks! Let's track these tasks in new JIRAs. I made one for regression just now: https://issues.apache.org/jira/browse/SPARK-22881

[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-12-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19746 LGTM Merged to master Thanks @MrBago and @WeichenXu123 !

spark git commit: [SPARK-22346][ML] VectorSizeHint Transformer for using VectorAssembler in StructuredStreaming

2017-12-22 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 13190a4f6 -> d23dc5b8e [SPARK-22346][ML] VectorSizeHint Transformer for using VectorAssembler in StructuredStreaming ## What changes were proposed in this pull request? A new VectorSizeHint transformer was added. This transformer is meant

[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-18 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19904 Strong +1 for unpersisting the data at the end. In the long-term, I don't think we'll even cache the training and validation datasets. Our caching of the training & validation data
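A minimal sketch of the "unpersist at the end" pattern mentioned above. `FakeDataset` and `fit_fold` are hypothetical stand-ins written for this illustration; they are not Spark classes or the CrossValidator implementation. The point is only that a `try`/`finally` releases the cached data whether or not fitting succeeds.

```python
class FakeDataset:
    """Hypothetical stand-in for a cacheable dataset; not a Spark class."""

    def __init__(self, name):
        self.name = name
        self.cached = False

    def cache(self):
        self.cached = True
        return self

    def unpersist(self):
        self.cached = False
        return self


def fit_fold(training, validation, fit_and_score):
    """Cache both splits, run the caller's fit/score function, then always unpersist."""
    training.cache()
    validation.cache()
    try:
        return fit_and_score(training, validation)
    finally:
        # Runs on success and on failure, so cached data is not left behind.
        training.unpersist()
        validation.unpersist()


train, valid = FakeDataset("train"), FakeDataset("valid")
metric = fit_fold(train, valid, lambda t, v: 0.9)  # 0.9 is a placeholder score
```

Whether the splits should be cached at all is a separate question, as the comment itself suggests for the long term.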

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156794633 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156795030 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156797156 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,190 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156795068 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156793824 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156793561 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

spark git commit: [SPARK-22644][ML][TEST] Make ML testsuite support StructuredStreaming test

2017-12-12 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master c7d014861 -> 0e36ba621 [SPARK-22644][ML][TEST] Make ML testsuite support StructuredStreaming test ## What changes were proposed in this pull request? We need to add some helper code to make testing ML transformers & models easier with

[GitHub] spark issue #19843: [SPARK-22644][ML][TEST] Make ML testsuite support Struct...

2017-12-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19843 Merging with master Thanks @WeichenXu123 and @MrBago !

[GitHub] spark issue #19843: [SPARK-22644][ML][TEST] Make ML testsuite support Struct...

2017-12-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19843 That failure was caused by a bad change elsewhere which has been reverted. Testing again...

[GitHub] spark issue #19843: [SPARK-22644][ML][TEST] Make ML testsuite support Struct...

2017-12-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19843 LGTM Will merge after fresh tests

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156186135 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156151486 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite support S...

2017-12-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19843 LGTM, but I'll wait for the PR title & description updates to merge this. Thanks!

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r155896038 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite support S...

2017-12-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19843 Also, can you please remove "WIP" from the PR title and update the Testing part of the PR description?

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155863529 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155863630 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155862850 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155863377 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155852882 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155863960 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155860707 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155862447 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155854403 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155853428 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155862041 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155850183 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155853763 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155856287 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155862816 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155687965 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155852524 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155861030 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155852062 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155850236 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155687996 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155849923 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r155851617 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-12-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19746 reviewing now

[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...

2017-12-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19527 > For example with 5 categories, we don't know [0.0, 0.0, 0.0, 0.0, 0.0] means last category or invalid value. For the semantics I described ("OPTION 1"), I think it is clea
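The ambiguity raised in the quoted concern can be made concrete with a small pure-Python sketch. The function `encode` and its parameters are hypothetical and written only for this illustration; the "OPTION 1" semantics referenced in the comment are truncated in this listing, so the scheme below (reserve an extra slot for invalid values, then drop the last slot) is an assumption and may not match it exactly.

```python
def encode(index, num_valid, drop_last, keep_invalid):
    """One-hot encode a category index; index=None marks an invalid/unseen value.

    Hypothetical sketch: with keep_invalid, invalid values get an extra slot
    after the valid categories; drop_last then removes the final slot.
    """
    size = num_valid + (1 if keep_invalid else 0)
    if index is None:
        if not keep_invalid:
            raise ValueError("invalid value and keep_invalid is False")
        index = num_valid  # the extra slot reserved for invalid values
    if drop_last:
        size -= 1
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec


# Without an extra slot, drop_last makes the last valid category all zeros,
# so mapping invalid values to zeros as well would collide with it.
last_without_keep = encode(4, 5, drop_last=True, keep_invalid=False)

# With the extra slot, only the invalid value becomes all zeros.
last_with_keep = encode(4, 5, drop_last=True, keep_invalid=True)
invalid_with_keep = encode(None, 5, drop_last=True, keep_invalid=True)
```

Under this assumed scheme, a zero vector of length 5 is unambiguous for 5 valid categories: the last valid category still has a 1.0 in position 4.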

[GitHub] spark issue #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite support S...

2017-12-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19843 I'll make a call: Given that the SQL tests do not use clearActive, let's not bother with it. If we see flakiness, then we can try adding clearActive as a fix

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...

2017-12-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r155629428 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala --- @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...

2017-12-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r155629359 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala --- @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite support S...

2017-12-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19843 This looks awesome; I just had a couple of comments. Btw, this is fancy test code. It might be nice to add a little unit test to MLTest.scala to make sure that testTransformer does indeed fail
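The suggestion above, testing that a test helper really fails when it should, can be sketched in plain Python. `check_transformer` is a hypothetical helper invented for this illustration; it is not the `testTransformer` method in MLTest.scala, whose signature is not shown in this listing.

```python
def check_transformer(transform, rows, check_row):
    """Apply transform to each input row and run check_row on every output."""
    for row in rows:
        check_row(transform(row))


def helper_detects_failure():
    """Return True if check_transformer propagates a failing per-row check."""

    def always_wrong(output):
        # Deliberately impossible expectation: doubling 1, 2, 3 never yields -1.
        assert output == -1, "unexpected output"

    try:
        check_transformer(lambda x: x * 2, [1, 2, 3], always_wrong)
    except AssertionError:
        return True  # the helper surfaced the failure, as desired
    return False  # the helper swallowed the failure


assert helper_detects_failure()
```

A companion check that the helper passes on a correct expectation guards against a helper that fails unconditionally.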

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...

2017-12-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r155403748 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala --- @@ -162,6 +168,12 @@ trait StreamTest extends QueryTest

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...

2017-12-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r155403356 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...

2017-12-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r155402705 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-12-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19746 @WeichenXu123 From what I've seen, it's more common for people to use VectorAssembler to assemble a bunch of Numeric columns, rather than a bunch of Vector columns. I'd recommend we do things

[GitHub] spark issue #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator for OneH...

2017-12-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19527 Question about this PR description comment: > Note that keep can't be used at the same time with dropLast as true. Because they will conflict in encoded vector by producing a vector of ze

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-12-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r154452715 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala --- @@ -41,8 +41,12 @@ import org.apache.spark.sql.types.{DoubleType

[GitHub] spark issue #19627: [SPARK-21088][ML][WIP] CrossValidator, TrainValidationSp...

2017-12-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19627 Is this still WIP or ready? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

spark git commit: [SPARK-21866][ML][PYSPARK] Adding spark image reader

2017-11-22 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 0605ad761 -> 1edb3175d [SPARK-21866][ML][PYSPARK] Adding spark image reader ## What changes were proposed in this pull request? Adding spark image reader, an implementation of schema for representing images in spark DataFrames The code

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19439 Merging with master This is awesome to get in---thanks a lot @imatiach-msft and everyone who contributed and reviewed

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19439 Thanks! LGTM pending tests

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19439 I just noticed: Where is data/mllib/images/kittens/DP153539.jpg from? (It's missing in the license list

[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2017-11-20 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19381 I'll try to take a look but am pretty swamped currently. CC @yanboliang @MLnick @dbtsai @holdenk might you have time

[GitHub] spark issue #19753: [SPARK-22521][ML] VectorIndexerModel support handle unse...

2017-11-20 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19753 I'll try to take a look but am pretty swamped currently. CC @yanboliang @MLnick @dbtsai @holdenk might you have time

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-20 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19439 LGTM, except that it looks like this doesn't merge cleanly. Would you mind rebasing it on master?

spark git commit: [SPARK-12375][ML] VectorIndexerModel support handle unseen categories via handleInvalid

2017-11-14 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 774398045 -> 1e6f76059 [SPARK-12375][ML] VectorIndexerModel support handle unseen categories via handleInvalid ## What changes were proposed in this pull request? Support skip/error/keep strategy, similar to `StringIndexer`. Implemented

[GitHub] spark issue #19588: [SPARK-12375][ML] VectorIndexerModel support handle unse...

2017-11-14 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19588 @WeichenXu123 when you create the JIRA for Python, can you please link it to this task's JIRA? Thanks! LGTM Merging with master

spark git commit: [SPARK-21087][ML] CrossValidator, TrainValidationSplit expose sub models after fitting: Scala

2017-11-14 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master b00972259 -> 774398045 [SPARK-21087][ML] CrossValidator, TrainValidationSplit expose sub models after fitting: Scala ## What changes were proposed in this pull request? We add a parameter whether to collect the full model list when

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-11-14 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19208 Awesome, thanks for the updates and for checking backwards compatibility! LGTM Merging with master

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r150716852 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -37,7 +38,26 @@ import org.apache.spark.sql.types.{StructField

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r150718504 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorIndexerSuite.scala --- @@ -219,6 +231,33 @@ class VectorIndexerSuite extends

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r150717860 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +346,39 @@ class VectorIndexerModel private[ml
