Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19979
Actually, going further than what Bago said: All of the places which use
globalCheckFunction assume that Dataset.collect() returns the Rows in a fixed
order. We should really fix those unit
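One way to make such checks independent of `collect()` order is to compare the collected rows as multisets. This is a minimal sketch in plain Python. It assumes rows can be represented as hashable tuples, and `assert_same_rows` is a hypothetical helper, not part of Spark's test utilities. Sorting both sides on a key column before comparing would also work.

```python
from collections import Counter


def assert_same_rows(actual, expected):
    """Compare two collected row lists as multisets, ignoring order.

    Rows are assumed to be hashable (for example, tuples). Duplicates are
    counted, so [a, a, b] is not considered equal to [a, b, b].
    """
    if Counter(actual) != Counter(expected):
        raise AssertionError(f"row mismatch: {actual!r} vs {expected!r}")


# The same rows in a different, partition-dependent order still pass.
assert_same_rows([(1, "a"), (2, "b")], [(2, "b"), (1, "a")])
```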
Repository: spark
Updated Branches:
refs/heads/master 774715d5c -> 753793bc8
[SPARK-22899][ML][STREAMING] Fix OneVsRestModel transform on streaming data
failed.
## What changes were proposed in this pull request?
Fix OneVsRestModel transform failing on streaming data.
## How was this patch
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/20077
LGTM
Merging with master
Let's just add a test as part of
https://issues.apache.org/jira/browse/SPARK-22882
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/20088
This is hopefully a rare case of an error in assuming an ordering of rows;
in general, we shouldn't (and don't, AFAIK) assume an order for rows in
saving/loading models.
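When a model's data spans several rows, one way to avoid relying on row order is to store an explicit index with each row and sort on that index at load time. This is a minimal plain-Python sketch. `save_rows` and `load_rows` are hypothetical and only model the idea; they are not Spark's actual persistence code.

```python
def save_rows(coefficients):
    """Serialize per-class coefficients as (index, values) rows.

    Storing an explicit index means the reader does not depend on the
    order in which rows come back.
    """
    return [(i, tuple(c)) for i, c in enumerate(coefficients)]


def load_rows(rows):
    """Rebuild the coefficient list from rows read back in any order."""
    ordered = sorted(rows, key=lambda r: r[0])
    return [list(values) for _, values in ordered]


rows = save_rows([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
shuffled = [rows[2], rows[0], rows[1]]  # e.g. part files read back out of order
print(load_rows(shuffled))  # [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
```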
LGTM, but before
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r158882354
--- Diff: python/pyspark/ml/tuning.py ---
@@ -31,6 +31,17 @@
'TrainValidationSplitModel']
+def parallelFitTasks(est, train
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r158881392
--- Diff: python/pyspark/ml/base.py ---
@@ -18,13 +18,40 @@
from abc import ABCMeta, abstractmethod
import copy
+import threading
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r158882154
--- Diff: python/pyspark/ml/base.py ---
@@ -47,6 +74,24 @@ def _fit(self, dataset):
"""
raise NotI
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r158881199
--- Diff: python/pyspark/ml/base.py ---
@@ -18,13 +18,40 @@
from abc import ABCMeta, abstractmethod
import copy
+import threading
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r158880831
--- Diff: python/pyspark/ml/tests.py ---
@@ -2359,6 +2359,21 @@ def test_unary_transformer_transform(self):
self.assertEqual(res.input
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r158882626
--- Diff: python/pyspark/ml/base.py ---
@@ -47,6 +74,24 @@ def _fit(self, dataset):
"""
raise NotI
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r158881984
--- Diff: python/pyspark/ml/base.py ---
@@ -47,6 +74,24 @@ def _fit(self, dataset):
"""
raise NotI
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/20058
reviewing now
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158861023
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,519 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158879746
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,519 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158879381
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,519 @@
+/*
+ * Licensed to the Apache
PR to fix it.
## Discussion
I give 3 approaches that we can compare. After discussion, I realized none of
them is ideal, so we have to make a trade-off.
**After discussion with jkbradley, we chose approach 3.**
### Approach 1
~~The approach proposed by MrBago at~~
https://github.com/apache/spark/p
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19904
LGTM
Sorry for the delay & thanks for the PR!
Merging with master
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158618991
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,479 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158610281
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,479 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158594885
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,479 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158595342
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,456 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158578138
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,479 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158578038
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,479 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158578163
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,479 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158594929
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,479 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158595329
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,479 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158595136
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,479 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158595526
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,479 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158595902
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,479 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158595153
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,479 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r158578024
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,479 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19979
Thanks! Let's track these tasks in new JIRAs. I made one for regression
just now: https://issues.apache.org/jira/browse/SPARK-22881
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19746
LGTM
Merged to master
Thanks @MrBago and @WeichenXu123 !
Repository: spark
Updated Branches:
refs/heads/master 13190a4f6 -> d23dc5b8e
[SPARK-22346][ML] VectorSizeHint Transformer for using VectorAssembler in
StructuredStreaming
## What changes were proposed in this pull request?
A new VectorSizeHint transformer was added. This transformer is meant
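The motivation for such a hint is that a streaming DataFrame's contents are not available until the stream starts, so a vector column's size cannot be learned by inspecting the data and has to be declared. Below is a plain-Python sketch of the resulting per-row check. It assumes the transformer's `handleInvalid` modes are `error`, `skip`, and `optimistic`, as we understand the merged version. `apply_size_hint` is hypothetical and models only the row-level behavior, not the column metadata the real transformer attaches.

```python
def apply_size_hint(vectors, size, handle_invalid="error"):
    """Emulate a declared-size check on a column of vectors.

    A vector is invalid if it is None or its length differs from `size`.
      "error":      raise on the first invalid vector
      "skip":       drop invalid vectors
      "optimistic": keep every vector without checking
    """
    if handle_invalid not in ("error", "skip", "optimistic"):
        raise ValueError(f"unknown handle_invalid: {handle_invalid!r}")
    if handle_invalid == "optimistic":
        return list(vectors)
    out = []
    for v in vectors:
        if v is not None and len(v) == size:
            out.append(v)
        elif handle_invalid == "error":
            raise ValueError(f"expected a vector of size {size}, got {v!r}")
        # "skip": the invalid vector is silently dropped
    return out


column = [[1.0, 2.0], [3.0], None, [4.0, 5.0]]
print(apply_size_hint(column, 2, "skip"))  # [[1.0, 2.0], [4.0, 5.0]]
```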
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19904
Strong +1 for unpersisting the data at the end. In the long term, I don't
think we'll even cache the training and validation datasets. Our caching of
the training & validation data
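If the data is cached during tuning, a `try`/`finally` ensures it is unpersisted at the end even when fitting fails. This is a minimal sketch in Python. `FakeDataset` is a stand-in that only mimics the `cache()`/`unpersist()` method names of a PySpark DataFrame so the example runs without Spark, and the `cached` context manager is hypothetical.

```python
from contextlib import contextmanager


class FakeDataset:
    """Stand-in for a DataFrame; it only records whether it is cached."""

    def __init__(self):
        self.is_cached = False

    def cache(self):
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        return self


@contextmanager
def cached(*datasets):
    """Cache the datasets for the duration of a block, then always unpersist."""
    for ds in datasets:
        ds.cache()
    try:
        yield datasets
    finally:
        for ds in datasets:
            ds.unpersist()


train, validation = FakeDataset(), FakeDataset()
with cached(train, validation):
    assert train.is_cached and validation.is_cached
    # ... fit candidate models on train, evaluate them on validation ...
print(train.is_cached, validation.is_cached)  # False False
```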
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r156794633
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r156795030
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r156797156
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r156795068
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r156793824
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r156793561
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Repository: spark
Updated Branches:
refs/heads/master c7d014861 -> 0e36ba621
[SPARK-22644][ML][TEST] Make ML testsuite support StructuredStreaming test
## What changes were proposed in this pull request?
We need to add some helper code to make testing ML transformers & models easier
with
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19843
Merging with master
Thanks @WeichenXu123 and @MrBago !
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19843
That failure was caused by a bad change elsewhere which has been reverted.
Testing again...
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19843
LGTM
Will merge after fresh tests
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r156186135
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r156151486
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19843
LGTM, but I'll wait for the PR title & description updates to merge this.
Thanks!
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19843#discussion_r155896038
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19843
Also, can you please remove "WIP" from the PR title and update the Testing
part of the PR description?
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155863529
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155863630
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155862850
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155863377
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155852882
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155863960
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155860707
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155862447
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155854403
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155853428
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155862041
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155850183
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155853763
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155856287
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155862816
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155687965
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155852524
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155861030
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155852062
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155850236
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155687996
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155849923
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r155851617
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19746
reviewing now
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19527
> For example, with 5 categories, we don't know whether [0.0, 0.0, 0.0, 0.0, 0.0]
> means the last category or an invalid value.
For the semantics I described ("OPTION 1"), I think it is clea
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19843
I'll make a call: Given that the SQL tests do not use clearActive, let's
not bother with it. If we see flakiness, then we can try adding clearActive as
a fix
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19843#discussion_r155629428
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19843#discussion_r155629359
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19843
This looks awesome; I just had a couple of comments. Btw, this is fancy
test code. It might be nice to add a little unit test to MLTest.scala to make
sure that testTransformer does indeed fail
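A test like this guards against a helper that silently passes. Below is a plain-Python sketch of the idea. `check_transformer_output` is a hypothetical stand-in for a helper that runs one row check over both batch and streaming results; it does not reproduce the actual `testTransformer` signature in MLTest.scala.

```python
def check_transformer_output(batch_rows, stream_rows, check_row):
    """Hypothetical helper: apply one per-row check to batch and streaming results."""
    for row in batch_rows:
        check_row(row)
    for row in stream_rows:
        check_row(row)


def expect_failure(fn):
    """Return True if calling fn raises AssertionError, else False."""
    try:
        fn()
    except AssertionError:
        return True
    return False


def check_doubled(row):
    x, y = row
    assert y == 2 * x, f"bad row {row}"


good = [(1, 2), (3, 6)]
bad_stream = [(1, 2), (3, 7)]  # the streaming path produced a wrong value

# The helper must pass on correct output and fail on incorrect output.
assert not expect_failure(lambda: check_transformer_output(good, good, check_doubled))
assert expect_failure(lambda: check_transformer_output(good, bad_stream, check_doubled))
```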
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19843#discussion_r155403748
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala ---
@@ -162,6 +168,12 @@ trait StreamTest extends QueryTest
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19843#discussion_r155403356
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19843#discussion_r155402705
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19746
@WeichenXu123 From what I've seen, it's more common for people to use
VectorAssembler to assemble a bunch of numeric columns, rather than a bunch of
Vector columns. I'd recommend we do things
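Presumably the distinction matters because each numeric column contributes exactly one slot to the assembled vector, so its size is known without inspecting data, whereas a vector column's size must come from metadata (for example, metadata supplied by VectorSizeHint). Below is a plain-Python sketch of that size computation. `assembled_size` and its `(name, kind, size)` input format are hypothetical.

```python
def assembled_size(columns):
    """Return the length of the vector produced by assembling `columns`.

    `columns` is a list of (name, kind, size) triples where kind is
    "numeric" or "vector"; size is the declared vector length, or None
    if no size information is available.
    """
    total = 0
    for name, kind, size in columns:
        if kind == "numeric":
            total += 1  # a numeric column always contributes one slot
        elif size is None:
            raise ValueError(f"size of vector column {name!r} is unknown")
        else:
            total += size
    return total


print(assembled_size([("age", "numeric", None), ("income", "numeric", None)]))  # 2
```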
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19527
Question about this PR description comment:
> Note that keep can't be used at the same time as dropLast set to true,
> because they will conflict in the encoded vector by producing a vector of ze
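One way to make `keep` and `dropLast=true` compatible is to append the invalid bucket after the real categories. Then `dropLast` removes the invalid bucket rather than the last real category, and an all-zeros vector unambiguously means an invalid value. Below is a plain-Python sketch under that assumption. `one_hot` is hypothetical, and the semantics the PR finally adopted may differ.

```python
def one_hot(index, num_categories, drop_last=True, keep_invalid=False):
    """Encode a category index as a dense one-hot list.

    Assumed semantics: with keep_invalid, any index outside
    [0, num_categories) maps to one extra bucket placed after the real
    categories; with drop_last, the final bucket is dropped, so whatever
    lands there encodes as all zeros.
    """
    if not 0 <= index < num_categories:
        if not keep_invalid:
            raise ValueError(f"invalid category index {index}")
        index = num_categories  # the extra "invalid" bucket
    buckets = num_categories + (1 if keep_invalid else 0)
    size = buckets - 1 if drop_last else buckets
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec


# 5 categories, dropLast only: the last real category is all zeros.
print(one_hot(4, 5))  # [0.0, 0.0, 0.0, 0.0]
# 5 categories, dropLast + keep: only invalid values are all zeros.
print(one_hot(4, 5, keep_invalid=True))  # [0.0, 0.0, 0.0, 0.0, 1.0]
print(one_hot(9, 5, keep_invalid=True))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```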
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r154452715
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala ---
@@ -41,8 +41,12 @@ import org.apache.spark.sql.types.{DoubleType
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19627
Is this still WIP or ready?
Repository: spark
Updated Branches:
refs/heads/master 0605ad761 -> 1edb3175d
[SPARK-21866][ML][PYSPARK] Adding spark image reader
## What changes were proposed in this pull request?
Adds a Spark image reader, an implementation of a schema for representing images
in Spark DataFrames.
The code
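As we understand the merged reader (`ImageSchema` in `org.apache.spark.ml.image`), each image is a struct with `origin`, `height`, `width`, `nChannels`, `mode`, and `data` fields. The `data` field holds the decoded pixels row by row in OpenCV-compatible channel order. Below is a plain-Python sketch of how a pixel is located in `data` under that assumption. `ImageRow` and `pixel` are hypothetical, and the origin path is made up.

```python
from dataclasses import dataclass


@dataclass
class ImageRow:
    """Plain-Python mirror of the image struct (assumed field layout)."""
    origin: str      # source URI of the image
    height: int
    width: int
    n_channels: int  # nChannels in the Spark schema
    mode: int        # OpenCV type code, e.g. 16 for CV_8UC3
    data: bytes      # row-major bytes, channels interleaved (BGR for 3 channels)


def pixel(img, row, col):
    """Return the channel values of one pixel as a tuple of ints."""
    start = (row * img.width + col) * img.n_channels
    return tuple(img.data[start:start + img.n_channels])


img = ImageRow("file:/tmp/example.png", 2, 2, 3, 16, bytes(range(12)))
print(pixel(img, 1, 0))  # (6, 7, 8)
```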
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19439
Merging with master
This is awesome to get in. Thanks a lot @imatiach-msft and everyone who
contributed and reviewed
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19439
Thanks! LGTM pending tests
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19439
I just noticed: where is data/mllib/images/kittens/DP153539.jpg from?
(It's missing from the license list
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19381
I'll try to take a look but am pretty swamped currently. CC @yanboliang
@MLnick @dbtsai @holdenk might you have time
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19753
I'll try to take a look but am pretty swamped currently. CC @yanboliang
@MLnick @dbtsai @holdenk might you have time
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19439
LGTM, except that it looks like this doesn't merge cleanly. Would you mind
rebasing it on master?
Repository: spark
Updated Branches:
refs/heads/master 774398045 -> 1e6f76059
[SPARK-12375][ML] VectorIndexerModel support handle unseen categories via
handleInvalid
## What changes were proposed in this pull request?
Support skip/error/keep strategy, similar to `StringIndexer`.
Implemented
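As we understand it, the `keep` option for VectorIndexer places unseen values in an extra bucket whose index equals the number of known categories for that feature. Below is a plain-Python sketch of the three strategies for a single categorical feature. `index_column` and the category map are hypothetical.

```python
def index_column(values, category_map, handle_invalid="error"):
    """Map raw values of one categorical feature to category indices.

    category_map: raw value -> index, learned at fit time.
      "error": raise on an unseen value
      "skip":  drop rows with unseen values
      "keep":  send unseen values to an extra bucket at len(category_map)
    """
    if handle_invalid not in ("error", "skip", "keep"):
        raise ValueError(f"unknown handle_invalid: {handle_invalid!r}")
    out = []
    for v in values:
        if v in category_map:
            out.append(category_map[v])
        elif handle_invalid == "error":
            raise ValueError(f"unseen category value {v!r}")
        elif handle_invalid == "keep":
            out.append(len(category_map))
        # "skip": the row is dropped
    return out


categories = {0.0: 0, 2.0: 1, 5.0: 2}
print(index_column([0.0, 5.0, 7.0], categories, "skip"))  # [0, 2]
print(index_column([0.0, 5.0, 7.0], categories, "keep"))  # [0, 2, 3]
```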
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19588
@WeichenXu123 when you create the JIRA for Python, can you please link it
to this task's JIRA? Thanks!
LGTM
Merging with master
Repository: spark
Updated Branches:
refs/heads/master b00972259 -> 774398045
[SPARK-21087][ML] CrossValidator, TrainValidationSplit expose sub models after
fitting: Scala
## What changes were proposed in this pull request?
We add a parameter controlling whether to collect the full model list when
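As we understand the merged change, the parameter is `collectSubModels`, and the fitted CrossValidator model exposes the sub models indexed first by fold and then by param map. Below is a plain-Python sketch of cross-validation that optionally keeps every fitted model in that shape. `cross_validate`, `fit`, and `evaluate` are hypothetical. Keeping all sub models can use a lot of driver memory, which is one reason to make collection opt-in.

```python
def cross_validate(folds, param_maps, fit, evaluate, collect_sub_models=False):
    """Average a validation metric over folds for each param map.

    folds: list of (train, validation) pairs.
    fit(train, params) -> model; evaluate(model, validation) -> float.
    Returns (avg_metrics, sub_models); sub_models[fold][param_index] holds
    each fitted model, or sub_models is None when not collected.
    """
    n_folds = len(folds)
    metrics = [0.0] * len(param_maps)
    sub_models = (
        [[None] * len(param_maps) for _ in range(n_folds)]
        if collect_sub_models else None
    )
    for f, (train, validation) in enumerate(folds):
        for p, params in enumerate(param_maps):
            model = fit(train, params)
            metrics[p] += evaluate(model, validation) / n_folds
            if collect_sub_models:
                sub_models[f][p] = model
    return metrics, sub_models


def fit(train, params):
    return params["k"]  # toy "model": just the value of k


def evaluate(model, validation):
    return model * len(validation)  # toy metric


folds = [([1, 2], [3]), ([3], [1, 2])]
grid = [{"k": 1}, {"k": 2}]
print(cross_validate(folds, grid, fit, evaluate, collect_sub_models=True))
# ([1.5, 3.0], [[1, 2], [1, 2]])
```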
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19208
Awesome, thanks for the updates and for checking backwards compatibility!
LGTM
Merging with master
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r150716852
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -37,7 +38,26 @@ import org.apache.spark.sql.types.{StructField
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r150718504
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorIndexerSuite.scala ---
@@ -219,6 +231,33 @@ class VectorIndexerSuite extends
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r150717860
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +346,39 @@ class VectorIndexerModel private[ml