Re: [Apache Spark Jenkins] build system shutting down Dec 23th, 2021

2021-12-06 Thread Nick Pentreath
Wow! End of an era. Thanks so much to you, Shane, for all your work over 10 (!!) years. And to Amplab also! Farewell Spark Jenkins! N On Tue, Dec 7, 2021 at 6:49 AM Nicholas Chammas wrote: > Farewell to Jenkins and its classic weather forecast build status icons: > > [image:

Re: Welcoming six new Apache Spark committers

2021-03-29 Thread Nick Pentreath
Congratulations to all the new committers. Welcome! On Fri, 26 Mar 2021 at 22:22, Matei Zaharia wrote: > Hi all, > > The Spark PMC recently voted to add several new committers. Please join me > in welcoming them to their new role! Our new committers are: > > - Maciej Szymkiewicz (contributor

Re: Welcoming some new Apache Spark committers

2020-07-14 Thread Nick Pentreath
Congratulations and welcome as Apache Spark committers! On Wed, 15 Jul 2020 at 06:59, Prashant Sharma wrote: > Congratulations all ! It's great to have such committed folks as > committers. :) > > On Wed, Jul 15, 2020 at 9:24 AM Yi Wu wrote: > >> Congrats!! >> >> On Wed, Jul 15, 2020 at 8:02

Re: [EXTERNAL] - Re: Problem with the ML ALS algorithm

2019-06-26 Thread Nick Pentreath
t I create using some > likelihood distributions of the rating values. I am only experimenting / > learning. In practice though, the list of items is likely to be at least > in the 10’s if not 100’s. Are even these item numbers too low? > > > > Thanks. > > > > -S >

Re: [EXTERNAL] - Re: Problem with the ML ALS algorithm

2019-06-26 Thread Nick Pentreath
; Number of items is 4 > > Ratings values are either 120, 20, 0 > > > > > > *From:* Nick Pentreath > *Sent:* Wednesday, June 26, 2019 6:03 AM > *To:* user@spark.apache.org > *Subject:* [EXTERNAL] - Re: Problem with the ML ALS algorithm > > > > This means that

Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-10-03 Thread Nick Pentreath
For ONNX you may be interested in https://github.com/onnx/onnxmltools - which supports conversion of a few sklearn models to ONNX already. However, as far as I am aware, none of the ONNX backends actually support the ONNX-ML extended spec (in open-source at least). So you would not be able to

[jira] [Resolved] (SPARK-25412) FeatureHasher would change the value of output feature

2018-09-13 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-25412. Resolution: Not A Bug > FeatureHasher would change the value of output feat

[jira] [Commented] (SPARK-25412) FeatureHasher would change the value of output feature

2018-09-13 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613160#comment-16613160 ] Nick Pentreath commented on SPARK-25412: (1) is by design. Feature hashing does not store

[jira] [Commented] (SPARK-24467) VectorAssemblerEstimator

2018-06-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516861#comment-16516861 ] Nick Pentreath commented on SPARK-24467: One option is to do the same as we did for one hot

[jira] [Comment Edited] (SPARK-24467) VectorAssemblerEstimator

2018-06-08 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506334#comment-16506334 ] Nick Pentreath edited comment on SPARK-24467 at 6/8/18 5:59 PM: Yeah

[jira] [Commented] (SPARK-24467) VectorAssemblerEstimator

2018-06-08 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506334#comment-16506334 ] Nick Pentreath commented on SPARK-24467: Yeah the estimator would return a {{Model}} from {{fit

Re: Revisiting Online serving of Spark models?

2018-06-05 Thread Nick Pentreath
I will aim to join up at 4pm tomorrow (Wed) too. Look forward to it. On Sun, 3 Jun 2018 at 00:24 Holden Karau wrote: > On Sat, Jun 2, 2018 at 8:39 PM, Maximiliano Felice < > maximilianofel...@gmail.com> wrote: > >> Hi! >> >> We're already in San Francisco waiting for the summit. We even think

Re: How to use StringIndexer for multiple input /output columns in Spark Java

2018-05-15 Thread Nick Pentreath
Multi-column support for StringIndexer didn’t make it into Spark 2.3.0. The PR is still in progress I think - should be available in 2.4.0. On Mon, 14 May 2018 at 22:32, Mina Aslani wrote: > Please take a look at the api doc: >

Re: A naive ML question

2018-04-29 Thread Nick Pentreath
One potential approach could be to construct a transition matrix showing the probability of moving from each state to another state. This can be visualized with a “heat map” encoding (I think matshow in numpy/matplotlib does this). On Sat, 28 Apr 2018 at 21:34, kant kodali
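A minimal sketch of the transition-matrix idea, assuming the data is a single ordered sequence of state labels. The function name `transition_matrix` and the sample sequence are made up for illustration and are not from the thread. It counts each observed move from one state to the next and normalizes every row into probabilities.

```python
def transition_matrix(sequence):
    """Estimate first-order transition probabilities from an ordered state sequence."""
    states = sorted(set(sequence))
    index = {state: i for i, state in enumerate(states)}

    # counts[i][j] = number of times state i was immediately followed by state j
    counts = [[0] * len(states) for _ in states]
    for current, following in zip(sequence, sequence[1:]):
        counts[index[current]][index[following]] += 1

    # Normalize each row so it sums to 1; a state with no outgoing moves stays all zeros.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([count / total if total else 0.0 for count in row])
    return states, matrix


# Hypothetical observed sequence of states.
observed = ["A", "B", "A", "A", "C", "A", "B"]
states, matrix = transition_matrix(observed)

for state, row in zip(states, matrix):
    print(state, row)
```

The resulting list of rows can be handed to `matplotlib.pyplot.matshow(matrix)` to get the heat-map view mentioned above, with `states` used as the tick labels on both axes.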

Re: StringIndexer with high cardinality huge data

2018-04-10 Thread Nick Pentreath
Also check out FeatureHasher in Spark 2.3.0 which is designed to handle this use case in a more natural way than HashingTF (and handles multiple columns at once). On Tue, 10 Apr 2018 at 16:00, Filipp Zhinkin wrote: > Hi Shahab, > > do you actually need to have a few
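To illustrate why feature hashing suits high-cardinality columns, here is a rough plain-Python sketch of the general hashing trick applied to several columns at once. It is not Spark's FeatureHasher implementation: the function name `hash_features`, the use of `zlib.crc32` as the hash, and the sample row are all assumptions chosen so the example runs without Spark.

```python
import zlib


def hash_features(row, num_features=16):
    """Project a dict of column -> value into a fixed-length vector by hashing."""
    vector = [0.0] * num_features
    for column, value in row.items():
        is_numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_numeric:
            # Numeric column: the column name picks the slot, the value is the weight.
            key, weight = column, float(value)
        else:
            # Categorical column: "column=value" picks the slot, with an indicator weight.
            key, weight = f"{column}={value}", 1.0
        slot = zlib.crc32(key.encode("utf-8")) % num_features
        # Colliding features simply add up in the same slot.
        vector[slot] += weight
    return vector


# Hypothetical input row mixing categorical and numeric columns.
row = {"country": "ZA", "device": "mobile", "clicks": 3}
vec = hash_features(row)
print(vec)
```

No vocabulary is built or stored, so the output width stays at `num_features` no matter how many distinct values a column has. The cost is that distinct features can collide in one slot and the original values cannot be recovered from the vector.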

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-03 Thread Nick Pentreath
Congratulations! On Tue, 3 Apr 2018 at 05:34 wangzhenhua (G) wrote: > > > Thanks everyone! It’s my great pleasure to be part of such a professional > and innovative community! > > > > > > best regards, > > -Zhenhua(Xander) > > >

Re: Spark MLlib: Should I call .cache before fitting a model?

2018-02-27 Thread Nick Pentreath
Currently, fit for many (most I think) models will cache the input data. For LogisticRegression this is definitely the case, so you won't get any benefit from caching it yourself. On Tue, 27 Feb 2018 at 21:25 Gevorg Hari wrote: > Imagine that I am training a Spark MLlib

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-27 Thread Nick Pentreath
+1 (binding) Built and ran Scala tests with "-Phadoop-2.6 -Pyarn -Phive", all passed. Python tests passed (also including pyspark-streaming w/kafka-0.8 and flume packages built) On Tue, 27 Feb 2018 at 10:09 Felix Cheung wrote: > +1 > > Tested R: > > install from

[jira] [Commented] (SPARK-23265) Update multi-column error handling logic in QuantileDiscretizer

2018-02-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366809#comment-16366809 ] Nick Pentreath commented on SPARK-23265: Thanks for the ping - yes it adds more detailed checking

[jira] [Updated] (SPARK-23265) Update multi-column error handling logic in QuantileDiscretizer

2018-02-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-23265: --- Description: SPARK-22397 added support for multiple columns to {{QuantileDiscretizer

[jira] [Commented] (SPARK-23437) [ML] Distributed Gaussian Process Regression for MLlib

2018-02-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366744#comment-16366744 ] Nick Pentreath commented on SPARK-23437: It sounds interesting - however the standard practice

Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-14 Thread Nick Pentreath
-1 for me as we elevated https://issues.apache.org/jira/browse/SPARK-23377 to a Blocker. It should be fixed before release. On Thu, 15 Feb 2018 at 07:25 Holden Karau wrote: > If this is a blocker in your view then the vote thread is an important > place to mention it.

[jira] [Commented] (SPARK-23377) Bucketizer with multiple columns persistence bug

2018-02-13 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362182#comment-16362182 ] Nick Pentreath commented on SPARK-23377: Should this be a blocker for 2.3? I think so since

Re: redundant decision tree model

2018-02-13 Thread Nick Pentreath
There is a long outstanding JIRA issue about it: https://issues.apache.org/jira/browse/SPARK-3155. It is probably still a useful feature to have for trees but the priority is not that high since it may not be that useful for the tree ensemble models. On Tue, 13 Feb 2018 at 11:52 Alessandro

[jira] [Commented] (SPARK-14047) GBT improvement umbrella

2018-02-07 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355216#comment-16355216 ] Nick Pentreath commented on SPARK-14047: SPARK-12375 should fix that? Can you check it against

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Nick Pentreath
All MLlib QA JIRAs resolved. Looks like SparkR too, so from the ML side that should be everything outstanding. On Thu, 1 Feb 2018 at 06:21 Yin Huai wrote: > seems we are not running tests related to pandas in pyspark tests (see my > email "python tests related to pandas

[jira] [Resolved] (SPARK-23105) Spark MLlib, GraphX 2.3 QA umbrella

2018-02-01 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-23105. Resolution: Resolved Fix Version/s: 2.3.0 > Spark MLlib, GraphX 2.3 QA umbre

[jira] [Resolved] (SPARK-23110) ML 2.3 QA: API: Java compatibility, docs

2018-02-01 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-23110. Resolution: Resolved Fix Version/s: 2.3.0 > ML 2.3 QA: API: Java compatibil

[jira] [Resolved] (SPARK-23107) ML, Graph 2.3 QA: API: New Scala APIs, docs

2018-02-01 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-23107. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20459 [https

[jira] [Commented] (SPARK-23290) inadvertent change in handling of DateType when converting to pandas dataframe

2018-02-01 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348223#comment-16348223 ] Nick Pentreath commented on SPARK-23290: cc [~bryanc] > inadvertent change in handl

[jira] [Comment Edited] (SPARK-23110) ML 2.3 QA: API: Java compatibility, docs

2018-01-31 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346645#comment-16346645 ] Nick Pentreath edited comment on SPARK-23110 at 1/31/18 11:34 AM: -- Took

[jira] [Comment Edited] (SPARK-23110) ML 2.3 QA: API: Java compatibility, docs

2018-01-31 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346645#comment-16346645 ] Nick Pentreath edited comment on SPARK-23110 at 1/31/18 11:32 AM: -- Took

[jira] [Commented] (SPARK-23110) ML 2.3 QA: API: Java compatibility, docs

2018-01-31 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346645#comment-16346645 ] Nick Pentreath commented on SPARK-23110: Took a quick look through the diff.  I did pick up

[jira] [Commented] (SPARK-23110) ML 2.3 QA: API: Java compatibility, docs

2018-01-31 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346573#comment-16346573 ] Nick Pentreath commented on SPARK-23110: I checked added classes from {{added_ml_class}}, all

[jira] [Resolved] (SPARK-23111) ML, Graph 2.3 QA: Update user guide for new features & APIs

2018-01-31 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-23111. Resolution: Resolved Fix Version/s: 2.3.0 > ML, Graph 2.3 QA: Update user gu

[jira] [Commented] (SPARK-23111) ML, Graph 2.3 QA: Update user guide for new features & APIs

2018-01-31 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346442#comment-16346442 ] Nick Pentreath commented on SPARK-23111: Went through all the new features and listed the Jira

[jira] [Assigned] (SPARK-23111) ML, Graph 2.3 QA: Update user guide for new features & APIs

2018-01-31 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-23111: -- Assignee: Nick Pentreath > ML, Graph 2.3 QA: Update user guide for new featu

[jira] [Resolved] (SPARK-23112) ML, Graph 2.3 QA: Programming guide update and migration guide

2018-01-31 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-23112. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20421 [https

[jira] [Commented] (SPARK-23154) Document backwards compatibility guarantees for ML persistence

2018-01-30 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344885#comment-16344885 ] Nick Pentreath commented on SPARK-23154: Where do we intend to put this note? In [http

[jira] [Updated] (SPARK-23265) Update multi-column error handling logic in QuantileDiscretizer

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-23265: --- Description: SPARK-22397 added support for multiple columns to {{QuantileDiscretizer

[jira] [Commented] (SPARK-23265) Update multi-column error handling logic in QuantileDiscretizer

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344604#comment-16344604 ] Nick Pentreath commented on SPARK-23265: cc [~huaxing]  > Update multi-column error handl

[jira] [Updated] (SPARK-23265) Update multi-column error handling logic in QuantileDiscretizer

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-23265: --- Issue Type: Improvement (was: Documentation) > Update multi-column error handling lo

[jira] [Created] (SPARK-23265) Update multi-column error handling logic in QuantileDiscretizer

2018-01-29 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-23265: -- Summary: Update multi-column error handling logic in QuantileDiscretizer Key: SPARK-23265 URL: https://issues.apache.org/jira/browse/SPARK-23265 Project: Spark

[jira] [Resolved] (SPARK-23138) Add user guide example for multiclass logistic regression summary

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-23138. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20332 [https

[jira] [Assigned] (SPARK-23138) Add user guide example for multiclass logistic regression summary

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-23138: -- Assignee: Seth Hendrickson > Add user guide example for multiclass logis

Re: Reverse MinMaxScaler in SparkML

2018-01-29 Thread Nick Pentreath
This would be interesting and a good addition I think. It bears some thought about the API though. One approach is to have an "inverseTransform" method similar to sklearn. The other approach is to "formalize" something like StringIndexerModel -> IndexToString. Here, the inverse transformer is a
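The first option, an inverse method on the fitted scaler itself, might look roughly like the sketch below. This is plain Python for a single column, not Spark ML or scikit-learn code: the class `MinMaxScalerSketch`, its method names, and the handling of a constant column are illustrative assumptions.

```python
class MinMaxScalerSketch:
    """Hypothetical single-column min-max scaler with an inverse mapping."""

    def __init__(self, lo=0.0, hi=1.0):
        self.lo = lo
        self.hi = hi
        self.data_min = None
        self.data_max = None

    def fit(self, values):
        # The fitted min and max are exactly what the inverse mapping needs later.
        self.data_min = min(values)
        self.data_max = max(values)
        return self

    def transform(self, values):
        span = self.data_max - self.data_min
        if span == 0:
            # Assumption: a constant column maps to the middle of the target range.
            midpoint = 0.5 * (self.lo + self.hi)
            return [midpoint for _ in values]
        scale = (self.hi - self.lo) / span
        return [(v - self.data_min) * scale + self.lo for v in values]

    def inverse_transform(self, scaled):
        span = self.data_max - self.data_min
        if span == 0:
            return [self.data_min for _ in scaled]
        scale = span / (self.hi - self.lo)
        return [(s - self.lo) * scale + self.data_min for s in scaled]


original = [10.0, 20.0, 30.0]
scaler = MinMaxScalerSketch().fit(original)
scaled = scaler.transform(original)
restored = scaler.inverse_transform(scaled)
print(scaled, restored)
```

Keeping both directions on one fitted object means the statistics live in a single place. The alternative of a separate inverse transformer, in the style of StringIndexerModel and IndexToString, would need those fitted statistics passed to it in some form.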

[jira] [Assigned] (SPARK-23108) ML, Graph 2.3 QA: API: Experimental, DeveloperApi, final, sealed audit

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-23108: -- Assignee: Nick Pentreath > ML, Graph 2.3 QA: API: Experimental, DeveloperApi, fi

[jira] [Comment Edited] (SPARK-23108) ML, Graph 2.3 QA: API: Experimental, DeveloperApi, final, sealed audit

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343278#comment-16343278 ] Nick Pentreath edited comment on SPARK-23108 at 1/29/18 12:14 PM: -- Went

[jira] [Resolved] (SPARK-23108) ML, Graph 2.3 QA: API: Experimental, DeveloperApi, final, sealed audit

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-23108. Resolution: Resolved Fix Version/s: 2.3.0 > ML, Graph 2.3 QA: API: Experimen

[jira] [Commented] (SPARK-23108) ML, Graph 2.3 QA: API: Experimental, DeveloperApi, final, sealed audit

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343290#comment-16343290 ] Nick Pentreath commented on SPARK-23108: Also checked ml {{DeveloperAPI}}, nothing to graduate

[jira] [Commented] (SPARK-23108) ML, Graph 2.3 QA: API: Experimental, DeveloperApi, final, sealed audit

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343278#comment-16343278 ] Nick Pentreath commented on SPARK-23108: I think at this late stage we should not open up

[jira] [Commented] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343276#comment-16343276 ] Nick Pentreath commented on SPARK-23109: Created SPARK-23256 to track {{columnSchema}} in Python

[jira] [Created] (SPARK-23256) Add columnSchema method to PySpark image reader

2018-01-29 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-23256: -- Summary: Add columnSchema method to PySpark image reader Key: SPARK-23256 URL: https://issues.apache.org/jira/browse/SPARK-23256 Project: Spark Issue

[jira] [Commented] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343269#comment-16343269 ] Nick Pentreath commented on SPARK-23109: So [~bryanc] I think this is done then? Can you confirm

[jira] [Commented] (SPARK-21866) SPIP: Image support in Spark

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343266#comment-16343266 ] Nick Pentreath commented on SPARK-21866: Ok, added SPARK-23255 to track user guide additions

[jira] [Created] (SPARK-23255) Add user guide and examples for DataFrame image reading functions

2018-01-29 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-23255: -- Summary: Add user guide and examples for DataFrame image reading functions Key: SPARK-23255 URL: https://issues.apache.org/jira/browse/SPARK-23255 Project: Spark

[jira] [Updated] (SPARK-23107) ML, Graph 2.3 QA: API: New Scala APIs, docs

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-23107: --- Description: Audit new public Scala APIs added to MLlib & GraphX. Take note of: * Prote

[jira] [Updated] (SPARK-23227) Add user guide entry for collecting sub models for cross-validation classes

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-23227: --- Priority: Minor (was: Major) > Add user guide entry for collecting sub models for cr

[jira] [Updated] (SPARK-23254) Add user guide entry for DataFrame multivariate summary

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-23254: --- Priority: Minor (was: Major) > Add user guide entry for DataFrame multivariate summ

[jira] [Updated] (SPARK-23127) Update FeatureHasher user guide for catCols parameter

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-23127: --- Priority: Minor (was: Major) > Update FeatureHasher user guide for catCols parame

[jira] [Created] (SPARK-23254) Add user guide entry for DataFrame multivariate summary

2018-01-29 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-23254: -- Summary: Add user guide entry for DataFrame multivariate summary Key: SPARK-23254 URL: https://issues.apache.org/jira/browse/SPARK-23254 Project: Spark

[jira] [Commented] (SPARK-17139) Add model summary for MultinomialLogisticRegression

2018-01-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343155#comment-16343155 ] Nick Pentreath commented on SPARK-17139: Ok added a PR to update migration guide for {{2.3

[jira] [Commented] (SPARK-21866) SPIP: Image support in Spark

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341040#comment-16341040 ] Nick Pentreath commented on SPARK-21866: [~hyukjin.kwon] [~imatiach] Was any doc or examples done

[jira] [Resolved] (SPARK-23113) Update MLlib, GraphX websites for 2.3

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-23113. Resolution: Resolved > Update MLlib, GraphX websites for

[jira] [Assigned] (SPARK-23113) Update MLlib, GraphX websites for 2.3

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-23113: -- Assignee: Nick Pentreath > Update MLlib, GraphX websites for

[jira] [Commented] (SPARK-23113) Update MLlib, GraphX websites for 2.3

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341030#comment-16341030 ] Nick Pentreath commented on SPARK-23113: No updates to MLlib project website required for {{2.3

[jira] [Commented] (SPARK-23107) ML, Graph 2.3 QA: API: New Scala APIs, docs

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341022#comment-16341022 ] Nick Pentreath commented on SPARK-23107: [~felixcheung] I added SPARK-23231 (and listed

[jira] [Created] (SPARK-23231) Add doc for string indexer ordering to user guide (also to RFormula guide)

2018-01-26 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-23231: -- Summary: Add doc for string indexer ordering to user guide (also to RFormula guide) Key: SPARK-23231 URL: https://issues.apache.org/jira/browse/SPARK-23231

[jira] [Commented] (SPARK-23110) ML 2.3 QA: API: Java compatibility, docs

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341009#comment-16341009 ] Nick Pentreath commented on SPARK-23110: [~WeichenXu123] any update? > ML 2.3 QA: API: J

[jira] [Updated] (SPARK-22797) Add multiple column support to PySpark Bucketizer

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-22797: --- Target Version/s: 2.3.0 (was: 2.4.0) > Add multiple column support to PySpark Bucketi

[jira] [Assigned] (SPARK-22797) Add multiple column support to PySpark Bucketizer

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-22797: -- Assignee: zhengruifeng > Add multiple column support to PySpark Bucketi

[jira] [Resolved] (SPARK-22797) Add multiple column support to PySpark Bucketizer

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-22797. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19892 [https

[jira] [Resolved] (SPARK-22799) Bucketizer should throw exception if single- and multi-column params are both set

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-22799. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19993 [https

[jira] [Assigned] (SPARK-22799) Bucketizer should throw exception if single- and multi-column params are both set

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-22799: -- Assignee: Marco Gaido > Bucketizer should throw exception if single- and multi-col

[jira] [Created] (SPARK-23227) Add user guide entry for collecting sub models for cross-validation classes

2018-01-26 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-23227: -- Summary: Add user guide entry for collecting sub models for cross-validation classes Key: SPARK-23227 URL: https://issues.apache.org/jira/browse/SPARK-23227

[jira] [Commented] (SPARK-23107) ML, Graph 2.3 QA: API: New Scala APIs, docs

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340786#comment-16340786 ] Nick Pentreath commented on SPARK-23107: [~felixcheung] have issues been created to track

[jira] [Updated] (SPARK-23107) ML, Graph 2.3 QA: API: New Scala APIs, docs

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-23107: --- Affects Version/s: 2.3.0 Target Version/s: 2.3.0 > ML, Graph 2.3 QA: API: New Sc

[jira] [Assigned] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-23109: -- Assignee: Bryan Cutler > ML 2.3 QA: API: Python API cover

[jira] [Commented] (SPARK-23107) ML, Graph 2.3 QA: API: New Scala APIs, docs

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340783#comment-16340783 ] Nick Pentreath commented on SPARK-23107: [~yanboliang] any update on this one? > ML, Graph

[jira] [Reopened] (SPARK-23112) ML, Graph 2.3 QA: Programming guide update and migration guide

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reopened SPARK-23112: Re-opening as breaking change in SPARK-17139 needs to be addressed > ML, Graph 2.3

[jira] [Updated] (SPARK-23112) ML, Graph 2.3 QA: Programming guide update and migration guide

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-23112: --- Affects Version/s: 2.3.0 Target Version/s: 2.3.0 Fix Version/s: (was: 2.3.0

[jira] [Commented] (SPARK-23106) ML, Graph 2.3 QA: API: Binary incompatible changes

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340779#comment-16340779 ] Nick Pentreath commented on SPARK-23106: Will keep this as resolved as it should be done now

[jira] [Assigned] (SPARK-23106) ML, Graph 2.3 QA: API: Binary incompatible changes

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-23106: -- Assignee: Bago Amirbekian > ML, Graph 2.3 QA: API: Binary incompatible chan

[jira] [Commented] (SPARK-23106) ML, Graph 2.3 QA: API: Binary incompatible changes

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340778#comment-16340778 ] Nick Pentreath commented on SPARK-23106: I've audited all the other ML-related MiMa exclusions

[jira] [Commented] (SPARK-23106) ML, Graph 2.3 QA: API: Binary incompatible changes

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340735#comment-16340735 ] Nick Pentreath commented on SPARK-23106: SPARK-17139 breaks binary compat, I've commented

[jira] [Commented] (SPARK-17139) Add model summary for MultinomialLogisticRegression

2018-01-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340728#comment-16340728 ] Nick Pentreath commented on SPARK-17139: So, in terms of binary compat, the change itself here

[jira] [Commented] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340653#comment-16340653 ] Nick Pentreath commented on SPARK-23109: [~bryanc] can you add a Jira for adding {{columnSchema

[jira] [Updated] (SPARK-22799) Bucketizer should throw exception if single- and multi-column params are both set

2018-01-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-22799: --- Target Version/s: 2.3.0 (was: 2.4.0) > Bucketizer should throw exception if sin

[jira] [Updated] (SPARK-23106) ML, Graph 2.3 QA: API: Binary incompatible changes

2018-01-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-23106: --- Affects Version/s: 2.3.0 Target Version/s: 2.3.0 > ML, Graph 2.3 QA: API: Bin

[jira] [Commented] (SPARK-23106) ML, Graph 2.3 QA: API: Binary incompatible changes

2018-01-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340645#comment-16340645 ] Nick Pentreath commented on SPARK-23106: Thanks [~bago.amirbekian]. However, running MiMa

[jira] [Updated] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-23109: --- Affects Version/s: 2.3.0 Target Version/s: 2.3.0 > ML 2.3 QA: API: Python API cover

[jira] [Assigned] (SPARK-23163) Sync Python ML API docs with Scala

2018-01-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-23163: -- Assignee: Bryan Cutler > Sync Python ML API docs with Sc

[jira] [Resolved] (SPARK-23163) Sync Python ML API docs with Scala

2018-01-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-23163. Resolution: Fixed Fix Version/s: 2.3.0 > Sync Python ML API docs with Sc

[jira] [Updated] (SPARK-22799) Bucketizer should throw exception if single- and multi-column params are both set

2018-01-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-22799: --- Priority: Blocker (was: Major) > Bucketizer should throw exception if single- and mu

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Nick Pentreath
I think this has come up before (and Sean mentions it above), but the sub-items on: SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella are actually marked as Blockers, but are not targeted to 2.3.0. I think they should be, and I'm not comfortable with those not being resolved before voting

[jira] [Assigned] (SPARK-23112) ML, Graph 2.3 QA: Programming guide update and migration guide

2018-01-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-23112: -- Assignee: Nick Pentreath > ML, Graph 2.3 QA: Programming guide update and migrat

[jira] [Resolved] (SPARK-23112) ML, Graph 2.3 QA: Programming guide update and migration guide

2018-01-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-23112. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20363 [https

[jira] [Resolved] (SPARK-22735) Add VectorSizeHint to ML features documentation

2018-01-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-22735. Resolution: Fixed Fix Version/s: 2.3.0 > Add VectorSizeHint to ML featu

[jira] [Commented] (SPARK-23112) ML, Graph 2.3 QA: Programming guide update and migration guide

2018-01-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335821#comment-16335821 ] Nick Pentreath commented on SPARK-23112: {{OneHotEncoder}} is the only deprecation I can see
