[jira] [Commented] (SPARK-21624) Optimize communication cost of RF/GBT/DT

2017-08-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112485#comment-16112485 ] Nick Pentreath commented on SPARK-21624: cc [~sethah] > Optimize communication cost of RF/GBT

[jira] [Commented] (SPARK-21535) Reduce memory requirement for CrossValidator and TrainValidationSplit

2017-07-27 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103152#comment-16103152 ] Nick Pentreath commented on SPARK-21535: Isn't this in direct opposition to https

[jira] [Commented] (SPARK-10802) Let ALS recommend for subset of data

2017-07-27 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102971#comment-16102971 ] Nick Pentreath commented on SPARK-10802: For those that may be interested - I opened a PR to add

[jira] [Resolved] (SPARK-20988) Convert logistic regression to new aggregator framework

2017-07-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-20988. Resolution: Fixed Fix Version/s: 2.3.0 > Convert logistic regression to

[jira] [Assigned] (SPARK-20988) Convert logistic regression to new aggregator framework

2017-07-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-20988: -- Assignee: Seth Hendrickson > Convert logistic regression to new aggregator framew

[jira] [Commented] (SPARK-21535) Reduce memory requirement for CrossValidator and TrainValidationSplit

2017-07-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101452#comment-16101452 ] Nick Pentreath commented on SPARK-21535: Parallel CV is in progress: https://github.com/apache

[jira] [Commented] (SPARK-21483) Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class)

2017-07-20 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094525#comment-16094525 ] Nick Pentreath commented on SPARK-21483: Perhaps you can supply some example code for what you're

Re: Setting initial weights of ml.classification.LogisticRegression similar to mllib.classification.LogisticRegressionWithLBFGS

2017-07-20 Thread Nick Pentreath
l > method does that > > On Thu, Jul 20, 2017 at 12:50 PM, Nick Pentreath <nick.pentre...@gmail.com > > wrote: > >> Currently it's not supported, but is on the roadmap: see >> https://issues.apache.org/jira/browse/SPARK-13025 >> >> The most recent attempt

Re: Setting initial weights of ml.classification.LogisticRegression similar to mllib.classification.LogisticRegressionWithLBFGS

2017-07-20 Thread Nick Pentreath
Currently it's not supported, but is on the roadmap: see https://issues.apache.org/jira/browse/SPARK-13025 The most recent attempt is to start with simple linear regression, as here: https://issues.apache.org/jira/browse/SPARK-21386 On Thu, 20 Jul 2017 at 08:36 Aseem Bansal

Re: Regarding Logistic Regression changes in Spark 2.2.0

2017-07-19 Thread Nick Pentreath
L-BFGS is the default optimization method since the initial ML package implementation. The OWLQN variant is used only when L1 regularization is specified (via the elasticNetParam). 2.2 adds the box constraints (optimized using the LBFGS-B variant). So no, no upgrade is required to use L-BFGS - if
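For context, the objective behind that reply can be sketched as follows, using the notation of the Spark ML linear-methods guide. Here $\lambda$ is `regParam`, $\alpha \in [0, 1]$ is `elasticNetParam`, and $L(w)$ is the training loss (for logistic regression, e.g. the average log-loss):

$$
\min_{w}\; L(w) + \lambda\left(\alpha\,\lVert w\rVert_1 + \frac{1-\alpha}{2}\,\lVert w\rVert_2^2\right)
$$

When $\alpha = 0$ the penalty is smooth, so plain L-BFGS applies. Any $\alpha > 0$ adds the $\lVert w\rVert_1$ term, which is not differentiable at $w_j = 0$; that is why the OWL-QN variant takes over in that case. The box constraints mentioned for 2.2 add per-coefficient bounds $l_j \le w_j \le u_j$, which the reply says are handled by L-BFGS-B.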

[jira] [Created] (SPARK-21469) Add doc and example for FeatureHasher

2017-07-19 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-21469: -- Summary: Add doc and example for FeatureHasher Key: SPARK-21469 URL: https://issues.apache.org/jira/browse/SPARK-21469 Project: Spark Issue Type
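For readers new to `FeatureHasher`: it applies the hashing trick, which maps each raw feature straight to a vector index instead of first building a vocabulary. The snippet below is a minimal pure-Python sketch of that idea, not Spark's implementation. It assumes the behaviour described in the `FeatureHasher` API docs: numeric columns hash the column name and keep the value, while string and boolean columns hash `column=value` with an indicator value of 1.0. It substitutes `zlib.crc32` for Spark's MurmurHash3, so the indices will not match Spark's. The function name `hash_features` is hypothetical.

```python
import zlib


def hash_features(row, num_features=1 << 18):
    """Hashing-trick sketch: map a dict of raw features to a sparse {index: value} dict.

    zlib.crc32 stands in for Spark's MurmurHash3, so indices differ from Spark's.
    """
    vec = {}
    for name, value in row.items():
        if isinstance(value, bool):
            # Booleans are treated like categories: "name=true" / "name=false" with value 1.0.
            key, val = f"{name}={str(value).lower()}", 1.0
        elif isinstance(value, str):
            # Strings: hash "name=value" and use an indicator value of 1.0.
            key, val = f"{name}={value}", 1.0
        else:
            # Numeric: hash the column name and keep the raw numeric value.
            key, val = name, float(value)
        idx = zlib.crc32(key.encode("utf-8")) % num_features
        # Colliding features are summed into the same slot.
        vec[idx] = vec.get(idx, 0.0) + val
    return vec


print(hash_features({"real": 2.0, "bool": True, "string": "foo"}, num_features=16))
```

Summing on collision keeps the mapping a fixed-size linear projection. A smaller `num_features` saves memory at the cost of more collisions.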

[jira] [Created] (SPARK-21468) FeatureHasher Python API

2017-07-19 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-21468: -- Summary: FeatureHasher Python API Key: SPARK-21468 URL: https://issues.apache.org/jira/browse/SPARK-21468 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-21405) Add LBFGS solver for GeneralizedLinearRegression

2017-07-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089386#comment-16089386 ] Nick Pentreath commented on SPARK-21405: Ok, sounds good to me. Do you think we would be able

[jira] [Commented] (SPARK-21388) GBT inherit from HasStepSize & LinearSVC/Binarizer from HasThreshold

2017-07-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087216#comment-16087216 ] Nick Pentreath commented on SPARK-21388: What is the benefit of doing this? > GBT inherit f

[jira] [Commented] (SPARK-21416) Emit Raw Prediction Scores in DecisionTree Classification Model

2017-07-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087161#comment-16087161 ] Nick Pentreath commented on SPARK-21416: This is in Spark 2.2 > Emit Raw Prediction Sco

[jira] [Comment Edited] (SPARK-21416) Emit Raw Prediction Scores in DecisionTree Classification Model

2017-07-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087161#comment-16087161 ] Nick Pentreath edited comment on SPARK-21416 at 7/14/17 10:37 AM

[jira] [Closed] (SPARK-21416) Emit Raw Prediction Scores in DecisionTree Classification Model

2017-07-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-21416. -- Resolution: Not A Problem > Emit Raw Prediction Scores in DecisionTree Classification Mo

[jira] [Commented] (SPARK-21405) Add LBFGS solver for GeneralizedLinearRegression

2017-07-14 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087138#comment-16087138 ] Nick Pentreath commented on SPARK-21405: The only thing we "lose" is the ability t

Re: Spark 2.1.1: A bug in org.apache.spark.ml.linalg.* when using VectorAssembler.scala

2017-07-13 Thread Nick Pentreath
There are Vector classes under the ml.linalg package, and VectorAssembler and the other feature transformers all work with ml.linalg vectors. If you try to use mllib.linalg vectors instead, you will get an error because the SQL user-defined type is not correct. On Thu, 13 Jul 2017 at 11:23,

[jira] [Commented] (SPARK-21326) Use TextFileFormat in implementation of LibSVMFileFormat

2017-07-06 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076477#comment-16076477 ] Nick Pentreath commented on SPARK-21326: I think SPARK-21066 for multiple input files would

[jira] [Commented] (SPARK-21306) OneVsRest Conceals Columns That May Be Relevant To Underlying Classifier

2017-07-05 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074404#comment-16074404 ] Nick Pentreath commented on SPARK-21306: This is definitely an issue. I don't think

Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-03 Thread Nick Pentreath
+1 (binding) On Mon, 3 Jul 2017 at 11:53 Yanbo Liang wrote: > +1 > > On Mon, Jul 3, 2017 at 5:35 AM, Herman van Hövell tot Westerflier < > hvanhov...@databricks.com> wrote: > >> +1 >> >> On Sun, Jul 2, 2017 at 11:32 PM, Ricardo Almeida < >> ricardo.alme...@actnowib.com>

Re: [PySpark]: How to store NumPy array into single DataFrame cell efficiently

2017-06-28 Thread Nick Pentreath
You will need to use PySpark vectors to store arrays in a DataFrame. They can be created from NumPy arrays as follows:

    import numpy as np
    from pyspark.ml.linalg import Vectors
    df = spark.createDataFrame([("src1", "pkey1", 1, Vectors.dense(np.array([0, 1, 2])))])

On Wed, 28 Jun 2017 at 12:23 Judit Planas

[jira] [Commented] (SPARK-21210) Javadoc 8 fixes for ML shared param traits

2017-06-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063054#comment-16063054 ] Nick Pentreath commented on SPARK-21210: [~srowen] [~hyukjin.kwon] > Javadoc 8 fixes for

[jira] [Updated] (SPARK-21210) Javadoc 8 fixes for ML shared param traits

2017-06-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-21210: --- Description: [PR 15999|https://github.com/apache/spark/pull/15999] included fixes for doc

[jira] [Created] (SPARK-21210) Javadoc 8 fixes for ML shared param traits

2017-06-26 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-21210: -- Summary: Javadoc 8 fixes for ML shared param traits Key: SPARK-21210 URL: https://issues.apache.org/jira/browse/SPARK-21210 Project: Spark Issue Type

[jira] [Commented] (SPARK-21199) It's not possible to impute Vector types

2017-06-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062923#comment-16062923 ] Nick Pentreath commented on SPARK-21199: Can you expand on how the null vectors land up

Re: [VOTE] Apache Spark 2.2.0 (RC5)

2017-06-21 Thread Nick Pentreath
As before, the release looks good: all Scala and Python tests pass. R tests fail with the same issue as in SPARK-21093, but it's not a blocker. +1 (binding) On Wed, 21 Jun 2017 at 01:49 Michael Armbrust wrote: > I will kick off the voting with a +1. > > On Tue, Jun 20, 2017 at 4:49

[jira] [Commented] (SPARK-21093) Multiple gapply execution occasionally failed in SparkR

2017-06-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057550#comment-16057550 ] Nick Pentreath commented on SPARK-21093: Just adding the info from test failure report from

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-21 Thread Nick Pentreath
> structField("Avg", "double"))
>>> df4 <- gapply(
>>>   cols = "Sepal_Length",
>>>   irisDF,
>>>   function(key, x) {
>>>     y <- data.frame(key, mean(x$Sepal_Width), stringsAsFactors = FALSE)
>>>   },
>>

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Nick Pentreath
he same error reported by Nick below. > > _ > From: Hyukjin Kwon <gurwls...@gmail.com> > Sent: Tuesday, June 13, 2017 8:02 PM > > Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) > To: dev <dev@spark.apache.org> > Cc: Sean Owen <so...@cloudera.c

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-09 Thread Nick Pentreath
All Scala and Python tests pass. ML QA and doc issues are resolved (as well as R, it seems). However, I'm consistently seeing the following test failure in R: https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72 On Thu, 8 Jun 2017 at 08:48 Denny Lee wrote: > +1

Re: Question about mllib.recommendation.ALS

2017-06-08 Thread Nick Pentreath
Spark 2.2 will support the recommend-all methods in ML. Also, the performance of the recommend-all methods has been greatly improved in both ML and MLlib. Perhaps you could check out the current RC of Spark 2.2, or the master branch, to try it out? N On Thu, 8 Jun 2017 at 17:18, Sahib Aulakh [Search] <
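Here is a rough picture of what the recommend-all methods compute. ALS learns a latent factor vector per user and per item, the predicted score for a pair is their dot product, and recommend-all keeps the top k items for each user. In Spark 2.2's ML API this is exposed on `ALSModel` as `recommendForAllUsers(numItems)` (and `recommendForAllItems(numUsers)`). The pure-Python code below is only a brute-force, single-machine sketch built around a hypothetical `recommend_for_all_users` helper. Spark's distributed implementation works on blocks of factors rather than looping user by user.

```python
import heapq


def recommend_for_all_users(user_factors, item_factors, k):
    """Brute-force top-k recommendation sketch.

    user_factors / item_factors: dict of id -> list of floats (same rank).
    Returns dict of user id -> list of (item id, score), best first.
    """
    recs = {}
    for uid, u in user_factors.items():
        # Score every item by the dot product of the two latent factor vectors.
        scores = (
            (sum(a * b for a, b in zip(u, v)), iid)
            for iid, v in item_factors.items()
        )
        # Keep only the k highest-scoring items.
        recs[uid] = [(iid, s) for s, iid in heapq.nlargest(k, scores)]
    return recs


users = {0: [1.0, 0.0], 1: [0.0, 1.0]}
items = {10: [0.9, 0.1], 11: [0.1, 0.8], 12: [0.5, 0.5]}
print(recommend_for_all_users(users, items, k=2))
```

Scoring every pair costs O(users × items × rank), which is why blocking and other performance work matter at scale.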

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Nick Pentreath
Now, on the subject of (ML) QA JIRAs. From the ML side, I believe they are required (I think others such as Joseph will agree and in fact have already said as much). Most are marked as Blockers, though of those, the Python API coverage item is strictly not a Blocker, as we will never hold the release

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Nick Pentreath
The website updates for ML QA (SPARK-20507) are not *actually* critical, as the project website can certainly be updated separately from the source code guide and is not part of the release being voted on. In future, that particular work item for the QA process could be marked down in priority, and

[jira] [Resolved] (SPARK-20499) Spark MLlib, GraphX 2.2 QA umbrella

2017-06-06 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-20499. Resolution: Done > Spark MLlib, GraphX 2.2 QA umbre

[jira] [Resolved] (SPARK-20507) Update MLlib, GraphX websites for 2.2

2017-06-06 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-20507. Resolution: Done Assignee: Nick Pentreath No updates to MLlib project website

[jira] [Commented] (SPARK-20968) Support separator in Tokenizer

2017-06-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034993#comment-16034993 ] Nick Pentreath commented on SPARK-20968: Would you mind adding more detail here? What is the use

[jira] [Commented] (SPARK-16365) Ideas for moving "mllib-local" forward

2017-05-31 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031083#comment-16031083 ] Nick Pentreath commented on SPARK-16365: Databricks has also indicated some work

[jira] [Commented] (SPARK-16365) Ideas for moving "mllib-local" forward

2017-05-31 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030997#comment-16030997 ] Nick Pentreath commented on SPARK-16365: [~akrim] thanks for the thoughts and document. Sorry

Re: RDD MLLib Deprecation Question

2017-05-30 Thread Nick Pentreath
The short answer is that those distributed linalg parts will not go away. In the medium term, it's much less likely that the distributed matrix classes will be ported over to DataFrames (though the ideal would be to have DataFrame-backed distributed matrix classes) - given the time and effort it's

[jira] [Commented] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-05-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024428#comment-16024428 ] Nick Pentreath commented on SPARK-14174: Yes, as one would expect. But the key result here

[jira] [Closed] (SPARK-6000) Batch K-Means clusters should support "mini-batch" updates

2017-05-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-6000. - Resolution: Duplicate > Batch K-Means clusters should support "mini-batch&

[jira] [Commented] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-05-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023454#comment-16023454 ] Nick Pentreath commented on SPARK-14174: It makes sense. However, I think k=100 is perhaps less
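For context on the mini-batch idea discussed in SPARK-14174 and SPARK-6000: mini-batch k-means does not assign every point on each iteration. Instead, it samples a small batch, assigns those points to their nearest centers, and moves each center toward its points with a per-center step size of 1/count (the scheme from Sculley's "Web-Scale K-Means Clustering", 2010). The pure-Python sketch below illustrates that update rule only. It is not the code proposed for Spark, and the names `mini_batch_kmeans` and `sq_dist` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mini_batch_kmeans(points, k, batch_size, iterations, seed=0):
    """Mini-batch k-means sketch with per-center learning rate 1 / count."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    counts = [0] * k  # how many points each center has absorbed so far
    for _ in range(iterations):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch against the current centers before updating.
        nearest = [min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in batch]
        for x, c in zip(batch, nearest):
            counts[c] += 1
            eta = 1.0 / counts[c]
            # Gradient step: move the center a fraction eta of the way toward x.
            centers[c] = [(1 - eta) * ci + eta * xi for ci, xi in zip(centers[c], x)]
    return centers


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
print(mini_batch_kmeans(data, k=2, batch_size=4, iterations=20))
```

With the 1/count step, each center ends up as the running mean of every point ever assigned to it. Each iteration touches only `batch_size` points instead of the full dataset.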

[jira] [Commented] (SPARK-20838) Spark ML ngram feature extractor should support ngram range like scikit

2017-05-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019979#comment-16019979 ] Nick Pentreath commented on SPARK-20838: I think this is a duplicate of SPARK-19668 > Spark
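SPARK-20838 asks for scikit-learn's `ngram_range`, where a single pass emits every n-gram with `min_n <= n <= max_n`. Spark ML's `NGram` transformer takes a single `n`. The pure-Python sketch below shows the range behaviour on a token list, joining tokens with a space as `NGram` does. The helper name `ngrams_in_range` is hypothetical. The ordering (all 1-grams, then 2-grams, and so on) follows scikit-learn and is an assumption here, not something taken from a Spark proposal.

```python
def ngrams_in_range(tokens, min_n, max_n):
    """Return every space-joined n-gram of `tokens` for min_n <= n <= max_n."""
    if min_n < 1 or max_n < min_n:
        raise ValueError("require 1 <= min_n <= max_n")
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n; yields nothing when n > len(tokens).
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


print(ngrams_in_range(["hi", "i", "heard", "about", "spark"], 1, 2))
```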

[jira] [Commented] (SPARK-20764) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version

2017-05-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019186#comment-16019186 ] Nick Pentreath commented on SPARK-20764: Please go ahead > Fix visibility discrepa

Re: [VOTE] Apache Spark 2.2.0 (RC2)

2017-05-19 Thread Nick Pentreath
All the outstanding ML QA doc and user guide items are done for 2.2, so from that side we should be good to cut another RC :) On Thu, 18 May 2017 at 00:18 Russell Spitzer wrote: > Seeing an issue with the DataScanExec and some of our integration tests > for the SCC.

[jira] [Resolved] (SPARK-20506) ML, Graph 2.2 QA: Programming guide update and migration guide

2017-05-19 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-20506. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17996 [https

[jira] [Commented] (SPARK-20768) PySpark FPGrowth does not expose numPartitions (expert) param

2017-05-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015455#comment-16015455 ] Nick Pentreath commented on SPARK-20768: Sure - though perhaps [~yuhaoyan] can give an opinion

[jira] [Commented] (SPARK-20768) PySpark FPGrowth does not expose numPartitions (expert) param

2017-05-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015440#comment-16015440 ] Nick Pentreath commented on SPARK-20768: It is there - but not documented as a {{Param}} and so

[jira] [Commented] (SPARK-20506) ML, Graph 2.2 QA: Programming guide update and migration guide

2017-05-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015382#comment-16015382 ] Nick Pentreath commented on SPARK-20506: Oh, also SPARK-14503 is important > ML, Graph 2.2

[jira] [Commented] (SPARK-20506) ML, Graph 2.2 QA: Programming guide update and migration guide

2017-05-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015362#comment-16015362 ] Nick Pentreath commented on SPARK-20506: Cool - I've added a section before the Migration Guide

Re: spark ML Recommender program

2017-05-18 Thread Nick Pentreath
It sounds like this may be the same as https://issues.apache.org/jira/browse/SPARK-20402 On Thu, 18 May 2017 at 08:16 Nick Pentreath <nick.pentre...@gmail.com> wrote: > Could you try setting the checkpoint interval for ALS (try 3, 5 say) and > see what the effect is? > > >

Re: spark ML Recommender program

2017-05-18 Thread Nick Pentreath
Could you try setting the checkpoint interval for ALS (try 3 or 5, say) and see what the effect is? On Thu, 18 May 2017 at 07:32 Mark Vervuurt wrote: > If you are running locally, try increasing driver memory to, for example, 4G > and executor memory to 3G. > Regards, Mark >

[jira] [Commented] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014047#comment-16014047 ] Nick Pentreath commented on SPARK-14174: [~podongfeng] did you manage to look into some

[jira] [Commented] (SPARK-6000) Batch K-Means clusters should support "mini-batch" updates

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014046#comment-16014046 ] Nick Pentreath commented on SPARK-6000: --- Even though SPARK-14174 is later, it seems there is more

[jira] [Commented] (SPARK-6349) Add probability estimates in SVMModel predict result

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014043#comment-16014043 ] Nick Pentreath commented on SPARK-6349: --- This is now covered by {{ml}}'s {{LinearSVC}}. Shall we

[jira] [Commented] (SPARK-6417) Add Linear Programming algorithm

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014041#comment-16014041 ] Nick Pentreath commented on SPARK-6417: --- I think it's fairly safe to say there is not much bandwidth

[jira] [Closed] (SPARK-6417) Add Linear Programming algorithm

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-6417. - Resolution: Won't Fix > Add Linear Programming algori

[jira] [Commented] (SPARK-7290) Add StringVectorizer

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014039#comment-16014039 ] Nick Pentreath commented on SPARK-7290: --- Is this still desired? Seems it perhaps doesn't add

[jira] [Commented] (SPARK-3181) Add Robust Regression Algorithm with Huber Estimator

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014026#comment-16014026 ] Nick Pentreath commented on SPARK-3181: --- So the Breeze bug is fixed now, right? Will this be revived

[jira] [Closed] (SPARK-5328) Update PySpark MLlib NaiveBayes API to take model type parameter for Bernoulli fit

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-5328. - Resolution: Won't Fix > Update PySpark MLlib NaiveBayes API to take model type parame

[jira] [Commented] (SPARK-5328) Update PySpark MLlib NaiveBayes API to take model type parameter for Bernoulli fit

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014005#comment-16014005 ] Nick Pentreath commented on SPARK-5328: --- This is pretty stale so I'll close it off, since it's now

[jira] [Commented] (SPARK-1503) Implement Nesterov's accelerated first-order method

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013999#comment-16013999 ] Nick Pentreath commented on SPARK-1503: --- I think it's safe to say this won't go into Spark core

[jira] [Comment Edited] (SPARK-1503) Implement Nesterov's accelerated first-order method

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013999#comment-16013999 ] Nick Pentreath edited comment on SPARK-1503 at 5/17/17 1:13 PM: I think

[jira] [Commented] (SPARK-1359) SGD implementation is not efficient

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013998#comment-16013998 ] Nick Pentreath commented on SPARK-1359: --- Do we care much about this now, since {{mllib}}'s SGD

[jira] [Closed] (SPARK-12015) Auto convert int to Double when required in pyspark.ml

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath closed SPARK-12015. -- Resolution: Duplicate > Auto convert int to Double when required in pyspark

[jira] [Commented] (SPARK-12015) Auto convert int to Double when required in pyspark.ml

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013996#comment-16013996 ] Nick Pentreath commented on SPARK-12015: This was fixed in SPARK-7425 - closing as duplicate

[jira] [Updated] (SPARK-20723) Random Forest Classifier should expose intermediateRDDStorageLevel similar to ALS

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-20723: --- Target Version/s: (was: 2.3.0) > Random Forest Classifier should exp

[jira] [Updated] (SPARK-20723) Random Forest Classifier should expose intermediateRDDStorageLevel similar to ALS

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-20723: --- Affects Version/s: (was: 2.3.0) 2.2.0 > Random Forest Classif

[jira] [Commented] (SPARK-20723) Random Forest Classifier should expose intermediateRDDStorageLevel similar to ALS

2017-05-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013972#comment-16013972 ] Nick Pentreath commented on SPARK-20723: Please don't set Target Version by the way - committers

[jira] [Commented] (SPARK-20503) ML 2.2 QA: API: Python API coverage

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012214#comment-16012214 ] Nick Pentreath commented on SPARK-20503: Checked all above for doc & user guide consist

[jira] [Resolved] (SPARK-20503) ML 2.2 QA: API: Python API coverage

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-20503. Resolution: Done > ML 2.2 QA: API: Python API cover

[jira] [Created] (SPARK-20768) PySpark FPGrowth does not expose numPartitions (expert) param

2017-05-16 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-20768: -- Summary: PySpark FPGrowth does not expose numPartitions (expert) param Key: SPARK-20768 URL: https://issues.apache.org/jira/browse/SPARK-20768 Project: Spark

[jira] [Reopened] (SPARK-20503) ML 2.2 QA: API: Python API coverage

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reopened SPARK-20503: > ML 2.2 QA: API: Python API cover

[jira] [Commented] (SPARK-20506) ML, Graph 2.2 QA: Programming guide update and migration guide

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012106#comment-16012106 ] Nick Pentreath commented on SPARK-20506: Sent PR for updated migration guide only. I didn't find

[jira] [Comment Edited] (SPARK-20503) ML 2.2 QA: API: Python API coverage

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012101#comment-16012101 ] Nick Pentreath edited comment on SPARK-20503 at 5/16/17 10:21 AM: -- I

[jira] [Reopened] (SPARK-20503) ML 2.2 QA: API: Python API coverage

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reopened SPARK-20503: > ML 2.2 QA: API: Python API cover

[jira] [Resolved] (SPARK-20503) ML 2.2 QA: API: Python API coverage

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-20503. Resolution: Done > ML 2.2 QA: API: Python API cover

[jira] [Resolved] (SPARK-20503) ML 2.2 QA: API: Python API coverage

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-20503. Resolution: Resolved > ML 2.2 QA: API: Python API cover

[jira] [Comment Edited] (SPARK-20503) ML 2.2 QA: API: Python API coverage

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012010#comment-16012010 ] Nick Pentreath edited comment on SPARK-20503 at 5/16/17 10:18 AM

[jira] [Commented] (SPARK-20503) ML 2.2 QA: API: Python API coverage

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012101#comment-16012101 ] Nick Pentreath commented on SPARK-20503: I think I've highlighted all the API gaps in the links

[jira] [Commented] (SPARK-20506) ML, Graph 2.2 QA: Programming guide update and migration guide

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012098#comment-16012098 ] Nick Pentreath commented on SPARK-20506: Hey [~josephkb] [~yanboliang] [~srowen] [~felixcheung

[jira] [Updated] (SPARK-19940) FPGrowthModel.transform should skip duplicated items

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-19940: --- Description: Due to a misplaced {{distinct}}, {{FPGrowthModel.transform}} generates duplicated

[jira] [Comment Edited] (SPARK-20503) ML 2.2 QA: API: Python API coverage

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012010#comment-16012010 ] Nick Pentreath edited comment on SPARK-20503 at 5/16/17 9:22 AM: - Checked

[jira] [Updated] (SPARK-20764) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-20764: --- Affects Version/s: (was: 2.1.1) 2.2.0 > Fix visibil

[jira] [Updated] (SPARK-20764) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-20764: --- Description: SPARK-20097 exposed {{degreesOfFreedom}} in {{LinearRegressionSummary

[jira] [Created] (SPARK-20764) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version

2017-05-16 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-20764: -- Summary: Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version Key: SPARK-20764 URL: https://issues.apache.org/jira/browse/SPARK

[jira] [Resolved] (SPARK-20677) Clean up ALS recommend all improvement code.

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-20677. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17919 [https

[jira] [Assigned] (SPARK-20553) Update ALS examples for ML to illustrate recommend all

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-20553: -- Assignee: Nick Pentreath > Update ALS examples for ML to illustrate recommend

[jira] [Resolved] (SPARK-20553) Update ALS examples for ML to illustrate recommend all

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-20553. Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17950 [https

[jira] [Commented] (SPARK-20503) ML 2.2 QA: API: Python API coverage

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012010#comment-16012010 ] Nick Pentreath commented on SPARK-20503: Checked: * {{ALS}}: ** {{coldStartStrategy}} param

[jira] [Assigned] (SPARK-20503) ML 2.2 QA: API: Python API coverage

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-20503: -- Assignee: Nick Pentreath > ML 2.2 QA: API: Python API cover

[jira] [Commented] (SPARK-20506) ML, Graph 2.2 QA: Programming guide update and migration guide

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011985#comment-16011985 ] Nick Pentreath commented on SPARK-20506: I'm checking into any other behavior changes that need

[jira] [Commented] (SPARK-20502) ML, Graph 2.2 QA: API: Experimental, DeveloperApi, final, sealed audit

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011976#comment-16011976 ] Nick Pentreath commented on SPARK-20502: Sounds good to me. > ML, Graph 2.2 QA:

[jira] [Assigned] (SPARK-20506) ML, Graph 2.2 QA: Programming guide update and migration guide

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-20506: -- Assignee: Nick Pentreath > ML, Graph 2.2 QA: Programming guide update and migrat

[jira] [Commented] (SPARK-20506) ML, Graph 2.2 QA: Programming guide update and migration guide

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011970#comment-16011970 ] Nick Pentreath commented on SPARK-20506: As per SPARK-20707 no deprecated methods were removed

[jira] [Commented] (SPARK-20506) ML, Graph 2.2 QA: Programming guide update and migration guide

2017-05-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011950#comment-16011950 ] Nick Pentreath commented on SPARK-20506: SPARK-19787 changed the default reg param value

Re: ElasticSearch Spark error

2017-05-15 Thread Nick Pentreath
It may be best to ask on the elasticsearch-Hadoop github project On Mon, 15 May 2017 at 13:19, nayan sharma wrote: > Hi All, > > *ERROR:-* > > *Caused by: org.apache.spark.util.TaskCompletionListenerException: > Connection error (check network and/or proxy settings)-

[jira] [Commented] (SPARK-20711) MultivariateOnlineSummarizer incorrect min/max for identical NaN feature

2017-05-12 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007739#comment-16007739 ] Nick Pentreath commented on SPARK-20711: Shouldn't the stats for any column that contains
