Re: Machine learning question (using spark) - removing redundant factors while doing clustering

2016-08-10 Thread Rohit Chaddha
> no target variable. > > PCA will not 'improve' clustering per se but can make it faster. > You may want to specify what you are actually trying to optimize. > > > On Tue, Aug 9, 2016, 03:23 Rohit Chaddha <rohitchaddha1...@gmail.com> > wrote: > >> I would rather have less feat

Re: Machine learning question (using spark) - removing redundant factors while doing clustering

2016-08-08 Thread Rohit Chaddha
you classification, you can then run your > model again with the smaller set of features. > The two approaches are quite different, what I'm suggesting involves > training (supervised learning) in the context of a target function, with > SVD you are doing unsupervised learning. > > O

Re: Machine learning question (using spark) - removing redundant factors while doing clustering

2016-08-08 Thread Rohit Chaddha
> >> I know we can reduce dimensions by using PCA, but I think that does not > >> allow us to understand which factors from the original we are using in the > >> end. > >> > >> - Tony L. > >> > >> On Mon, Aug 8, 2016 at 5:12 P

Machine learning question (using spark) - removing redundant factors while doing clustering

2016-08-08 Thread Rohit Chaddha
I have a data-set where each data-point has 112 factors. I want to remove the factors which are not relevant, and say reduce to 20 factors out of these 112, and then do clustering of the data-points using these 20 factors. How do I do this, and how do I figure out which of the 20 factors are useful
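
A minimal sketch of one way to do this in Java with the DataFrame-based API, assuming the 112 factors have already been assembled into a "features" vector column (the input path and the k values are placeholders): PCA projects the data down to 20 components and KMeans then clusters on them.

    import org.apache.spark.ml.clustering.KMeans;
    import org.apache.spark.ml.clustering.KMeansModel;
    import org.apache.spark.ml.feature.PCA;
    import org.apache.spark.ml.feature.PCAModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ReduceAndCluster {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("ReduceAndCluster").getOrCreate();

        // Hypothetical input: each row already has a "features" vector column with the 112 factors.
        Dataset<Row> data = spark.read().parquet("data/points.parquet");

        // Project the 112 factors down to 20 principal components.
        PCAModel pca = new PCA()
            .setInputCol("features")
            .setOutputCol("pcaFeatures")
            .setK(20)
            .fit(data);
        Dataset<Row> reduced = pca.transform(data);

        // Cluster the data points on the reduced representation.
        KMeansModel clusters = new KMeans()
            .setFeaturesCol("pcaFeatures")
            .setK(5)
            .fit(reduced);
        clusters.transform(reduced).show();

        spark.stop();
      }
    }

Note that PCA yields linear combinations of the original 112 factors rather than a subset of them, which is exactly the limitation raised in the replies above; picking an actual subset requires a feature-selection step (and selectors such as ChiSqSelector need a label column, i.e. supervised data).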

Calling KmeansModel predict method

2016-08-03 Thread Rohit Chaddha
The predict method takes a Vector object. I am unable to figure out how to make this Spark Vector object for getting predictions from my model. Does anyone have some code in Java for this? Thanks Rohit
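
A minimal Java sketch, assuming the RDD-based org.apache.spark.mllib.clustering.KMeansModel, whose predict method takes an org.apache.spark.mllib.linalg.Vector; the feature values below are placeholders and must match the number and order of features used for training.

    import org.apache.spark.mllib.clustering.KMeansModel;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;

    public class PredictSketch {
      static int assignCluster(KMeansModel model) {
        // Build a dense vector from the same features, in the same order, used for training.
        // The values here are placeholders.
        Vector point = Vectors.dense(1.0, 2.5, 0.3);

        // predict returns the index of the closest cluster center.
        return model.predict(point);
      }
    }

Vectors.sparse(size, indices, values) can be used the same way when most feature values are zero.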

build error - failing test- Error while building spark 2.0 trunk from github

2016-07-31 Thread Rohit Chaddha
--- T E S T S --- Running org.apache.spark.api.java.OptionalSuite Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.052 sec - in org.apache.spark.api.java.OptionalSuite Running

calling dataset.show on a custom object - displays toString() value as first column and blank for rest

2016-07-31 Thread Rohit Chaddha
I have a custom object called A and a corresponding Dataset. When I call the datasetA.show() method I get the following: +++-+-+---+ |id|da|like|values|uid| +++-+-+---+ |A.toString()...|
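
One possible cause (an assumption here, since the thread does not show how the Dataset was created) is the encoder used when building it: the bean encoder needs a public no-argument constructor and getter/setter pairs, and then exposes each property as its own column. A minimal sketch with a hypothetical bean standing in for A:

    import java.io.Serializable;
    import java.util.Arrays;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SparkSession;

    public class ShowBeanDataset {
      // Hypothetical bean standing in for the custom object A; the bean encoder discovers
      // columns through the public getter/setter pairs.
      public static class A implements Serializable {
        private String id;
        private double da;
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public double getDa() { return da; }
        public void setDa(double da) { this.da = da; }
      }

      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("ShowBeanDataset").master("local[*]").getOrCreate();

        A a = new A();
        a.setId("A1");
        a.setDa(0.5);

        // Encoders.bean maps each bean property to its own column, so show() prints
        // a proper table rather than a single serialized value.
        Dataset<A> datasetA = spark.createDataset(Arrays.asList(a), Encoders.bean(A.class));
        datasetA.show();

        spark.stop();
      }
    }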

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
Jul 28, 2016 at 11:52 AM, Rohit Chaddha > <rohitchaddha1...@gmail.com> wrote: > > Sean, > > > > I saw some JIRA tickets and it looks like this is still an open bug (rather > > than an improvement as marked in JIRA). > > > > https://issues.apache.org/jira/bro

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
On Fri, Jul 29, 2016 at 12:06 AM, Rohit Chaddha <rohitchaddha1...@gmail.com> wrote: > I am simply trying to do > session.read().json("file:///C:/data/a.json"); > > in 2.0.0-preview it was working fine with > sqlContext.read().json("C:/data/a.json"); > >

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
t work? that should certainly be an absolute > URI with an absolute path. What exactly is your input value for this > property? > > On Thu, Jul 28, 2016 at 11:28 AM, Rohit Chaddha > <rohitchaddha1...@gmail.com> wrote: > > Hello Sean, > > > > I have tried both f

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
Thu, Jul 28, 2016 at 10:47 AM, Rohit Chaddha > <rohitchaddha1...@gmail.com> wrote: > > I upgraded from 2.0.0-preview to 2.0.0 > > and I started getting the following error > > > > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > >

Re: ClassTag variable in broadcast in spark 2.0 ? how to use

2016-07-28 Thread Rohit Chaddha
My bad. Please ignore this question. I accidentally reverted to sparkContext, causing the issue. On Thu, Jul 28, 2016 at 11:36 PM, Rohit Chaddha <rohitchaddha1...@gmail.com> wrote: > In Spark 2.0 there is an additional parameter of type ClassTag in the > broadcast method of the

ClassTag variable in broadcast in spark 2.0 ? how to use

2016-07-28 Thread Rohit Chaddha
In Spark 2.0 there is an additional parameter of type ClassTag in the broadcast method of the SparkContext. What is this variable and how do I do a broadcast now? Here is my existing code with 2.0.0-preview: Broadcast> b = jsc.broadcast(u.collectAsMap()); What changes need to be
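
The ClassTag parameter only appears on the Scala SparkContext; two ways of handling it from Java are sketched below (the types are illustrative, since the generics in the snippet above were stripped).

    import java.util.Map;
    import org.apache.spark.SparkContext;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.broadcast.Broadcast;
    import scala.reflect.ClassTag;
    import scala.reflect.ClassTag$;

    public class BroadcastSketch {
      // Option 1: JavaSparkContext.broadcast supplies the ClassTag internally, so the
      // 2.0.0-preview style call keeps working unchanged.
      static Broadcast<Map<String, Double>> viaJavaContext(JavaSparkContext jsc,
                                                           JavaPairRDD<String, Double> u) {
        return jsc.broadcast(u.collectAsMap());
      }

      // Option 2: when only the Scala SparkContext is available, pass a ClassTag explicitly.
      static Broadcast<Map<String, Double>> viaScalaContext(SparkContext sc,
                                                            JavaPairRDD<String, Double> u) {
        ClassTag<Map<String, Double>> tag = ClassTag$.MODULE$.<Map<String, Double>>apply(Map.class);
        return sc.broadcast(u.collectAsMap(), tag);
      }
    }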

Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
I upgraded from 2.0.0-preview to 2.0.0 and I started getting the following error: Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:C:/ibm/spark-warehouse Any ideas how to fix this? -Rohit
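
A workaround commonly suggested for this error is to set spark.sql.warehouse.dir to an explicit absolute file URI when building the SparkSession; a minimal sketch, with the Windows path used purely as an example:

    import org.apache.spark.sql.SparkSession;

    public class WarehouseDirWorkaround {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("WarehouseDirWorkaround")
            .master("local[*]")
            // An explicit absolute file URI for the warehouse directory avoids the
            // "Relative path in absolute URI" parse error seen with the default value on Windows.
            .config("spark.sql.warehouse.dir", "file:///C:/ibm/spark-warehouse")
            .getOrCreate();

        spark.read().json("file:///C:/data/a.json").show();
        spark.stop();
      }
    }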

Is RowMatrix missing in org.apache.spark.ml package?

2016-07-26 Thread Rohit Chaddha
It is present in mllib but I don't seem to find it in the ml package. Any suggestions please? -Rohit
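
As of Spark 2.0, RowMatrix exists only in the RDD-based API (org.apache.spark.mllib.linalg.distributed); there is no DataFrame-based counterpart in org.apache.spark.ml. A minimal Java sketch using the mllib class, with placeholder rows:

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;
    import org.apache.spark.mllib.linalg.distributed.RowMatrix;

    public class RowMatrixSketch {
      public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext("local[*]", "RowMatrixSketch");

        JavaRDD<Vector> rows = jsc.parallelize(Arrays.asList(
            Vectors.dense(1.0, 2.0, 3.0),
            Vectors.dense(4.0, 5.0, 6.0)));

        // RowMatrix is built from an RDD of mllib Vectors, not from a DataFrame column.
        RowMatrix mat = new RowMatrix(rows.rdd());
        System.out.println(mat.numRows() + " x " + mat.numCols());

        jsc.stop();
      }
    }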

Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-25 Thread Rohit Chaddha
Hi Krishna, Great .. I had no idea about this. I tried your suggestion by using na.drop() and got an RMSE of 1.5794048211812495. Any suggestions on how this can be reduced and the model improved? Regards, Rohit On Mon, Jul 25, 2016 at 4:12 AM, Krishna Sankar wrote: > Thanks
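
A minimal sketch of the workaround under discussion, assuming an ml ALSModel and the usual "rating"/"prediction" column names; dropping the NaN rows produced for users or items unseen at training time is what keeps the RMSE finite.

    import org.apache.spark.ml.evaluation.RegressionEvaluator;
    import org.apache.spark.ml.recommendation.ALSModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    public class RmseWithoutNaN {
      static double evaluate(ALSModel model, Dataset<Row> test) {
        // Users or items unseen at training time yield NaN predictions in Spark 2.0;
        // drop those rows so the evaluator does not itself return NaN.
        Dataset<Row> predictions = model.transform(test).na().drop();

        return new RegressionEvaluator()
            .setMetricName("rmse")
            .setLabelCol("rating")
            .setPredictionCol("prediction")
            .evaluate(predictions);
      }
    }

Lowering the RMSE itself is usually a matter of tuning the ALS parameters (rank, regParam, maxIter) with cross-validation rather than of changing the evaluation step.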

Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-24 Thread Rohit Chaddha
Great, thanks to both of you. I was struggling with this issue as well. -Rohit On Mon, Jul 25, 2016 at 4:12 AM, Krishna Sankar wrote: > Thanks Nick. I also ran into this issue. > VG, One workaround is to drop the NaN from predictions (df.na.drop()) and > then use the dataset