Re: Sporadic ClassNotFoundException with Kryo

2017-01-12 Thread Nirmal Fernando
I faced a similar issue and had to do two things: 1. Submit the Kryo jar with spark-submit. 2. Set spark.executor.userClassPathFirst to true in the Spark conf. On Fri, Nov 18, 2016 at 7:39 PM, chrism wrote: > Regardless of the different ways we have tried deploying a jar
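For reference, a minimal sketch of those two settings applied programmatically (the jar path and app name are illustrative; the same thing can be passed to spark-submit as --jars and --conf):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class KryoClassPathSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("kryo-classpath-example")
                // 1. Ship the Kryo jar to the executors (path is illustrative).
                .setJars(new String[]{"/path/to/kryo.jar"})
                // 2. Let user-supplied jars win over Spark's own classpath on executors.
                .set("spark.executor.userClassPathFirst", "true");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... run the job that serializes with Kryo ...
        sc.stop();
    }
}
```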

Re: Apply ML to grouped dataframe

2016-08-22 Thread Nirmal Fernando
> Regards. > Nirmal Fernando ---08/23/2016 01:14:48 PM---Hi Wen, AFAIK Spark MLlib implements its machine learning algorithms on top of

Re: Apply ML to grouped dataframe

2016-08-22 Thread Nirmal Fernando
e about how to use MLlib to grouped dataframe? > > Regards. > Wenpei. > Nirmal Fernando ---08/23/2016 10:26:36 AM---You can use Spark MLli

Re: Apply ML to grouped dataframe

2016-08-22 Thread Nirmal Fernando
You can use Spark MLlib: http://spark.apache.org/docs/latest/ml-guide.html#announcement-dataframe-based-api-is-primary-api On Tue, Aug 23, 2016 at 7:34 AM, Wen Pei Yu wrote: > Hi > > We have a dataframe and want to group it and apply an ML algorithm or > statistics (say a t-test)
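As a rough illustration of one way to work on a grouped DataFrame (a sketch only, assuming a Spark 2.x SparkSession and made-up column names `group` and `value`; an MLlib estimator or a statistical test would slot into the same per-key loop):

```java
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.avg;

public class PerGroupSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("per-group-ml").master("local[*]").getOrCreate();

        // Assumed input: a DataFrame with a "group" column and a numeric "value" column.
        Dataset<Row> df = spark.read().json("/tmp/grouped-input.json");

        // Collect the distinct keys, then filter and run the algorithm (or a
        // statistic) once per group.
        List<Row> keys = df.select("group").distinct().collectAsList();
        for (Row key : keys) {
            Dataset<Row> oneGroup = df.filter(df.col("group").equalTo(key.get(0)));
            double mean = oneGroup.agg(avg("value")).first().getDouble(0);
            System.out.println("group " + key.get(0) + " mean = " + mean);
        }
        spark.stop();
    }
}
```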

Re: thought experiment: use Spark ML for real-time prediction

2015-11-12 Thread Nirmal Fernando
Original message > From: "Kothuvatiparambil, Viju" <viju.kothuvatiparam...@bankofamerica.com> > Date: 11/12/2015 3:09 PM (GMT-05:00) > To: DB Tsai <dbt...@dbtsai.com>, Sean Owen <so...@cloudera.com> > Cc: Felix Cheung <felixcheun...@hotmail.com>, Nirmal Fe

Re: thought experiment: use Spark ML for real-time prediction

2015-11-11 Thread Nirmal Fernando
As of now, we are basically serializing the ML model and then deserializing it for prediction in real time. On Wed, Nov 11, 2015 at 4:39 PM, Adrian Tanase wrote: > I don’t think this answers your question but here’s how you would evaluate > the model in realtime in a streaming
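A minimal sketch of that save-then-load pattern (assuming MLlib's built-in save/load, with LogisticRegressionModel standing in for whatever model type is actually used; the path is illustrative):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.classification.LogisticRegressionModel;
import org.apache.spark.mllib.linalg.Vectors;

public class ModelReloadSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("model-reload").setMaster("local[*]"));

        // Offline step (done once, after training): trainedModel.save(sc.sc(), path);
        String path = "/tmp/lr-model";

        // At prediction time: load the persisted model and score incoming vectors.
        LogisticRegressionModel model = LogisticRegressionModel.load(sc.sc(), path);
        double label = model.predict(Vectors.dense(0.1, 0.2, 0.3));
        System.out.println("predicted label: " + label);

        sc.stop();
    }
}
```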

Applying transformations on a JavaRDD using reflection

2015-09-08 Thread Nirmal Fernando
Hi All, I'd like to apply a chain of Spark transformations (map/filter) on a given JavaRDD. I'll have the set of Spark transformations as Function<T,A>, and even though I can determine the classes of T and A at runtime, due to type erasure, I cannot call JavaRDD's transformations as they
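One workaround sketch for the type-erasure problem (my own illustration, not necessarily what was done in the end): drop to raw types so the chain can be applied when T and A are only known at runtime, and accept the unchecked warnings.

```java
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

public class TransformationChainSketch {
    // Applies each map function in order to the RDD. The caller must ensure the
    // output type of step i matches the input type of step i+1; generics cannot
    // enforce that here because the element types are erased at runtime.
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static JavaRDD<?> applyChain(JavaRDD<?> input, List<Function> transformations) {
        JavaRDD current = input;
        for (Function f : transformations) {
            current = current.map(f);
        }
        return current;
    }
}
```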

Re: Applying transformations on a JavaRDD using reflection

2015-09-08 Thread Nirmal Fernando
Any thoughts? On Tue, Sep 8, 2015 at 3:37 PM, Nirmal Fernando <nir...@wso2.com> wrote: > Hi All, > > I'd like to apply a chain of Spark transformations (map/filter) on a given > JavaRDD. I'll have the set of Spark transformations as Function<T,A>, and > even though

[MLlib][KMeans] KMeansModel.computeCost takes a lot of time

2015-07-13 Thread Nirmal Fernando
Hi, For a fairly large dataset, 30MB, KMeansModel.computeCost takes a lot of time (16+ minutes). Most of that time is spent in this task: org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33) org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70) Can this be
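One thing worth ruling out, in line with the repartition-and-cache suggestion that shows up later in the thread: computeCost makes another full pass over the input, so an uncached RDD is recomputed from source for that pass. A minimal sketch:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;

public class KMeansCostSketch {
    // Caching the input once keeps both the training iterations and the cost
    // computation on in-memory data instead of re-reading the source each time.
    public static double trainAndScore(JavaRDD<Vector> data, int k, int iterations) {
        JavaRDD<Vector> cached = data.cache();
        KMeansModel model = KMeans.train(cached.rdd(), k, iterations);
        return model.computeCost(cached.rdd());
    }
}
```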

Re: [MLlib][KMeans] KMeansModel.computeCost takes a lot of time

2015-07-13 Thread Nirmal Fernando
are using Java): ``` JavaRDD<Vector> input = data.repartition(8).cache(); org.apache.spark.mllib.clustering.KMeans.train(input.rdd(), 3, 20); ``` On Mon, Jul 13, 2015 at 11:10 AM, Nirmal Fernando nir...@wso2.com wrote: I'm using: org.apache.spark.mllib.clustering.KMeans.train(data.rdd(), 3

Re: [MLlib][KMeans] KMeansModel.computeCost takes a lot of time

2015-07-13 Thread Nirmal Fernando
Could limited memory be causing this slowness? On Tue, Jul 14, 2015 at 9:00 AM, Nirmal Fernando nir...@wso2.com wrote: Thanks Burak. Now it takes minutes to repartition; Active Stages (1): Stage Id / Description / Submitted / Duration / Tasks: Succeeded/Total / Input / Output / Shuffle Read / Shuffle Write

Re: How to speed up Spark process

2015-07-13 Thread Nirmal Fernando
If you click on +details you can see the code that takes the time. Did you check it already? On Tue, Jul 14, 2015 at 9:56 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: Job view. Others are fast, but the first one (repartition) is taking 95% of the job run time. On Mon, Jul 13, 2015 at 9:23 PM,

Re: [MLlib][KMeans] KMeansModel.computeCost takes a lot of time

2015-07-13 Thread Nirmal Fernando
setting k=3? What about # of runs? How many partitions do you have? How many cores does your machine have? Thanks, Burak On Mon, Jul 13, 2015 at 10:57 AM, Nirmal Fernando nir...@wso2.com wrote: Hi Burak, k = 3, dimension = 785 features, Spark 1.4 On Mon, Jul 13, 2015 at 10:28 PM, Burak Yavuz

Re: [MLlib][KMeans] KMeansModel.computeCost takes a lot of time

2015-07-13 Thread Nirmal Fernando
, 2015 at 2:53 AM, Nirmal Fernando nir...@wso2.com wrote: Hi, For a fairly large dataset, 30MB, KMeansModel.computeCost takes a lot of time (16+ minutes). Most of that time is spent in this task: org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33

Spark MLlib 1.4.0 - logistic regression with SGD model accuracy is different in local mode and cluster mode

2015-07-02 Thread Nirmal Fernando
Hi All, I'm facing a quite strange case: after migrating to Spark 1.4.0, I'm seeing Spark MLlib produce different results in local mode and cluster mode. Is there any possibility of that happening? (I feel this is an issue in my environment, but just wanted to get it confirmed.) Thanks.

Run multiple Spark jobs concurrently

2015-07-01 Thread Nirmal Fernando
Hi All, Are there any additional configs that we have to set to do $subject? -- Thanks & regards, Nirmal Associate Technical Lead - Data Technologies Team, WSO2 Inc. Mobile: +94715779733 Blog: http://nirmalfdo.blogspot.com/
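Independent applications can already run side by side as long as the cluster manager has resources for them. Within a single application, here is a sketch of one way to run jobs concurrently (assuming Java 8 lambdas and the FAIR scheduler; the app name and numbers are illustrative):

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ConcurrentJobsSketch {
    public static void main(String[] args) throws InterruptedException {
        // Jobs submitted from separate threads of the same SparkContext run
        // concurrently; FAIR scheduling shares the executors between them.
        SparkConf conf = new SparkConf()
                .setAppName("concurrent-jobs")
                .setMaster("local[4]")
                .set("spark.scheduler.mode", "FAIR");
        JavaSparkContext sc = new JavaSparkContext(conf);

        Runnable job = () -> {
            long count = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5))
                           .map(x -> x * x)
                           .count();
            System.out.println(Thread.currentThread().getName() + " counted " + count);
        };

        Thread first = new Thread(job, "job-1");
        Thread second = new Thread(job, "job-2");
        first.start();
        second.start();
        first.join();
        second.join();
        sc.stop();
    }
}
```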

Re: path to hdfs

2015-06-08 Thread Nirmal Fernando
The HDFS path should be something like hdfs://127.0.0.1:8020/user/cloudera/inputs/ On Mon, Jun 8, 2015 at 4:15 PM, Pa Rö paul.roewer1...@googlemail.com wrote: hello, i submit my spark job with the following parameters: ./spark-1.1.0-bin-hadoop2.4/bin/spark-submit \ --class
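For illustration, a minimal sketch of reading from such a path (the host, port, and directory are the ones from the example above; whether they match the actual NameNode address is something to verify):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HdfsPathSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("hdfs-path-example").setMaster("local[*]"));

        // Fully qualified HDFS URI: scheme, NameNode host:port, then the path.
        JavaRDD<String> lines = sc.textFile("hdfs://127.0.0.1:8020/user/cloudera/inputs/");
        System.out.println("line count: " + lines.count());

        sc.stop();
    }
}
```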

Is there a way to disable the Spark UI?

2015-02-02 Thread Nirmal Fernando
Hi All, Is there a way to disable the Spark UI? What I really need is to stop the startup of the Jetty server. -- Thanks & regards, Nirmal Senior Software Engineer - Platform Technologies Team, WSO2 Inc. Mobile: +94715779733 Blog: http://nirmalfdo.blogspot.com/

Re: Is there a way to disable the Spark UI?

2015-02-02 Thread Nirmal Fernando
Thanks Zhan! Was this introduced in Spark 1.2, or is it also available in Spark 1.1? On Tue, Feb 3, 2015 at 11:52 AM, Zhan Zhang zzh...@hortonworks.com wrote: You can set spark.ui.enabled to false to disable the UI. Thanks. Zhan Zhang On Feb 2, 2015, at 8:06 PM, Nirmal Fernando nir
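A minimal sketch of that setting applied programmatically (the app name is illustrative; spark.ui.enabled can just as well go into spark-defaults.conf or be passed with --conf):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class DisableUiSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("no-ui-example")
                .setMaster("local[*]")
                // Skips starting the web UI (and its embedded Jetty server) entirely.
                .set("spark.ui.enabled", "false");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job code ...
        sc.stop();
    }
}
```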