Re: MLLib SVMWithSGD is failing for large dataset

2016-07-07 Thread Chitturi Padma
Hi Sarath, By any chance have you resolved this issue? Thanks, Padma CH On Tue, Apr 28, 2015 at 11:20 PM, sarath [via Apache Spark User List] < ml-node+s1001560n22694...@n3.nabble.com> wrote: > > I am trying to train a large dataset consisting of 8 million data points > and 20 million

Re: Spark work distribution among execs

2016-03-15 Thread Chitturi Padma
By default, Spark uses 2 executors with one core each. Have you allocated more executors using the command-line args, e.g. --num-executors 25 --executor-cores x? What do you mean by "the difference between the nodes is huge"? Regards, Padma Ch On Tue, Mar 15, 2016 at 6:57 PM, bkapukaranov [via
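
A minimal sketch of the equivalent programmatic settings, assuming the application builds its own context (the numbers are placeholders, not recommendations; on YARN they correspond to the --num-executors and --executor-cores flags):

    import org.apache.spark.{SparkConf, SparkContext}

    // hypothetical entry point; tune the counts to the cluster at hand
    val conf = new SparkConf()
      .setAppName("work-distribution-test")
      .set("spark.executor.instances", "25") // same effect as --num-executors 25 on YARN
      .set("spark.executor.cores", "4")      // same effect as --executor-cores 4
    val sc = new SparkContext(conf)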

Re: OOM Exception in my spark streaming application

2016-03-14 Thread Chitturi Padma
Something like below ...
Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
    at org.apache.spark.util.io.ByteArrayChunkOutputStream.allocateNewChunkIfNeeded(ByteArrayChunkOutputStream.scala:66)

Re: OOM Exception in my spark streaming application

2016-03-14 Thread Chitturi Padma
Hi, Can you please show the stack trace line by line? It's a bit difficult to read the entire paragraph and make sense out of it. On Mon, Mar 14, 2016 at 3:11 PM, adamreith [via Apache Spark User List] < ml-node+s1001560n26479...@n3.nabble.com> wrote: > Hi, > > I'm using spark

Re: Why KMeans with mllib is so slow ?

2016-03-12 Thread Chitturi Padma
Hi All, I am facing the same issue. Taking k values from 60 to 120, incrementing by 10 each time (i.e. k takes the values 60, 70, 80, ..., 120), the algorithm takes around 2.5 hours on an 800 MB data set with 38 dimensions. On Sun, Mar 29, 2015 at 9:34 AM, davidshen84 [via Apache Spark User List] <
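
For context, a minimal sketch of that kind of sweep over k with MLlib's KMeans, assuming the input is already parseable into dense vectors (the path and iteration count are placeholders):

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // hypothetical input: one space-separated point per line
    val points = sc.textFile("hdfs:///data/points.txt")
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
      .cache()

    for (k <- 60 to 120 by 10) {
      val model = KMeans.train(points, k, 20)             // 20 iterations per run
      println(s"k=$k  cost=${model.computeCost(points)}")
    }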

Re: forgetfulness in clustering algorithm

2016-03-12 Thread Chitturi Padma
Hi, I am interested in the Streaming k-means algorithm and the forgetfulness parameter. Can someone please throw some light on this? On Wed, Jul 29, 2015 at 11:23 AM, AmmarYasir [via Apache Spark User List] < ml-node+s1001560n24050...@n3.nabble.com> wrote: > > I read the post regarding
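
For reference, a minimal sketch of where the forgetfulness knob lives on StreamingKMeans (the k, dimension, and decay values are placeholders; setHalfLife is the alternative way to express the same idea):

    import org.apache.spark.mllib.clustering.StreamingKMeans

    // decayFactor = 1.0 weights all past data equally;
    // decayFactor = 0.0 uses only the most recent batch (full forgetfulness)
    val model = new StreamingKMeans()
      .setK(5)
      .setDecayFactor(0.5)
      .setRandomCenters(38, 0.0)   // dimension, initial weight

    // trainingStream is assumed to be a DStream[Vector] built elsewhere
    model.trainOn(trainingStream)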

Re: rdd.collect.foreach() vs rdd.collect.map()

2016-02-24 Thread Chitturi Padma
If you want to do processing in parallel, never use collect() or any action such as count() or first(); they compute the result and bring it back to the driver. rdd.map does its processing in parallel on the workers. Once you have processed the rdd, save it to the DB. rdd.foreach also executes on the workers. In fact, it returns
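
A small sketch of the distinction, where transform and saveToDb are stand-ins for the real logic:

    // runs on the workers, in parallel; nothing is pulled back to the driver
    val processed = rdd.map(transform)
    processed.foreachPartition(batch => saveToDb(batch))

    // by contrast, collect() first materialises the whole RDD in the driver's memory
    rdd.collect().foreach(println)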

Re: Restricting number of cores not resulting in reduction in parallelism

2016-02-24 Thread Chitturi Padma
Hi, I didn't get the point you want to make, i.e. "distribute computation across nodes by restricting parallelism on each node". Do you mean you are expecting only one task to run per node? Can you please paste the configuration changes you made? On Wed, Feb 24, 2016 at 11:24 PM,

Re: rdd.collect.foreach() vs rdd.collect.map()

2016-02-24 Thread Chitturi Padma
rdd.collect() never does any processing on the workers. It brings the entire rdd back to the driver as an in-memory collection. On Wed, Feb 24, 2016 at 10:58 PM, Anurag [via Apache Spark User List] < ml-node+s1001560n26320...@n3.nabble.com> wrote: > Hi Everyone > > I am new to Scala and Spark. > > I

Re: Read from kafka after application is restarted

2016-02-23 Thread Chitturi Padma
Hi Vaibhav, As you said, from the second link I can figure out that it is not able to cast the class when it is trying to read from the checkpoint. Can you try an explicit cast, like asInstanceOf[T], for the broadcast value? From the bug, it looks like it affects version 1.5. Try sample
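
A minimal illustration of the suggested cast, assuming a broadcast variable recovered after the restart (the element type here is purely an assumption):

    // broadcastVar is assumed to have been created with sc.broadcast(...) before checkpointing
    val lookup = broadcastVar.value.asInstanceOf[Map[String, Int]]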

Re: Read from kafka after application is restarted

2016-02-22 Thread Chitturi Padma
Hi Vaibhav, Please try the Kafka direct API approach. Is this not working? -- Padma Ch On Tue, Feb 23, 2016 at 12:36 AM, vaibhavrtk1 [via Apache Spark User List] < ml-node+s1001560n26291...@n3.nabble.com> wrote: > Hi > > I am using kafka with spark streaming 1.3.0 . When the spark
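
For reference, a minimal sketch of the direct approach with the Spark 1.3-era API (broker address and topic are placeholders; ssc is the StreamingContext):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("myTopic"))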

Re: Does Spark satisfy my requirements?

2016-02-22 Thread Chitturi Padma
Hi, When you say that you want to produce new information, are you looking to feed the processed data to other consumers? Spark will definitely be the choice for real-time streaming computations. Are you looking for near-real-time processing or exactly real-time processing? On Sun, Feb

Re: Job Opportunity in London

2015-03-30 Thread Chitturi Padma
Hi, I am interested in this opportunity. I am working as a Research Engineer at Impetus Technologies, Bangalore, India. In fact, we implemented Distributed Deep Learning on Spark. I will share my CV if you are interested. Please visit the below link:

Re: spark 1.2 compatibility

2015-01-17 Thread Chitturi Padma
Yes. I built Spark 1.2 with Apache Hadoop 2.2. No compatibility issues. On Sat, Jan 17, 2015 at 4:47 AM, bhavyateja [via Apache Spark User List] ml-node+s1001560n21197...@n3.nabble.com wrote: Is spark 1.2 compatible with HDP 2.1

Re: spark 1.2 compatibility

2015-01-17 Thread Chitturi Padma
and check where I am going wrong. My word count program is erroring out when using Spark 1.2 on YARN, but it executes fine using Spark 0.9.1. On Sat, Jan 17, 2015 at 5:55 AM, Chitturi Padma [via Apache Spark User List] wrote

Re: 1gb file processing...task doesn't launch on all the node...Unseen exception

2014-11-20 Thread Chitturi Padma
Hi, I tried with try/catch blocks. In fact, inside mapPartitionsWithIndex a method is invoked which does the operation. I put the operations inside the function in a try...catch block, but that's of no use... the error still persists. I even commented out all the operations and left a simple print statement
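
One thing worth checking: a try/catch wrapped around the whole partition does not catch exceptions thrown lazily while the returned iterator is consumed. A sketch that wraps each record instead, with process standing in for the real operation:

    val result = rdd.mapPartitionsWithIndex { (index, iter) =>
      iter.flatMap { record =>
        try {
          Some(process(record))   // 'process' is a stand-in for the real per-record logic
        } catch {
          case e: Exception =>
            System.err.println(s"partition $index: record failed with ${e.getMessage}")
            None                  // skip the bad record instead of failing the task
        }
      }
    }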

Re: RandomGenerator class not found exception

2014-11-17 Thread Chitturi Padma
Include the commons-math3-3.3 jar in the classpath while submitting the jar to the Spark cluster, like: spark-submit --driver-class-path <path to commons-math3-3.3 jar> --class MainClass --master <spark cluster url> <app jar> On Mon, Nov 17, 2014 at 1:55 PM, Ritesh Kumar Singh [via Apache Spark User List]

Re: RandomGenerator class not found exception

2014-11-17 Thread Chitturi Padma
(/path/to/jar) within spark-shell and in my project source file. It still didn't import the jar at both locations. Any fixes? Please help. On Mon, Nov 17, 2014 at 2:14 PM, Chitturi Padma wrote: Include the commons-math3/3.3

Re: Default spark.deploy.recoveryMode

2014-10-15 Thread Chitturi Padma
which means the details are not persisted, and hence after any failures in the workers and master the daemons wouldn't restart normally, right? On Wed, Oct 15, 2014 at 12:17 PM, Prashant Sharma [via Apache Spark User List] ml-node+s1001560n16468...@n3.nabble.com wrote: [Removing dev lists] You are

Re: spark.local.dir and spark.worker.dir not used

2014-09-23 Thread Chitturi Padma
Is it possible to view the persisted RDD blocks? If I use YARN, RDD blocks would be persisted to HDFS; will I then be able to read the HDFS blocks as I could do in Hadoop? On Tue, Sep 23, 2014 at 5:56 PM, Shao, Saisai [via Apache Spark User List] ml-node+s1001560n14885...@n3.nabble.com wrote:

Re: spark.local.dir and spark.worker.dir not used

2014-09-23 Thread Chitturi Padma
I couldn't even see the spark-id folder in the default /tmp directory for spark.local.dir. On Tue, Sep 23, 2014 at 6:01 PM, Priya Ch learnings.chitt...@gmail.com wrote: Is it possible to view the persisted RDD blocks ? If I use YARN, RDD blocks would be persisted to hdfs then will i be able

Re: How can I implement eigenvalue decomposition in Spark?

2014-08-08 Thread Chitturi Padma
Hi, I have a similar problem. I need matrix operations such as dot product, cross product, transpose, and matrix multiplication to be performed on Spark. Does Spark have an inbuilt API to support these? I see a matrix factorization implementation in MLlib. On Fri, Aug 8, 2014 at 12:38 PM, yaochunnan [via
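
For reference, a minimal sketch of what MLlib's distributed matrices offer in this direction (the values are placeholders; per-vector operations such as dot products are done on local vectors and matrices):

    import org.apache.spark.mllib.linalg.{Matrices, Vectors}
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0),
      Vectors.dense(3.0, 4.0)))
    val mat = new RowMatrix(rows)

    // multiply a distributed matrix by a local one
    val identity = Matrices.dense(2, 2, Array(1.0, 0.0, 0.0, 1.0))
    val product = mat.multiply(identity)

    // SVD (and PCA) are also available on RowMatrix
    val svd = mat.computeSVD(2, computeU = true)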

Standalone cluster on Windows

2014-07-09 Thread Chitturi Padma
Hi, I wanted to set up a standalone cluster on a Windows machine, but unfortunately the spark-master.cmd file is not available. Can someone suggest how to proceed, or is the spark-master.cmd file missing from spark-1.0.0?