Re: RDDs

2015-03-03 Thread Kartheek.R
Hi TD, "You can always run two jobs on the same cached RDD, and they can run in parallel (assuming you launch the 2 jobs from two different threads)." Is this a correct way to launch jobs from two different threads? val threadA = new Thread(new Runnable { def run() { for (i <- 0 until
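[Editor's note] The thread-launch pattern being asked about can be sketched without Spark. The local Vector and the sum/count bodies below stand in for a cached RDD and its Spark actions; nothing here is Spark API, it only shows the concurrency pattern TD describes.

```scala
import java.util.concurrent.CountDownLatch

// Two "jobs" launched from two threads over one shared, cached dataset.
val shared = (1 to 100).toVector   // stands in for a cached RDD
var resultA = 0
var resultB = 0
val done = new CountDownLatch(2)

val threadA = new Thread(new Runnable {
  def run(): Unit = { resultA = shared.sum; done.countDown() }
})
val threadB = new Thread(new Runnable {
  def run(): Unit = { resultB = shared.count(_ % 2 == 0); done.countDown() }
})
threadA.start(); threadB.start()
done.await()   // both "jobs" have finished; countDown/await gives visibility
```

In real Spark code each thread would call an action such as `rdd.count()` on the same cached RDD, and the scheduler would run the two resulting jobs concurrently.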

Job submission via multiple threads

2015-02-26 Thread Kartheek.R
Hi, I just wrote an application that intends to submit its actions (jobs) via independent threads, keeping in view the point "Second, within each Spark application, multiple 'jobs' (Spark actions) may be running concurrently if they were submitted by different threads", mentioned in:

Task not serializable exception

2015-02-24 Thread Kartheek.R
Hi, I run into a Task not serializable exception with the code below. When I remove the threads and run, it works, but with threads I run into the Task not serializable exception. object SparkKart extends Serializable { def parseVector(line: String): Vector[Double] = { DenseVector(line.split('
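[Editor's note] The root cause in threads like this is that Spark ships each task's closure to executors with Java serialization, so everything the closure captures must be Serializable. A minimal sketch of the failure and the usual fix, with no Spark involved (all names here are illustrative):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Returns true when obj survives Java serialization, as a Spark closure must.
def serializes(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    true
  } catch { case _: NotSerializableException => false }

def build(): (AnyRef, AnyRef) = {
  val notSer = new Object           // stands in for a Thread, socket, etc.
  val id = 42L                      // plain serializable data
  val bad  = () => notSer.hashCode  // closure drags the Object along -> fails
  val good = () => id * 2           // closure captures only the Long -> fine
  (bad, good)
}
val (bad, good) = build()
```

Copying just the needed serializable values into local vals before referencing them inside the closure is the standard fix for this exception.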

Re: Task not serializable exception

2015-02-23 Thread Kartheek.R
I could trace where the problem is. If I run without any threads, it works fine. When I use threads, I run into the Not serializable problem. But I need to have threads in my code. Any help please! This is my code: object SparkKart { def parseVector(line: String): Vector[Double] = {

Task not serializable exception

2015-02-23 Thread Kartheek.R
Hi, I have a file containing data in the following way: 0.0 0.0 0.0 0.1 0.1 0.1 0.2 0.2 0.2 9.0 9.0 9.0 9.1 9.1 9.1 9.2 9.2 9.2 Now I do the following: val kPoints = data.takeSample(withReplacement = false, 4, 42).toArray val thread1 = new Thread(new Runnable { def run() {

Re: java.io.IOException: Filesystem closed

2015-02-21 Thread Kartheek.R
Are you replicating any RDDs? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-Filesystem-closed-tp20150p21749.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Caching RDD

2015-02-19 Thread Kartheek.R
Hi, I have an HDFS file of size 598MB. I create an RDD over this file and cache it in RAM in a 7-node cluster with 2G RAM each. I find that each partition gets replicated three or even four times in the cluster even though I do not specify replication in code. Total partitions are 5 for the RDD created but cached

Inconsistent execution times for same application.

2015-02-15 Thread Kartheek.R
Hi, My spark cluster contains machines like Pentium-4, dual-core and quad-core machines. I am trying to run a character frequency count application. The application contains several threads, each submitting a job (action) that counts the frequency of a single character. But my problem is, I get

Need a spark application.

2015-02-09 Thread Kartheek.R
Hi, Can someone please suggest some real-life application implemented in Spark (things like gene sequencing) that is of the type of the code below? Basically, the application should have jobs submitted via as many threads as possible. I need a similar kind of Spark application for benchmarking. val

Question about recomputing lost partition of rdd ?

2015-02-06 Thread Kartheek.R
Hi, I have this doubt: Assume that an rdd is stored across multiple nodes and one of the nodes fails, so a partition is lost. Now, I know that when this node is back, it uses the lineage from its neighbours and recomputes that partition alone. 1) How does it get the source data (original data
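[Editor's note] The recovery model behind this question can be sketched in plain Scala. A partition is defined by a pure function of its source slice, so a lost partition is recomputed by re-reading the source (HDFS keeps its own replicas) and re-running the transformation, on any available node; Spark does not wait for the failed node to return. This toy model is illustrative, not Spark API:

```scala
// Source data (think: HDFS blocks) and the lineage: a pure "map" step
// that defines partition i entirely from its source slice.
val source: Vector[Vector[Int]] = Vector(Vector(1, 2), Vector(3, 4), Vector(5, 6))
val lineage: Int => Vector[Int] = i => source(i).map(_ * 10)

// Materialized ("cached") partitions spread across nodes:
var cached: Map[Int, Vector[Int]] = (0 until 3).map(i => i -> lineage(i)).toMap

cached -= 1                                       // node holding partition 1 dies
val recovered = cached.getOrElse(1, lineage(1))   // recompute from lineage
```

The key property is that `lineage` is deterministic, so recomputing yields exactly the lost partition without any copy of it having been stored.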

Connection closed/reset by peers error

2015-02-01 Thread Kartheek.R
Hi, I keep facing this error when I run my application: java.io.IOException: Connection from s1/- closed java.io.IOException: Connection from s1/:43741 closed at

Re: java.io.IOException: connection closed.

2015-01-24 Thread Kartheek.R
When I increase the executor.memory size, it runs smoothly without any errors. On Sat, Jan 24, 2015 at 9:29 PM, Rapelly Kartheek kartheek.m...@gmail.com wrote: Hi, While running a spark application, I get the following Exception leading to several failed stages. Exception in thread
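[Editor's note] For readers hitting the same failure: the fix described here is usually applied at submit time. A hedged sketch follows; the class name, jar path and the 4g value are placeholders, not values taken from this thread.

```shell
# Raise the executor heap when submitting the application.
spark-submit \
  --class com.example.MyApp \
  --executor-memory 4g \
  target/myapp.jar
# equivalently: --conf spark.executor.memory=4g
```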

java.io.IOException: connection closed.

2015-01-24 Thread Kartheek.R
Hi, While running a spark application, I get the following Exception leading to several failed stages. Exception in thread Thread-46 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 262,

Fwd: UnknownHostException: home

2015-01-19 Thread Kartheek.R
-- Forwarded message -- From: Rapelly Kartheek kartheek.m...@gmail.com Date: Mon, Jan 19, 2015 at 3:03 PM Subject: UnknownHostException: home To: user@spark.apache.org Hi, I get the following exception when I run my application:

Re: Problem with building spark-1.2.0

2015-01-12 Thread Kartheek.R
Hi, This is what I am trying to do: karthik@s4:~/spark-1.2.0$ SPARK_HADOOP_VERSION=2.3.0 sbt/sbt clean Using /usr/lib/jvm/java-7-oracle as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. [info] Loading project definition from /home/karthik/spark-1.2.0/project/project

Problem with building spark-1.2.0

2015-01-04 Thread Kartheek.R
Hi, I get the following error when I build spark-1.2.0 using sbt: [error] Nonzero exit code (128): git clone https://github.com/ScrapCodes/sbt-pom-reader.git /home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader [error] Use 'last' for the full log. Any help please? Thanks --

Re: Problem with building spark-1.2.0

2015-01-04 Thread Kartheek.R
The problem is that my network is not able to access github.com for cloning some dependencies, as github is blocked in India. What are other possible workarounds for this problem? Thank you! On Sun, Jan 4, 2015 at 9:45 PM, Rapelly Kartheek kartheek.m...@gmail.com wrote: Hi, I get the following
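[Editor's note] One workaround for builds that clone from an unreachable github.com is git's standard URL rewriting, which redirects clones through a reachable mirror. The mirror URL below is a placeholder, not a recommendation; the `insteadOf` mechanism itself is a real git feature.

```shell
# Transparently rewrite GitHub clone URLs to a reachable mirror.
git config --global url."https://gitmirror.example.com/".insteadOf "https://github.com/"
# sbt's clone of ScrapCodes/sbt-pom-reader will then go through the mirror.
```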

Re: How to convert a non-rdd data to rdd.

2014-10-12 Thread Kartheek.R
Hi Sean, I tried even with sc as: sc.parallelize(data). But I get the error: value sc not found. On Sun, Oct 12, 2014 at 1:47 PM, sowen [via Apache Spark User List] ml-node+s1001560n16233...@n3.nabble.com wrote: It is a method of the class, not a static method of the object. Since a
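[Editor's note] Sean's point, "a method of the class, not a static method of the object", is the whole answer: `parallelize` is an instance method of SparkContext, so a constructed context bound to the name `sc` must be in scope where it is called. A minimal analogue with illustrative names (not Spark API):

```scala
// parallelize is an instance method, so it needs a constructed context.
class Context(val appName: String) {
  def parallelize[T](data: Seq[T]): Vector[T] = data.toVector
}

// `sc.parallelize(data)` only resolves once a value `sc` exists in scope:
val sc = new Context("demo")
val rdd = sc.parallelize(Seq(1, 2, 3))
```

Inside Spark's own scheduler internals there is no `sc` in scope, which is why the same call fails there.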

Re: How to convert a non-rdd data to rdd.

2014-10-12 Thread Kartheek.R
Does SparkContext exist when this part (AskDriverWithReply()) of the scheduler code gets executed? On Sun, Oct 12, 2014 at 1:54 PM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi Sean, I tried even with sc as: sc.parallelize(data). But I get the error: value sc not found. On Sun, Oct

Re: RDDs

2014-09-04 Thread Kartheek.R
Thank you yuanbosoft.

RE: RDDs

2014-09-03 Thread Kartheek.R
Thank you Raymond and Tobias. Yeah, I am very clear about what I was asking. I was talking about a replicated rdd only. Now that I've got my understanding about job and application validated, I wanted to know if we can replicate an rdd and run two jobs (that need the same rdd) of an application in

Re: Scheduling in spark

2014-07-14 Thread Kartheek.R
Thank you so much for the link, Sujeet. regards Karthik

Re: Scheduling in spark

2014-07-14 Thread Kartheek.R
Thank you Andrew for the updated link. regards Karthik