Re: UnknownhostException : home

2015-01-19 Thread Rapelly Kartheek
directory. On Mon, Jan 19, 2015 at 9:33 AM, Rapelly Kartheek kartheek.m...@gmail.com wrote: Hi, I get the following exception when I run my application: karthik@karthik:~/spark-1.2.0$ ./bin/spark-submit --class org.apache.spark.examples.SimpleApp001 --deploy-mode client --master spark

UnknownhostException : home

2015-01-19 Thread Rapelly Kartheek
Hi, I get the following exception when I run my application: karthik@karthik:~/spark-1.2.0$ ./bin/spark-submit --class org.apache.spark.examples.SimpleApp001 --deploy-mode client --master spark://karthik:7077 $SPARK_HOME/examples/*/scala-*/spark-examples-*.jar out1.txt log4j:WARN No such

Re: UnknownhostException : home

2015-01-19 Thread Rapelly Kartheek
your local machine, add an entry like the following to your /etc/hosts file and then run the program again (use sudo to edit the file): 127.0.0.1 home On Mon, Jan 19, 2015 at 3:03 PM, Rapelly Kartheek kartheek.m...@gmail.com wrote: Hi, I get the following exception when I run my application

Re: UnknownhostException : home

2015-01-19 Thread Rapelly Kartheek
; there is an empty host between the 2nd and 3rd slash. This is true of most URI schemes with a host. On Mon, Jan 19, 2015 at 9:56 AM, Rapelly Kartheek kartheek.m...@gmail.com wrote: Yes yes.. the hadoop/etc/hadoop/hdfs-site.xml file has the path like: hdfs://home/... On Mon, Jan 19, 2015 at 3:21 PM, Sean Owen
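In hdfs://home/..., the part right after the two slashes is parsed as the host name, which is why the client tries to resolve a host called "home". A minimal Scala sketch of the two working forms (the namenode host, port and paths below are hypothetical):

  // Broken: "home" is parsed as the HDFS host name.
  // val rdd = sc.textFile("hdfs://home/user/karthik/input.txt")

  // Fully qualified URI: explicit namenode host and port (hypothetical values).
  val rdd1 = sc.textFile("hdfs://localhost:9000/home/user/karthik/input.txt")

  // Empty host: three slashes, so the default filesystem from core-site.xml is used.
  val rdd2 = sc.textFile("hdfs:///home/user/karthik/input.txt")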

Re: Problem with building spark-1.2.0

2015-01-12 Thread Rapelly Kartheek
Yes, this proxy problem is resolved. *How does your build refer to https://github.com/ScrapCodes/sbt-pom-reader.git? I don't see this repo in the project code base.* I manually downloaded the sbt-pom-reader directory and moved it into .sbt/0.13/staging/*/

Re: Problem with building spark-1.2.0

2015-01-04 Thread Rapelly Kartheek
to access github.com for cloning some dependencies, as github is blocked in India. What are other possible ways around this problem? Thank you! On Sun, Jan 4, 2015 at 9:45 PM, Rapelly Kartheek [hidden email] wrote: Hi, I get the following error

Spark-1.2.0 build error

2015-01-02 Thread rapelly kartheek
Hi, I get the following error when I build spark using sbt: [error] Nonzero exit code (128): git clone https://github.com/ScrapCodes/sbt-pom-reader.git /home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader [error] Use 'last' for the full log. Any help please?

NullPointerException

2014-12-31 Thread rapelly kartheek
Hi, I get the following Exception when I submit a spark application that calculates the frequency of characters in a file. Especially when I increase the size of the data, I face this problem. Exception in thread Thread-47 org.apache.spark.SparkException: Job aborted due to stage failure: Task

Re: NullPointerException

2014-12-31 Thread rapelly kartheek
spark-1.0.0 On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen rosenvi...@gmail.com wrote: Which version of Spark are you using? On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, I get the following Exception when I submit a spark application that calculates

Fwd: NullPointerException

2014-12-31 Thread rapelly kartheek
-- Forwarded message -- From: rapelly kartheek kartheek.m...@gmail.com Date: Thu, Jan 1, 2015 at 12:05 PM Subject: Re: NullPointerException To: Josh Rosen rosenvi...@gmail.com, user@spark.apache.org spark-1.0.0 On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen rosenvi...@gmail.com

Re: NullPointerException

2014-12-31 Thread rapelly kartheek
error? On Wed, Dec 31, 2014 at 10:35 PM, rapelly kartheek kartheek.m...@gmail.com wrote: spark-1.0.0 On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen rosenvi...@gmail.com wrote: Which version of Spark are you using? On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek kartheek.m...@gmail.com

Spark profiler

2014-12-29 Thread rapelly kartheek
Hi, I want to find the time taken for replicating an rdd in a Spark cluster, along with the computation time on the replicated rdd. Can someone please suggest a suitable Spark profiler? Thank you

Storage Locations of an rdd

2014-12-26 Thread rapelly kartheek
Hi, I need to find the storage locations (node IDs) of each partition of a replicated rdd in Spark. I mean, if an rdd is replicated twice, I want to find the two nodes on which each partition is stored. The Spark WebUI has a page wherein it depicts the data distribution of each rdd. But, I

Storage Locations of an rdd

2014-12-26 Thread rapelly kartheek
Hi, I need to find the storage locations (node IDs) of each partition of a replicated rdd in Spark. I mean, if an rdd is replicated twice, I want to find the two nodes on which each partition is stored. The Spark WebUI has a page wherein it depicts the data distribution of each rdd. But, I need
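In Spark 1.x there is a developer API that exposes per-executor block status, which can answer this programmatically instead of through the web UI. The sketch below is an assumption built on that API and on the rdd_<rddId>_<partition> block-naming convention; it is not something confirmed in this thread, and the RDD id used is hypothetical.

  // Spark 1.x sketch: list which hosts hold cached blocks of a given RDD.
  // Assumes the RDD has already been persisted and materialized by an action.
  val rddId = 3  // hypothetical RDD id, visible on the web UI storage page
  for (status <- sc.getExecutorStorageStatus) {
    val host = status.blockManagerId.host
    for ((blockId, blockStatus) <- status.blocks
         if blockId.name.startsWith(s"rdd_${rddId}_")) {
      println(s"$blockId is on $host, ${blockStatus.memSize} bytes in memory")
    }
  }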

Profiling a spark application.

2014-12-25 Thread rapelly kartheek
Hi, I want to find the time taken for replicating an rdd in a Spark cluster, along with the computation time on the replicated rdd. Can someone please suggest some ideas? Thank you

Necessity for rdd replication.

2014-12-03 Thread rapelly kartheek
Hi, I was just thinking about the necessity of rdd replication. One category could be something like a large number of threads requiring the same rdd. Even though a single rdd can be shared by multiple threads belonging to the same application, I believe we can extract better parallelism if the rdd is

Re: java.io.IOException: Filesystem closed

2014-12-02 Thread rapelly kartheek
Regards On Tue, Dec 2, 2014 at 11:59 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, I face the following exception when I submit a spark application. The log file shows: 14/12/02 11:52:58 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception java.io.IOException

Re: java.io.IOException: Filesystem closed

2014-12-02 Thread rapelly kartheek
it. Thanks Best Regards On Tue, Dec 2, 2014 at 11:59 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, I face the following exception when I submit a spark application. The log file shows: 14/12/02 11:52:58 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception

Re: java.io.IOException: Filesystem closed

2014-12-02 Thread rapelly kartheek
: It could be because those threads are finishing quickly. Thanks Best Regards On Tue, Dec 2, 2014 at 2:19 PM, rapelly kartheek kartheek.m...@gmail.com wrote: But, somehow, if I run this application for the second time, I find that the application gets executed and the results are out

java.io.IOException: Filesystem closed

2014-12-01 Thread rapelly kartheek
Hi, I face the following exception when I submit a spark application. The log file shows: 14/12/02 11:52:58 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:689) at
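A commonly suggested mitigation (not from this thread) is that Hadoop FileSystem instances are cached and shared, so a close() elsewhere in the application also invalidates the instance used by the event-logging listener; disabling the cache is a blunt but frequently used workaround. A sketch, assuming the standard spark.hadoop. prefix for passing Hadoop options through SparkConf and a hypothetical application name:

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch: disable the shared HDFS FileSystem cache so that a close()
  // elsewhere does not invalidate the instance used by the event-log listener.
  val conf = new SparkConf()
    .setAppName("CharFrequency")  // hypothetical application name
    .set("spark.hadoop.fs.hdfs.impl.disable.cache", "true")
  val sc = new SparkContext(conf)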

[no subject]

2014-11-26 Thread rapelly kartheek
Hi, I've been fiddling with the spark/*/storage/blockManagerMasterActor.getPeers() definition in the context of blockManagerMaster.askDriverWithReply() sending a GetPeers() request. 1) I couldn't understand what 'selfIndex' is used for. 2) Also, I tried modifying the 'peers' array by just

How to access application name in the spark framework code.

2014-11-24 Thread rapelly kartheek
Hi, When I submit a spark application like this: ./bin/spark-submit --class org.apache.spark.examples.SparkKMeans --deploy-mode client --master spark://karthik:7077 $SPARK_HOME/examples/*/scala-*/spark-examples-*.jar /k-means 4 0.001 Which part of the spark framework code deals with the name of
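The name passed on the command line (or via setAppName) travels in the SparkConf key spark.app.name, which is one place code inside the framework or the application can read it back from. A minimal sketch, assuming an already-created SparkContext sc:

  // The application name is carried in SparkConf under "spark.app.name".
  val appName = sc.getConf.get("spark.app.name")
  // SparkContext also exposes it directly:
  println(s"running application: ${sc.appName}")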

Read a HDFS file from Spark using HDFS API

2014-11-14 Thread rapelly kartheek
Hi, I am trying to read an HDFS file from Spark scheduler code. I could find how to do HDFS reads/writes in Java, but I need to access HDFS from Spark using Scala. Can someone please help me in this regard.

Re: Read a HDFS file from Spark using HDFS API

2014-11-14 Thread rapelly kartheek
, Tri tri@verizonwireless.com.invalid wrote: It should be val file = sc.textFile("hdfs:///localhost:9000/sigmoid/input.txt") (3 slashes: "///") Thanks Tri *From:* rapelly kartheek [mailto:kartheek.m...@gmail.com] *Sent:* Friday, November 14, 2014 9:42 AM *To:* Akhil Das; user

Re: Read a HDFS file from Spark using HDFS API

2014-11-14 Thread rapelly kartheek
Hi Akhil, I face the error: not found: value URI On Fri, Nov 14, 2014 at 9:29 PM, rapelly kartheek kartheek.m...@gmail.com wrote: I'll just try out the object Akhil provided. There was no problem working in the shell with sc.textFile. Thank you Akhil and Tri. On Fri, Nov 14, 2014 at 9:21 PM
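The "not found: value URI" error above is just a missing import of java.net.URI. A minimal sketch of reading an HDFS file from plain Scala code through the Hadoop FileSystem API (the namenode address and file path are hypothetical):

  import java.net.URI
  import scala.io.Source
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  // Sketch: read an HDFS file line by line without going through an RDD.
  val fs = FileSystem.get(new URI("hdfs://localhost:9000"), new Configuration())
  val in = fs.open(new Path("/sigmoid/input.txt"))  // hypothetical path
  try {
    Source.fromInputStream(in).getLines().foreach(println)
  } finally {
    in.close()
  }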

Read a HDFS file from Spark source code

2014-11-11 Thread rapelly kartheek
Hi I am trying to access a file in HDFS from the Spark source code. Basically, I am tweaking the Spark source code and need to read a file in HDFS from within it. I am really not sure how to go about doing this. Can someone please help me out in this regard. Thank you!!

Re: Read a HDFS file from Spark source code

2014-11-11 Thread rapelly kartheek
at 11:26 AM, Samarth Mailinglist mailinglistsama...@gmail.com wrote: Instead of a file path, use an HDFS URI. For example (in Python): data = sc.textFile("hdfs://localhost/user/someuser/data") On Wed, Nov 12, 2014 at 10:12 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi I am

Rdd replication

2014-11-09 Thread rapelly kartheek
Hi, I am trying to understand the rdd replication code. In the process, I frequently execute one spark application whenever I make a change to the code, to see the effect. My problem is, after a set of repeated executions of the same application, I find that my cluster behaves unusually. Ideally, when

How to convert a non-rdd data to rdd.

2014-10-12 Thread rapelly kartheek
Hi, I am trying to write a String that is not an rdd to HDFS. This data is a variable in Spark Scheduler code. None of the spark file operations are working because my data is not an rdd. So, I tried using SparkContext.parallelize(data). But it throws an error: [error]

Re: How to convert a non-rdd data to rdd.

2014-10-12 Thread rapelly kartheek
Regards Sanjiv Singh Mob : +091 9990-447-339 On Sun, Oct 12, 2014 at 11:45 AM, rapelly kartheek [hidden email] wrote: Hi, I am trying to write a String that is not an rdd to HDFS. This data is a variable in Spark Scheduler code. None of the spark
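If the goal is just to get a plain String onto HDFS through the RDD API, wrapping it in a one-element collection before parallelizing is usually enough. A sketch, assuming a SparkContext sc is reachable from that point in the scheduler code and using a hypothetical output path:

  // Sketch: turn a single String into an RDD and write it out to HDFS.
  val data: String = "some scheduler-side value"  // hypothetical variable
  val rdd = sc.parallelize(Seq(data))             // RDD[String] with one element
  rdd.saveAsTextFile("hdfs://localhost:9000/user/karthik/debug-output")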

Rdd repartitioning

2014-10-10 Thread rapelly kartheek
Hi, I was facing GC overhead errors while executing an application with 570MB of data (with rdd replication). In order to fix the heap errors, I repartitioned the rdd to 10 partitions: val logData = sc.textFile("hdfs:/text_data/text data.txt").persist(StorageLevel.MEMORY_ONLY_2) val
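A sketch of how the repartitioning described above might look with the repartition call included; the file path is taken from the snippet and the partition count from the text, but the original code is truncated, so this is an assumption:

  import org.apache.spark.storage.StorageLevel

  // Read, spread across 10 partitions, then cache with each partition on two nodes.
  val logData = sc.textFile("hdfs:/text_data/text data.txt")
    .repartition(10)
    .persist(StorageLevel.MEMORY_ONLY_2)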

Re: rsync problem

2014-09-26 Thread rapelly kartheek
Pfeiffer t...@preferred.jp wrote: Hi, I assume you unintentionally did not reply to the list, so I'm adding it back to CC. How do you submit your job to the cluster? Tobias On Thu, Sep 25, 2014 at 2:21 AM, rapelly kartheek kartheek.m...@gmail.com wrote: How do I find out whether

rsync problem

2014-09-19 Thread rapelly kartheek
Hi, I'd made some modifications to the spark source code on the master and propagated them to the slaves using rsync. I followed this command: rsync -avL --progress path/to/spark-1.0.0 username@destinationhostname:path/to/destdirectory. This worked perfectly. But, I wanted to simultaneously

Re: rsync problem

2014-09-19 Thread rapelly kartheek
Hi Tobias, I've copied the files from master to all the slaves. On Fri, Sep 19, 2014 at 1:37 PM, Tobias Pfeiffer t...@preferred.jp wrote: Hi, On Fri, Sep 19, 2014 at 5:02 PM, rapelly kartheek kartheek.m...@gmail.com wrote: This worked perfectly. But, I wanted to simultaneously rsync all

Re: rsync problem

2014-09-19 Thread rapelly kartheek
, * you have copied a lot of files from various hosts to username@slave3:path* only from one node to all the other nodes... On Fri, Sep 19, 2014 at 1:45 PM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi Tobias, I've copied the files from master to all the slaves. On Fri, Sep 19, 2014

Fwd: rsync problem

2014-09-19 Thread rapelly kartheek
-- Forwarded message -- From: rapelly kartheek kartheek.m...@gmail.com Date: Fri, Sep 19, 2014 at 1:51 PM Subject: Re: rsync problem To: Tobias Pfeiffer t...@preferred.jp any idea why the cluster is dying down??? On Fri, Sep 19, 2014 at 1:47 PM, rapelly kartheek kartheek.m

Re: rsync problem

2014-09-19 Thread rapelly kartheek
directory $SPARK_HOME/work is rsynced as well. Try emptying the contents of the work folder on each node and try again. On Fri, Sep 19, 2014 at 4:53 AM, rapelly kartheek kartheek.m...@gmail.com wrote: I * followed this command:rsync -avL --progress path/to/spark-1.0.0 username

File I/O in spark

2014-09-15 Thread rapelly kartheek
Hi I am trying to perform some read/write file operations in spark. Somehow I am neither able to write to a file nor read from one. import java.io._ val writer = new PrintWriter(new File("test.txt")) writer.write("Hello Scala") Can someone please tell me how to perform file I/O in spark.

Re: File I/O in spark

2014-09-15 Thread rapelly kartheek
to make sure the file is accessible on ALL executors. One way to do that is to use a distributed filesystem like HDFS or GlusterFS. On Mon, Sep 15, 2014 at 8:51 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi I am trying to perform some read/write file operations in spark. Somehow I am
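If the file really has to be produced on the fly from driver-side code, writing it straight to HDFS avoids the question of which machine it landed on. A sketch using the Hadoop FileSystem API, with a hypothetical namenode address and path:

  import java.io.PrintWriter
  import java.net.URI
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  // Sketch: create test.txt directly on HDFS instead of on the local disk.
  val fs = FileSystem.get(new URI("hdfs://localhost:9000"), new Configuration())
  val out = fs.create(new Path("/user/karthik/test.txt"))
  val writer = new PrintWriter(out)
  try {
    writer.write("Hello Scala")
  } finally {
    writer.close()  // also closes the underlying HDFS stream
  }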

Re: File I/O in spark

2014-09-15 Thread rapelly kartheek
The file gets created on the fly. So I don't know how to make sure that it's accessible to all nodes. On Mon, Sep 15, 2014 at 10:10 PM, rapelly kartheek kartheek.m...@gmail.com wrote: Yes. I have HDFS. My cluster has 5 nodes. When I run the above commands, I see that the file gets created

Re: File I/O in spark

2014-09-15 Thread rapelly kartheek
I came across these APIs in one of the Scala tutorials on the net. On Mon, Sep 15, 2014 at 10:14 PM, Mohit Jaggi mohitja...@gmail.com wrote: But the above APIs are not for HDFS. On Mon, Sep 15, 2014 at 9:40 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Yes. I have HDFS. My cluster

Re: File I/O in spark

2014-09-15 Thread rapelly kartheek
Can you please direct me to the right way of doing this. On Mon, Sep 15, 2014 at 10:18 PM, rapelly kartheek kartheek.m...@gmail.com wrote: I came across these APIs in one the scala tutorials over the net. On Mon, Sep 15, 2014 at 10:14 PM, Mohit Jaggi mohitja...@gmail.com wrote

File operations on spark

2014-09-14 Thread rapelly kartheek
Hi I am trying to perform read/write file operations in spark by creating a Writable object. But, I am not able to write to a file. The data concerned is not an rdd. Can someone please tell me how to perform read/write file operations on non-rdd data in spark. Regards karthik

compiling spark source code

2014-09-11 Thread rapelly kartheek
Hi, Can someone please tell me how to compile the spark source code so that my changes to the source take effect. I was trying to ship the jars to all the slaves, but in vain. -Karthik

Re: compiling spark source code

2014-09-11 Thread rapelly kartheek
I have been doing that, but my modifications to the code are not being picked up by the build. On Thu, Sep 11, 2014 at 10:45 PM, Daniil Osipov daniil.osi...@shazam.com wrote: In the spark source folder, execute `sbt/sbt assembly` On Thu, Sep 11, 2014 at 8:27 AM, rapelly kartheek kartheek.m...@gmail.com

How to profile a spark application

2014-09-08 Thread rapelly kartheek
Hi, Can someone tell me how to profile a spark application. -Karthik

Re: How to profile a spark application

2014-09-08 Thread rapelly kartheek
Thank you Ted. regards Karthik On Mon, Sep 8, 2014 at 3:33 PM, Ted Yu yuzhih...@gmail.com wrote: See https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit On Sep 8, 2014, at 2:48 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, Can someone

Re: How to profile a spark application

2014-09-08 Thread rapelly kartheek
hi Ted, Where do I find the licence keys that I need to copy to the licences directory. Thank you!! On Mon, Sep 8, 2014 at 8:25 PM, rapelly kartheek kartheek.m...@gmail.com wrote: Thank you Ted. regards Karthik On Mon, Sep 8, 2014 at 3:33 PM, Ted Yu yuzhih...@gmail.com wrote: See

question on replicate() in blockManager.scala

2014-09-05 Thread rapelly kartheek
Hi, var cachedPeers: Seq[BlockManagerId] = null private def replicate(blockId: String, data: ByteBuffer, level: StorageLevel) { val tLevel = StorageLevel(level.useDisk, level.useMemory, level.deserialized, 1) if (cachedPeers == null) { cachedPeers = master.getPeers(blockManagerId,

replicated rdd storage problem

2014-09-05 Thread rapelly kartheek
Hi, Whenever I replicate an rdd, I find that the rdd gets replicated only on one node. I have a 3 node cluster. I set rdd.persist(StorageLevel.MEMORY_ONLY_2) in my application. The webUI shows that it is replicated twice. But, the rdd storage details show that it is replicated only once and only in
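persist() is lazy, so the replicas only exist once an action has materialized the partitions. A sketch for checking what actually got cached, using the Spark 1.x RDD storage info API; the input path is hypothetical:

  import org.apache.spark.storage.StorageLevel

  val data = sc.textFile("hdfs:///some/input").persist(StorageLevel.MEMORY_ONLY_2)
  data.count()                    // action: forces caching and replication

  println(data.getStorageLevel)   // should report 2x replication
  sc.getRDDStorageInfo.foreach { info =>
    println(s"${info.name}: ${info.numCachedPartitions} cached partitions, " +
            s"${info.memSize} bytes in memory")
  }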

Fwd: RDDs

2014-09-04 Thread rapelly kartheek
-- Forwarded message -- From: rapelly kartheek kartheek.m...@gmail.com Date: Thu, Sep 4, 2014 at 11:49 AM Subject: Re: RDDs To: Liu, Raymond raymond@intel.com Thank you Raymond. I am more clear now. So, if an rdd is replicated over multiple nodes (i.e. say two sets of nodes

RDDs

2014-09-03 Thread rapelly kartheek
Hi, Can someone tell me what kind of operations can be performed on a replicated rdd? What are the use-cases of a replicated rdd? One basic doubt that has been bothering me for a long time: what is the difference between an application and a job in Spark parlance? I am confused because of Hadoop

operations on replicated RDD

2014-09-01 Thread rapelly kartheek
Hi, An RDD replicated by an application is owned by only that application. No other applications can share it. Then, what is the motive behind providing the rdd replication feature? What operations can be performed on the replicated RDD? Thank you!!! -karthik

Replicate RDDs

2014-08-27 Thread rapelly kartheek
Hi I have a three node spark cluster. I restricted the resources per application by setting appropriate parameters and I could run two applications simultaneously. Now, I want to replicate an RDD and run two applications simultaneously. Can someone help me with how to go about doing this? I replicated

StorageLevel error.

2014-08-25 Thread rapelly kartheek
Hi, Can someone help me with the following error: scala> val rdd = sc.parallelize(Array(1,2,3,4)) rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:12 scala> rdd.persist(StorageLevel.MEMORY_ONLY) <console>:15: error: not found: value StorageLevel
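In the shell, StorageLevel is not in scope by default, so the "not found: value StorageLevel" error goes away after importing it (or using the fully qualified name):

  import org.apache.spark.storage.StorageLevel

  val rdd = sc.parallelize(Array(1, 2, 3, 4))
  rdd.persist(StorageLevel.MEMORY_ONLY)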

Hi

2014-08-20 Thread rapelly kartheek
Hi I have this doubt: I understand that each Java process runs in its own JVM instance. Now, if I have a single executor on my machine and run several Java processes, then there will be several JVM instances running. Now, PROCESS_LOCAL means the data is located in the same JVM as the task

Scheduling in spark

2014-07-08 Thread rapelly kartheek
Hi, I am a post graduate student, new to spark. I want to understand how the Spark scheduler works. I just have a theoretical understanding of the DAG scheduler and the underlying task scheduler. I want to know, given a job to the framework, how the scheduling happens after the DAG scheduler phase.

hi

2014-06-22 Thread rapelly kartheek
Hi Can someone help me with the following error that I faced while setting up a single-node spark framework. karthik@karthik-OptiPlex-9020:~/spark-1.0.0$ MASTER=spark://localhost:7077 sbin/spark-shell bash: sbin/spark-shell: No such file or directory karthik@karthik-OptiPlex-9020:~/spark-1.0.0$