Re: StreamingContext.textFileStream issue

2015-04-24 Thread Prannoy
Try putting files with different file names and see if the stream is able to detect them. On 25-Apr-2015 3:02 am, Yang Lei [via Apache Spark User List] ml-node+s1001560n22650...@n3.nabble.com wrote: I hit the same issue as if the directory has no files at all when running the sample
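
A minimal sketch of the setup under test, assuming Spark Streaming 1.x; the app name and directory are placeholders. textFileStream only picks up files that appear under previously unseen names after the stream starts:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("TextFileStreamCheck") // hypothetical app name
    val ssc = new StreamingContext(conf, Seconds(10))
    // Only files moved into the directory with new names after start()
    // are detected; pre-existing files are ignored.
    val lines = ssc.textFileStream("hdfs:///tmp/stream-input")   // hypothetical path
    lines.print()
    ssc.start()
    ssc.awaitTermination()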

Re: Spark RDD Lifecycle: whether RDD will be reclaimed out of scope

2015-04-23 Thread Prannoy
Hi, Yes, Spark automatically removes old RDDs from the cache when new ones need the space; unpersist forces it to remove them right away. On Thu, Apr 23, 2015 at 9:28 AM, Jeffery [via Apache Spark User List] ml-node+s1001560n22618...@n3.nabble.com wrote: Hi, Dear Spark Users/Devs: In a method, I
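
A minimal sketch of forcing that eager removal, assuming an existing SparkContext sc and a hypothetical input path:

    // Assumes an existing SparkContext sc.
    val cached = sc.textFile("hdfs:///tmp/input").cache() // hypothetical path
    cached.count()     // materializes the cached blocks
    cached.unpersist() // frees them immediately instead of waiting
                       // for Spark's automatic eviction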

Re: MEMORY_ONLY vs MEMORY_AND_DISK

2015-03-18 Thread Prannoy
It depends. If the data size on which the calculation is to be done is very large, then caching it with MEMORY_AND_DISK is useful. Even in this case, MEMORY_AND_DISK is useful only if the computation on the RDD is expensive. If the computation is very small, then even for large data sets MEMORY_ONLY can be
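
A rough sketch of that trade-off, assuming an existing SparkContext sc; the paths are placeholders:

    import org.apache.spark.storage.StorageLevel

    // Cheap to recompute, or fits comfortably in memory: MEMORY_ONLY
    // (evicted partitions are simply recomputed from lineage).
    val small = sc.textFile("hdfs:///tmp/small").persist(StorageLevel.MEMORY_ONLY)

    // Large data plus an expensive computation: MEMORY_AND_DISK
    // (evicted partitions spill to disk instead of being recomputed).
    val large = sc.textFile("hdfs:///tmp/large").persist(StorageLevel.MEMORY_AND_DISK)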

Re: Unable to read files In Yarn Mode of Spark Streaming ?

2015-03-13 Thread Prannoy
application point of view we need to set any properties? Please help me. Thanks, Prannoy.

Re: Unable to read files In Yarn Mode of Spark Streaming ?

2015-03-12 Thread Prannoy
Streaming takes only new files into consideration. Add the file after starting the job. On Thu, Mar 12, 2015 at 2:26 PM, CH.KMVPRASAD [via Apache Spark User List] ml-node+s1001560n2201...@n3.nabble.com wrote: yes! For testing purposes I defined a single file in the specified directory

Re: Unable to read files In Yarn Mode of Spark Streaming ?

2015-03-12 Thread Prannoy
Are the files already present in HDFS before you start your application? On Thu, Mar 12, 2015 at 11:11 AM, CH.KMVPRASAD [via Apache Spark User List] ml-node+s1001560n22008...@n3.nabble.com wrote: Hi, I successfully executed the sparkPi example in yarn mode but I am not able to read files

Re: Spark streaming - tracking/deleting processed files

2015-02-03 Thread Prannoy
Hi, To keep processing older files as well you can use fileStream instead of textFileStream; it has a parameter that tells it to also look for already-present files. For deleting the processed files, one way is to get the list of all files in the DStream. This can be done by using the foreachRDD API of
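
A hedged sketch of that fileStream call, assuming Spark Streaming 1.x and an existing StreamingContext ssc; the directory is a placeholder, and newFilesOnly = false is the parameter that makes already-present files eligible:

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // Assumes an existing StreamingContext ssc.
    val stream = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "hdfs:///tmp/stream-input",   // hypothetical directory
      (path: Path) => true,         // accept every file name
      newFilesOnly = false)         // also process files already in the directory
    stream.map(_._2.toString).print()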

Re: save spark streaming output to single file on hdfs

2015-01-15 Thread Prannoy
Hi, You can use the FileUtil.copyMerge API and specify the path to the folder where saveAsTextFile saves the part text files. Suppose your directory is /a/b/c/; use FileUtil.copyMerge(FileSystem of source, a/b/c, FileSystem of destination, path to the merged file, say (a/b/c.txt), true (to delete the
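
Spelled out as a sketch against the Hadoop 2.x FileSystem API (copyMerge was removed in Hadoop 3), keeping the /a/b/c paths from the answer:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

    val conf = new Configuration()
    val fs = FileSystem.get(conf)
    // Merge the part-* files under /a/b/c into the single file /a/b/c.txt;
    // the boolean deletes the source directory once the merge succeeds.
    FileUtil.copyMerge(fs, new Path("/a/b/c"),
                       fs, new Path("/a/b/c.txt"),
                       true, conf, null)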

Re: saveAsTextFile

2015-01-15 Thread Prannoy
Hi, Before saving the RDD, do a collect on the RDD and print its contents; probably it's a null value. Thanks. On Sat, Jan 3, 2015 at 5:37 PM, Pankaj Narang [via Apache Spark User List] ml-node+s1001560n20953...@n3.nabble.com wrote: If you can paste the code here I can certainly
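
That debugging step as a one-line sketch, assuming rdd is the RDD about to be saved; note that collect() pulls everything to the driver, so it is only safe on small data:

    // Debug only: collect() brings the whole RDD to the driver.
    rdd.collect().foreach(println)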

Re: Inserting an element in RDD[String]

2015-01-15 Thread Prannoy
Hi, You can take the schema line in another RDD and then do a union of the two RDDs: List<String> schemaList = new ArrayList<String>(); schemaList.add("xyz"); // where xyz is your schema line JavaRDD<String> schemaRDD = sc.parallelize(schemaList); // where sc is your SparkContext JavaRDD<String> newRDD =
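
The same idea as a Scala sketch, assuming an existing SparkContext sc and a data RDD named dataRDD; "xyz" stands in for the schema line:

    // Assumes an existing SparkContext sc and an RDD[String] dataRDD.
    val schemaRDD = sc.parallelize(Seq("xyz")) // "xyz" is your schema line
    val withHeader = schemaRDD.union(dataRDD)  // schema line first, then the data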

Re: java.io.IOException: Mkdirs failed to create file:/some/path/myapp.csv while using rdd.saveAsTextFile(fileAddress) Spark

2015-01-13 Thread Prannoy
What path are you giving in saveAsTextFile? Can you show the whole line? On Tue, Jan 13, 2015 at 11:42 AM, shekhar [via Apache Spark User List] ml-node+s1001560n21112...@n3.nabble.com wrote: I am still having this issue with the rdd.saveAsTextFile() method. Thanks, Shekhar Reddy

Re: Failed to save RDD as text file to local file system

2015-01-13 Thread Prannoy
manager itself. Thanks. On Mon, Jan 12, 2015 at 9:51 PM, NingjunWang [via Apache Spark User List] ml-node+s1001560n21105...@n3.nabble.com wrote: Prannoy, I tried this: r.saveAsTextFile(home/cloudera/tmp/out1); it returned without error. But where did it save to? The folder “/home/cloudera/tmp

Re: How to set UI port #?

2015-01-12 Thread Prannoy
Set the port using spconf.set("spark.ui.port", "xxxx"), where xxxx is any free port number and spconf is your Spark configuration object. On Sun, Jan 11, 2015 at 2:08 PM, YaoPau [via Apache Spark User List] ml-node+s1001560n21083...@n3.nabble.com wrote: I have multiple Spark Streaming jobs running all day, and
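
Written out as a sketch; the port 12345 and the app name are placeholders:

    import org.apache.spark.SparkConf

    val spconf = new SparkConf()
      .setAppName("MyStreamingJob")    // hypothetical app name
      .set("spark.ui.port", "12345")   // any free port works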

Re: Failed to save RDD as text file to local file system

2015-01-12 Thread Prannoy
Have you tried simply giving the path where you want to save the file? For instance, in your case just do r.saveAsTextFile(home/cloudera/tmp/out1). Don't use file:/. This will create a folder with the name out1. saveAsTextFile always writes by making a directory; it does not write data into a
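
For illustration, a sketch of what that call leaves on disk, keeping the path from the thread:

    r.saveAsTextFile("home/cloudera/tmp/out1")
    // Produces a directory named out1 containing part files, e.g.:
    //   home/cloudera/tmp/out1/_SUCCESS
    //   home/cloudera/tmp/out1/part-00000
    //   home/cloudera/tmp/out1/part-00001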

Re: [question]Where can I get the log file

2014-12-04 Thread Prannoy
Hi, You can access your logs in your /spark_home_directory/logs/ directory. cat the files and you will get the logs. Thanks. On Thu, Dec 4, 2014 at 2:27 PM, FFeng [via Apache Spark User List] ml-node+s1001560n20344...@n3.nabble.com wrote: I have written data to the spark log. I get it

Re: How can I read an avro file in HDFS in Java?

2014-12-03 Thread Prannoy
Hi, Try using sc.newAPIHadoopFile(hdfs path to your file, AvroSequenceFileInputFormat.class, AvroKey.class, AvroValue.class, your Configuration). You will get the Avro-related classes by importing org.apache.avro.*. Thanks. On Tue, Dec 2, 2014 at 9:23 PM, leaviva [via Apache Spark User
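
A sketch of that call in Scala, assuming an existing SparkContext sc; the classes come from the avro-mapred artifact, and the path and the generic type parameters here are illustrative assumptions:

    import org.apache.avro.mapred.{AvroKey, AvroValue}
    import org.apache.avro.mapreduce.AvroSequenceFileInputFormat
    import org.apache.hadoop.conf.Configuration

    // Assumes an existing SparkContext sc; String key/value types are illustrative.
    val conf = new Configuration()
    val records = sc.newAPIHadoopFile(
      "hdfs:///path/to/data.avro", // hypothetical path
      classOf[AvroSequenceFileInputFormat[AvroKey[String], AvroValue[String]]],
      classOf[AvroKey[String]],
      classOf[AvroValue[String]],
      conf)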

Re: object xxx is not a member of package com

2014-12-03 Thread Prannoy
Hi, Add the jars to the external libraries of your related project. Right-click on the package or class - Build Path - Configure Build Path - Java Build Path - select the Libraries tab - Add external library - browse to com.xxx.yyy.zzz._ - OK. Clean and build your project; most probably you will be able

Re: How to use FlumeInputDStream in spark cluster?

2014-11-28 Thread Prannoy
Hi, A BindException comes when two processes are using the same port. In your Spark configuration just set ("spark.ui.port", x) to some other port; x can be any number, say 12345. The BindException will not break your job in either case; just change the port number to fix it. Thanks. On Fri, Nov

Re: read both local path and HDFS path

2014-11-27 Thread Prannoy
Hi, The configuration you provide is just for accessing HDFS when you give an HDFS path. When you provide an HDFS path with the HDFS nameservice, like hmaster155:9000 in your case, it goes inside HDFS to look for the file. For accessing a local file, just give the local path of the file. Go to the
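
The two cases side by side, as a sketch assuming an existing SparkContext sc; the file names are placeholders:

    // HDFS: resolved through the hmaster155:9000 nameservice.
    val fromHdfs  = sc.textFile("hdfs://hmaster155:9000/data/input.txt") // hypothetical file
    // Local: a plain (or file://) path, which must exist on every worker node.
    val fromLocal = sc.textFile("file:///home/user/input.txt")           // hypothetical file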

Re: Persist streams to text files

2014-11-21 Thread Prannoy
Hi, You can use the FileUtil.copyMerge API and specify the path to the folder where saveAsTextFile saves the part text files. Suppose your directory is /a/b/c/; use FileUtil.copyMerge(FileSystem of source, a/b/c, FileSystem of destination, path to the merged file, say (a/b/c.txt), true (to delete

Re: Cores on Master

2014-11-21 Thread Prannoy
Hi, You can also set the cores in the Spark application itself: http://spark.apache.org/docs/1.0.1/spark-standalone.html On Wed, Nov 19, 2014 at 6:11 AM, Pat Ferrel-2 [via Apache Spark User List] ml-node+s1001560n19238...@n3.nabble.com wrote: OK hacking the start-slave.sh did it On Nov
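
In a standalone cluster that looks roughly like the following sketch; the cap of 4 cores and the app name are placeholders:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("MyApp")          // hypothetical app name
      .set("spark.cores.max", "4")  // limit the total cores the app may claim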

Re: Slow performance in spark streaming

2014-11-21 Thread Prannoy
Hi, Spark running in local mode is slower than running on a cluster. Cluster machines usually have higher specifications, and the tasks are also distributed across workers to get a faster result. So you will always find a difference in speed between running locally and running on a cluster. Try running

Re: Parsing a large XML file using Spark

2014-11-21 Thread Prannoy
Hi, Parallel processing of XML files may be an issue due to the tags in the XML file. The XML file has to be intact because the parser matches each start entity with its end entity; if the file is distributed in parts to workers, a worker may or may not find matching start and end tags within its own part, which will

Re: Execute Spark programs from local machine on Yarn-hadoop cluster

2014-11-21 Thread Prannoy
Hi Naveen, I don't think this is possible. If you are setting the master with your cluster details, you cannot execute any job from your local machine. You have to execute the jobs inside your YARN machine so that the SparkConf is able to connect with all the provided details. If this is not the case

Re: Spark Streaming Application Got killed after 2 hours

2014-11-16 Thread Prannoy
Hi Saj, What is the size of the input data that you are putting on the stream? Have you tried running the same application with a different set of data? It's weird that the streaming stops after exactly 2 hours. Try running the same application with different data of different sizes to see if it

Re: saveAsTextFile error

2014-11-15 Thread Prannoy
Hi Niko, Have you tried running it while keeping the wordCounts.print()? Possibly the import of the package org.apache.spark.streaming._ is missing, so during sbt package it is unable to locate the saveAsTextFile API. Go to