Try putting files with different file names and see if the stream is able to
detect them.
On 25-Apr-2015 3:02 am, Yang Lei [via Apache Spark User List]
ml-node+s1001560n22650...@n3.nabble.com wrote:
I hit the same issue: it behaves as if the directory has no files at all when
running the sample.
Hi,
Yes, Spark automatically removes old RDDs from the cache when you make new
ones. unpersist() forces it to remove them right away.
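A minimal Java sketch of the cache/unpersist cycle described above (the app name, master setting, and input path are placeholders, not from this thread):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class UnpersistExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("UnpersistExample").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> lines = sc.textFile("input.txt"); // placeholder path
        lines.cache();                       // mark for caching; materialized on first action
        System.out.println(lines.count());   // action: computes and caches the RDD
        lines.unpersist();                   // drop the cached blocks immediately
        sc.stop();
    }
}
```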
On Thu, Apr 23, 2015 at 9:28 AM, Jeffery [via Apache Spark User List]
ml-node+s1001560n22618...@n3.nabble.com wrote:
Hi, Dear Spark Users/Devs:
In a method, I
It depends. If the data on which the calculation is to be done is very
large, then caching it with MEMORY_AND_DISK is useful. Even in this case,
MEMORY_AND_DISK is worthwhile only if the computation on the RDD is
expensive. If the computation is very small, then even for large data sets
MEMORY_ONLY can be
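The trade-off above can be sketched as follows (paths and RDD names are placeholders; in the Java API the storage levels are factory methods on StorageLevel):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class StorageLevels {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("StorageLevels").setMaster("local[2]"));
        JavaRDD<String> big = sc.textFile("hdfs:///big/dataset");     // placeholder path
        big.persist(StorageLevel.MEMORY_AND_DISK()); // spill partitions that don't fit in memory
        JavaRDD<String> small = sc.textFile("hdfs:///small/dataset"); // placeholder path
        small.persist(StorageLevel.MEMORY_ONLY());   // recomputation is cheap; keep memory-only
        sc.stop();
    }
}
```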
application point of view we need to set any properties, please help me.
Thanks, Prannoy.
--
If you reply to this email, your message will be added to the discussion
below:
http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-read-files-In-Yarn-Mode
Streaming takes only new files into consideration. Add the file after
starting the job.
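For illustration, a minimal sketch of a job that watches a directory; only files moved into the directory after jssc.start() will be picked up (every name and path below is a placeholder):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class NewFilesOnly {
    public static void main(String[] args) throws InterruptedException {
        JavaStreamingContext jssc = new JavaStreamingContext(
                new SparkConf().setAppName("NewFilesOnly").setMaster("local[2]"),
                Durations.seconds(10));
        // only files moved into the directory AFTER jssc.start() are detected
        JavaDStream<String> lines = jssc.textFileStream("hdfs:///watched/dir");
        lines.print();
        jssc.start();
        jssc.awaitTermination();
    }
}
```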
On Thu, Mar 12, 2015 at 2:26 PM, CH.KMVPRASAD [via Apache Spark User List]
ml-node+s1001560n2201...@n3.nabble.com wrote:
Yes! For testing purposes I defined a single file in the specified directory.
Are the files already present in HDFS before you start your
application?
On Thu, Mar 12, 2015 at 11:11 AM, CH.KMVPRASAD [via Apache Spark User List]
ml-node+s1001560n22008...@n3.nabble.com wrote:
Hi, I successfully executed the SparkPi example in YARN mode, but I am not
able to read files.
Hi,
To keep processing the older files as well, you can use fileStream instead of
textFileStream. It has a parameter that specifies whether to look for already
present files.
For deleting the processed files, one way is to get the list of all files in
the DStream. This can be done by using the foreachRDD API of
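A hedged sketch of what that might look like in the Java API (the exact fileStream overload varies by Spark version, and every name and path below is a placeholder):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class OldFilesToo {
    public static void main(String[] args) throws InterruptedException {
        JavaStreamingContext jssc = new JavaStreamingContext(
                new SparkConf().setAppName("OldFilesToo").setMaster("local[2]"),
                Durations.seconds(10));
        // newFilesOnly = false also picks up files already present at start
        JavaPairInputDStream<LongWritable, Text> stream = jssc.fileStream(
                "hdfs:///watched/dir",
                LongWritable.class, Text.class, TextInputFormat.class,
                new Function<Path, Boolean>() {
                    public Boolean call(Path p) { return true; } // accept every file
                },
                false);
        // foreachRDD exposes each batch, e.g. to track which files were processed
        stream.foreachRDD(new Function<JavaPairRDD<LongWritable, Text>, Void>() {
            public Void call(JavaPairRDD<LongWritable, Text> rdd) {
                System.out.println("records in batch: " + rdd.count());
                return null;
            }
        });
        jssc.start();
        jssc.awaitTermination();
    }
}
```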
Hi,
You can use the FileUtil.copyMerge API and specify the path to the folder
where saveAsTextFile saved the part files.
Suppose your directory is /a/b/c/
Use FileUtil.copyMerge(FileSystem of source, a/b/c, FileSystem of
destination, path to the merged file say (a/b/c.txt), true (to delete the
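Filled in as a compilable Java sketch, using the paths from the example above (note that FileUtil.copyMerge exists in Hadoop 1.x/2.x but was removed in Hadoop 3):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeParts {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // merge the part-NNNNN files under /a/b/c into a single /a/b/c.txt
        FileUtil.copyMerge(fs, new Path("/a/b/c"),
                           fs, new Path("/a/b/c.txt"),
                           true,   // delete the source directory after merging
                           conf, null);
    }
}
```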
Hi,
Before saving the RDD, do a collect on it and print its contents.
Probably it is a null value.
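As a small self-contained Java sketch (the sample data and output path are stand-ins for the real ones):

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class InspectBeforeSave {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("InspectBeforeSave").setMaster("local[2]"));
        JavaRDD<String> rdd = sc.parallelize(Arrays.asList("a", "b")); // stand-in data
        // collect() pulls everything to the driver: only do this on small RDDs
        for (String line : rdd.collect()) {
            System.out.println(line);
        }
        rdd.saveAsTextFile("out1"); // writes a directory of part files
        sc.stop();
    }
}
```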
Thanks.
On Sat, Jan 3, 2015 at 5:37 PM, Pankaj Narang [via Apache Spark User List]
ml-node+s1001560n20953...@n3.nabble.com wrote:
If you can paste the code here I can certainly
Hi,
You can take the schema line in another RDD and then do a union of the two
RDDs.
List<String> schemaList = new ArrayList<String>();
schemaList.add("xyz"); // where "xyz" is your schema line
JavaRDD<String> schemaRDD = sc.parallelize(schemaList); // sc is your JavaSparkContext
JavaRDD<String> newRDD = schemaRDD.union(dataRDD); // dataRDD (hypothetical name) is your original RDD
What path are you giving in the saveAsTextFile? Can you show the whole
line?
On Tue, Jan 13, 2015 at 11:42 AM, shekhar [via Apache Spark User List]
ml-node+s1001560n21112...@n3.nabble.com wrote:
I am still having this issue with the rdd.saveAsTextFile() method.
thanks,
Shekhar reddy
manager itself.
Thanks.
On Mon, Jan 12, 2015 at 9:51 PM, NingjunWang [via Apache Spark User List]
ml-node+s1001560n21105...@n3.nabble.com wrote:
Prannoy
I tried r.saveAsTextFile(home/cloudera/tmp/out1), and it returned
without error. But where was it saved to? The folder
“/home/cloudera/tmp
Set the port using
spconf.set("spark.ui.port", "xxxx");
where xxxx is any free port and spconf is your spark configuration object.
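Put together as a Java sketch (the app name and port 12345 are arbitrary placeholders; 4040 is Spark's default UI port):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CustomUiPort {
    public static void main(String[] args) {
        SparkConf spconf = new SparkConf()
                .setAppName("CustomUiPort")
                .setMaster("local[2]")
                .set("spark.ui.port", "12345"); // any free port; the default is 4040
        JavaSparkContext sc = new JavaSparkContext(spconf);
        // ... your job ...
        sc.stop();
    }
}
```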
On Sun, Jan 11, 2015 at 2:08 PM, YaoPau [via Apache Spark User List]
ml-node+s1001560n21083...@n3.nabble.com wrote:
I have multiple Spark Streaming jobs running all day, and
Have you tried simply giving the path where you want to save the file?
For instance, in your case just do
r.saveAsTextFile(home/cloudera/tmp/out1)
Don't use file://
This will create a folder with the name out1. saveAsTextFile always writes by
making a directory; it does not write data into a
Hi,
You can access your logs in your /spark_home_directory/logs/ directory.
cat the files there and you will get the logs.
Thanks.
On Thu, Dec 4, 2014 at 2:27 PM, FFeng [via Apache Spark User List]
ml-node+s1001560n20344...@n3.nabble.com wrote:
I have written data to the Spark log.
I get it
Hi,
Try using
sc.newAPIHadoopFile(<hdfs path to your file>,
AvroSequenceFileInputFormat.class, AvroKey.class, AvroValue.class,
<your Configuration>)
You will get the Avro-related classes by importing org.apache.avro.*
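Expanded into a compilable Java sketch (the path is a placeholder, and the import packages assume the avro-mapred artifact; exact packages may differ across Avro versions):

```java
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapreduce.AvroSequenceFileInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ReadAvroSeq {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("ReadAvroSeq").setMaster("local[2]"));
        // the key/value generic parameters depend on your Avro schema
        JavaPairRDD<AvroKey, AvroValue> records = sc.newAPIHadoopFile(
                "hdfs:///path/to/data.avro",   // placeholder path
                AvroSequenceFileInputFormat.class,
                AvroKey.class, AvroValue.class,
                new Configuration());
        System.out.println(records.count());
        sc.stop();
    }
}
```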
Thanks.
On Tue, Dec 2, 2014 at 9:23 PM, leaviva [via Apache Spark User
Hi,
Add the jars in the external library of your related project.
Right click on the package or class -> Build Path -> Configure Build Path ->
Java Build Path -> select the Libraries tab -> Add External Library ->
browse to com.xxx.yyy.zzz._ -> OK
Clean and build your project; most probably you will be able
Hi,
A BindException comes when two processes are using the same port. In your
spark configuration just set ("spark.ui.port", "x")
to some other port; x can be any number, say 12345. A BindException will
not break your job in either case. Just to fix it, change the port number.
Thanks.
On Fri, Nov
Hi,
The configuration you provide is just to access HDFS when you give an
HDFS path. When you provide an HDFS path with the HDFS nameservice, as in
your case hmaster155:9000, it goes into HDFS to look for the file. For
accessing a local file, just give the local path of the file. Go to the
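A short Java sketch of the distinction (the hostname is the one from this thread; the file paths are placeholders):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PathSchemes {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("PathSchemes").setMaster("local[2]"));
        // hdfs:// is resolved through the namenode in the URI (or fs.defaultFS)
        JavaRDD<String> fromHdfs = sc.textFile("hdfs://hmaster155:9000/data/input.txt");
        // file:// bypasses HDFS; the file must exist at this path on every node
        JavaRDD<String> fromLocal = sc.textFile("file:///home/user/input.txt");
        sc.stop();
    }
}
```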
Hi ,
You can use the FileUtil.copyMerge API and specify the path to the folder
where saveAsTextFile saved the part files.
Suppose your directory is /a/b/c/
Use FileUtil.copyMerge(FileSystem of source, a/b/c, FileSystem of
destination, path to the merged file say (a/b/c.txt), true (to delete
Hi,
You can also set the cores in the spark application itself.
http://spark.apache.org/docs/1.0.1/spark-standalone.html
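For instance, in a standalone cluster the total core count can be capped from the application itself via spark.cores.max (a sketch; the value 4 and the app name are arbitrary):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CoreLimit {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("CoreLimit")
                .setMaster("local[2]")              // placeholder; use your cluster master
                .set("spark.cores.max", "4");       // cap on total cores in standalone mode
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... your job ...
        sc.stop();
    }
}
```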
On Wed, Nov 19, 2014 at 6:11 AM, Pat Ferrel-2 [via Apache Spark User List]
ml-node+s1001560n19238...@n3.nabble.com wrote:
OK hacking the start-slave.sh did it
On Nov
Hi,
Spark runs locally at a lower speed than in a cluster. Cluster machines
usually have a higher configuration, and the tasks are distributed among
workers in order to get a faster result. So you will always find a
difference in speed between running locally and running in a cluster. Try
running
Hi,
Parallel processing of XML files may be an issue due to the tags in the XML
file. The XML file has to be intact: while parsing, it matches the start
and end entities, and if the file is distributed in parts to workers it may
or may not find the start and end tags within the same worker, which will
Hi naveen,
I don't think this is possible. If you are setting the master with your
cluster details, you cannot execute a job from your local machine. You
have to execute the jobs inside your YARN machine so that SparkConf is able
to connect with all the provided details.
If this is not the case
Hi Saj,
What is the size of the input data that you are putting on the stream?
Have you tried running the same application with a different set of data?
It is weird that the streaming stops after exactly 2 hours. Try running the
same application with data of different sizes to see if it
Hi Niko,
Have you tried running it while keeping the wordCounts.print()? Possibly the
import of the package org.apache.spark.streaming._ is missing, so during
sbt package it is unable to locate the saveAsTextFile API.
Go to