saveAsTextFile() part- files are missing

2015-05-21 Thread rroxanaioana
Hello!
I have just started with Spark. I have an application that counts the words
in a 1 MB file.
The file is stored locally. I load it using native code and then create the
RDD from it.
   
JavaRDD rddFromFile = context.parallelize(myFile, 2);
JavaRDD words = rddFromFile.flatMap(...);
JavaPairRDD pairs = words.mapToPair(...);
JavaPairRDD counter = pairs.reduceByKey(...);

counter.saveAsTextFile("file:///root/output");
context.close();

I have one master and 2 slaves, and I run the program from the master node.
The output directory is created on the master node and on both slave nodes.
On the master node it contains only one (empty) file, _SUCCESS, and on the
slave nodes it contains only _temporary. I printed the counter to the
console and the result seems OK.
What am I doing wrong?
Thank you!





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-part-files-are-missing-tp22974.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: saveAsTextFile() part- files are missing

2015-05-21 Thread Tomasz Fruboes

Hi,

 It looks like you are writing to a local filesystem. Could you try writing
to a location visible to all nodes (master and workers), e.g. an NFS share?
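
 If it helps, here is a rough sketch of a check you could run on each node
to see what actually landed in the output directory, before and after
pointing saveAsTextFile() at a shared path (for example a hypothetical NFS
mount such as file:///mnt/shared/output). The class OutputDirCheck and its
method names are made up for illustration and are not part of Spark; it uses
only the Java standard library and assumes the usual part-*, _SUCCESS and
_temporary names in the output directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of Spark): inspects a saveAsTextFile() output directory. */
final class OutputDirCheck {

    /** Returns the sorted names of the part-* files found directly under dir. */
    static List<String> partFiles(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-*")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    /** True when the _SUCCESS marker file is present in dir. */
    static boolean hasSuccessMarker(Path dir) {
        return Files.exists(dir.resolve("_SUCCESS"));
    }

    /** True when a _temporary directory is still present in dir. */
    static boolean hasTemporaryDir(Path dir) {
        return Files.isDirectory(dir.resolve("_temporary"));
    }

    public static void main(String[] args) throws IOException {
        // Default matches the path from the original post; pass another path as args[0].
        Path dir = Paths.get(args.length > 0 ? args[0] : "/root/output");
        if (!Files.isDirectory(dir)) {
            System.out.println(dir + " does not exist on this node");
            return;
        }
        System.out.println("part files: " + partFiles(dir));
        System.out.println("_SUCCESS:   " + hasSuccessMarker(dir));
        System.out.println("_temporary: " + hasTemporaryDir(dir));
    }
}
```

 On a shared location I would expect a single directory holding the part-
files together with _SUCCESS and no leftover _temporary; what you describe
(only _SUCCESS on the master, only _temporary on the workers) is what this
check would report for a path that is local to each machine.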


 HTH,
  Tomasz



