File I/O in Spark

2014-09-15 Thread rapelly kartheek
Hi

I am trying to perform some read/write file operations in Spark. Somehow I
am able neither to write to a file nor to read one.

import java.io._

  val writer = new PrintWriter(new File("test.txt"))

  writer.write("Hello Scala")


Can someone please tell me how to perform file I/O in Spark?


Re: File I/O in Spark

2014-09-15 Thread Mohit Jaggi
Is this code running in an executor? You need to make sure the file is
accessible on ALL executors. One way to do that is to use a distributed
filesystem like HDFS or GlusterFS.


Re: File I/O in Spark

2014-09-15 Thread rapelly kartheek
Yes, I have HDFS. My cluster has 5 nodes. When I run the above commands, I
see that the file gets created on the master node, but no data is written
to it.



Re: File I/O in Spark

2014-09-15 Thread rapelly kartheek
The file gets created on the fly, so I don't know how to make sure that it's
accessible to all nodes.





Re: File I/O in Spark

2014-09-15 Thread Mohit Jaggi
But the above APIs (java.io) are not for HDFS; they only access the local
filesystem.





Re: File I/O in Spark

2014-09-15 Thread rapelly kartheek
I came across these APIs in one of the Scala tutorials on the net.




Re: File I/O in Spark

2014-09-15 Thread rapelly kartheek
Can you please direct me to the right way of doing this?






Re: File I/O in Spark

2014-09-15 Thread Mohit Jaggi
If your underlying filesystem is HDFS, you need to use the HDFS APIs. A
Google search brought up this link, which appears reasonable:

http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample

If you want to use the java.io APIs, you have to make sure your filesystem is
accessible from all nodes in your cluster. You also did not mention what
errors you get with your code; they may mean something.
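
For illustration, here is a minimal sketch of writing a file through the
Hadoop FileSystem API. The namenode URI and path are hypothetical, and it
assumes the Hadoop client jars are on your classpath:

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  // Assumes hdfs://namenode:9000 (hypothetical) points at your namenode.
  val conf = new Configuration()
  conf.set("fs.defaultFS", "hdfs://namenode:9000")
  val fs = FileSystem.get(conf)

  // Create the file in HDFS so that every node in the cluster can read it.
  val out = fs.create(new Path("/tmp/test.txt"))
  try out.writeBytes("Hello Scala\n") finally out.close()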





Re: File I/O in Spark

2014-09-15 Thread Frank Austin Nothaft
Kartheek,

What exactly are you trying to do? Those APIs are only for local file access.

If you want to access data in HDFS, you’ll want to use one of the reader
methods in org.apache.spark.SparkContext, which will give you an RDD (e.g.,
newAPIHadoopFile, sequenceFile, or textFile). If you want to write data to
HDFS, you’ll need to have an RDD and use one of the methods on
org.apache.spark.rdd.RDD (saveAsObjectFile or saveAsTextFile) or one of the
PairRDDFunctions (e.g., saveAsNewAPIHadoopFile or saveAsNewAPIHadoopDataset).
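
For example, a minimal round trip might look like this (the HDFS paths are
hypothetical, and `sc` is the SparkContext from your application or shell):

  // Read a text file from HDFS into an RDD[String], transform it, write it back.
  val lines = sc.textFile("hdfs:///user/kartheek/input.txt")
  val upper = lines.map(_.toUpperCase)
  upper.saveAsTextFile("hdfs:///user/kartheek/output")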

The Scala/Java IO system can be used inside Spark, but only for local file
access. This is used to implement the rdd.pipe method (IIRC), and we use it in
some downstream apps to do I/O with processes that we spawn from mapPartitions
calls (see here and here).
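
As a sketch of that pattern (assuming `rdd` is some RDD[String]; the output
path is hypothetical), each task writes to the local disk of whichever
executor runs it:

  import java.io.{File, PrintWriter}

  rdd.mapPartitionsWithIndex { (idx, iter) =>
    // Writes land on the executor's LOCAL disk, not on a shared filesystem.
    val writer = new PrintWriter(new File(s"/tmp/part-$idx.txt"))
    try iter.foreach(line => writer.println(line)) finally writer.close()
    Iterator(idx)
  }.count()  // transformations are lazy; count() forces the writes to run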

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466
