Why don't you just map the RDD's rows to lines and then call saveAsTextFile()?
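For example, a minimal sketch of that suggestion (the `rows` data here is a hypothetical stand-in for the SchemaRDD, and `splitter` is the delimiter from the code below; the actual Spark call is shown in a comment since it needs a cluster to run):

```scala
object SaveAsTextSketch {
  def main(args: Array[String]): Unit = {
    val splitter = ","

    // A Row in a SchemaRDD is Seq-like, so mkString yields one
    // delimited line per row -- the same thing the manual
    // StringBuilder loop below builds by hand.
    val rows = Seq(Seq("a", 1), Seq("b", 2))
    val lines = rows.map(_.mkString(splitter))
    lines.foreach(println)

    // With the real SchemaRDD this becomes (writes one part-file per
    // partition under outputpath, so no two tasks ever hold a lease
    // on the same HDFS file):
    // rdd.map(row => row.mkString(splitter)).saveAsTextFile(outputpath)
  }
}
```

Each partition then writes its own part-file, which sidesteps the lease conflict entirely.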

On 3.2.2015. 11:15, Hafiz Mujadid wrote:
I want to write a whole SchemaRDD to a single file in HDFS but am facing the
following exception:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /test/data/data1.csv (inode 402042): File does not exist. Holder
DFSClient_NONMAPREDUCE_-564238432_57 does not have any open files

here is my code:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import java.io.{BufferedWriter, OutputStreamWriter}

rdd.foreachPartition( iterator => {
    // every partition opens the SAME path
    val output = new Path( outputpath )
    val fs = FileSystem.get( new Configuration() )
    val writer = new BufferedWriter( new OutputStreamWriter( fs.create( output ) ) )
    val line = new StringBuilder
    iterator.foreach( row => {
        row.foreach( column => line.append( column.toString + splitter ) )
        writer.write( line.toString.dropRight( 1 ) )
        writer.newLine()
        line.clear()
    } )
    writer.close()
} )

I think the problem is that I am creating a writer per partition, and the
writers execute in parallel, so when they try to write to the same file this
exception appears.
When I avoid this approach I get a "task not serializable" exception instead.

Any suggestion on how to handle this problem?
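If a single file is really required, one way to avoid both the lease conflict and the serialization issue is to create exactly one writer on the driver and stream the rows through it. A minimal sketch (using a local java.io writer and a hypothetical `outputpath`/`rows` as stand-ins for the HDFS `fs.create` stream and the collected SchemaRDD rows, e.g. via `rdd.toLocalIterator`):

```scala
import java.io.{BufferedWriter, FileWriter}

object SingleWriterSketch {
  def main(args: Array[String]): Unit = {
    val splitter = ","
    val outputpath = "out.csv" // stand-in for the HDFS path

    // One writer, created once on the driver, so no two tasks ever
    // compete for a lease on the same file.
    val writer = new BufferedWriter(new FileWriter(outputpath))
    try {
      // stand-in for rdd.toLocalIterator (or rdd.collect() for small data)
      val rows = Seq(Seq("a", 1), Seq("b", 2))
      rows.foreach { row =>
        writer.write(row.mkString(splitter))
        writer.newLine()
      }
    } finally writer.close()
  }
}
```

Note this pulls all rows through the driver, so it only makes sense when the output is small enough for a single machine to handle.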



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/LeaseExpiredException-while-writing-schemardd-to-hdfs-tp21477.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org


