Why don't you just map the RDD's rows to lines and then call saveAsTextFile()?
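A minimal sketch of that suggestion (assuming `splitter` is the column delimiter and `outputpath` is the target directory, both from the original post); the row-to-line step is just `mkString`:

```scala
// In Spark this would look roughly like:
//   rdd.map(row => row.mkString(splitter)).saveAsTextFile(outputpath)
// The mapping itself, demonstrated on plain sequences of column values:
val splitter = ","
val rows = Seq(Seq("a", 1, true), Seq("b", 2, false))
// Join each row's columns with the delimiter to get one CSV line per row.
val lines = rows.map(_.mkString(splitter))
// lines == Seq("a,1,true", "b,2,false")
```

Note that saveAsTextFile() writes one part file per partition into the output directory; calling coalesce(1) first yields a single part file, at the cost of funneling all data through one task.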
On 3.2.2015. 11:15, Hafiz Mujadid wrote:
I want to write a whole SchemaRDD to a single file in HDFS, but I am facing the following exception:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /test/data/data1.csv (inode 402042): File does not exist. Holder
DFSClient_NONMAPREDUCE_-564238432_57 does not have any open files
Here is my code:
rdd.foreachPartition( iterator => {
    val output = new Path( outputpath )
    val fs = FileSystem.get( new Configuration() )
    val writer = new BufferedWriter( new OutputStreamWriter( fs.create( output ) ) )
    val line = new StringBuilder
    iterator.foreach( row => {
        row.foreach( column => {
            line.append( column.toString + splitter )
        } )
        writer.write( line.toString.dropRight( 1 ) )
        writer.newLine()
        line.clear()
    } )
    writer.close()
} )
I think the problem is that I am creating a writer for each partition, so multiple
writers execute in parallel; when they all try to write to the same file, this
exception appears.
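One way around that collision (a sketch, not from the original post) is to give each partition its own output path, so no two writers ever share a file; in Spark this could use the partition index from mapPartitionsWithIndex as a path suffix. The scheme, demonstrated here with plain local files and two fake "partitions":

```scala
import java.io.{BufferedWriter, FileWriter}
import java.nio.file.Files

// Rough Spark shape (hypothetical names):
//   rdd.mapPartitionsWithIndex { (idx, iter) =>
//     val part = new Path(outputpath + "/part-" + idx)
//     ... // one writer per partition, each on its own path
//   }
// Local demonstration: each "partition" writes to its own file.
val dir = Files.createTempDirectory("parts")
val partitions = Seq(Seq("a,1", "b,2"), Seq("c,3"))
partitions.zipWithIndex.foreach { case (part, idx) =>
  val w = new BufferedWriter(new FileWriter(dir.resolve(s"part-$idx").toFile))
  part.foreach { line => w.write(line); w.newLine() }
  w.close()
}
```

This mirrors how saveAsTextFile itself lays out output: a directory of part files, one per partition, that downstream jobs read back as a single dataset.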
When I avoid this approach, I instead get a Task not serializable exception.
Any suggestion for handling this problem?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/LeaseExpiredException-while-writing-schemardd-to-hdfs-tp21477.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org