You are most probably right; I assumed others might have run into this.
When I try to put the files in there, it creates the directory structure with
the part-00000 and part-00001 files, but both files are of size 0, with no
content. The client error and the server logs show the error message quoted
below, which seems to indicate that the system is aware that a datanode
exists but has excluded it from the operation. So it does not look like a
partitioning problem, and Ambari reports that HDFS is in good health, with
one NN, one SN, and one DN.
I am unable to figure out what the issue is.
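For reference, a minimal sketch of a direct write through the Hadoop
FileSystem API, bypassing Spark entirely (the NameNode URI is the same
sandbox address used in the wordcount program below; the class name and test
path are just placeholders I made up). If the DataNode is not reachable from
the client, this should fail with the same "could only be replicated to 0
nodes" message:

import java.io.OutputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same NameNode address as in the wordcount program below.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://sandbox.hortonworks.com:8020"), conf);
        // Write one small file; this exercises the same
        // client -> NameNode -> DataNode pipeline that saveAsTextFile() uses.
        try (OutputStream out = fs.create(new Path("/tmp/hdfs-write-test.txt"))) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("write succeeded");
        fs.close();
    }
}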
thanks for your help.

On Tue, May 5, 2015 at 6:39 PM, ayan guha <guha.a...@gmail.com> wrote:

> What happens when you try to put files into your HDFS from the local
> filesystem? It looks like an HDFS issue rather than a Spark thing.
> On 6 May 2015 05:04, "Sudarshan" <njmu...@gmail.com> wrote:
>
>> I have searched all replies to this question & not found an answer.
>>
>> I am running standalone Spark 1.3.1 and Hortonworks' HDP 2.2 VM, side by
>> side, on the same machine, and I am trying to write the output of the
>> wordcount program into HDFS (it works fine writing to a local file,
>> /tmp/wordcount).
>>
>> The only line I added to the wordcount program is the following (where
>> 'counts' is the JavaPairRDD):
>> counts.saveAsTextFile("hdfs://sandbox.hortonworks.com:8020/tmp/wordcount");
>>
>> When I check in HDFS at that location (/tmp), here is what I find:
>> /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000000_2/part-00000
>> and
>> /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001
>>
>> and *both part-0000[01] files are of size 0*.
>>
>> The wordcount client output error is:
>> [Stage 1:>                                                          (0 + 2) / 2]
>> 15/05/05 14:40:45 WARN DFSClient: DataStreamer Exception
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
>> /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001
>>  *could only be replicated to 0 nodes instead of minReplication (=1).  There 
>> are 1 datanode(s) running and 1 node(s) are excluded in this operation.*
>>      at 
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
>>      at 
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3447)
>>      at 
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:642)
>>
>>
>> I tried this with Spark 1.2.1; same error.
>> I have plenty of space on the DFS.
>> The NameNode, Secondary NameNode, and the one DataNode are all healthy.
>>
>> Any hint as to what the problem may be?
>> thanks in advance.
>> Sudarshan
>>
>>
>
