Same data does not mean the same block IDs across two clusters. I'm
guessing this is caused by some issue in your code when writing to two
different HDFS instances with the same client. Did you do a low-level
modification for HDFS writes as well, or did you just create two
different FS instances when you want to write to different clusters?
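
Just to illustrate the second option (the NameNode host names and paths
below are placeholders, not from your setup): keep one FileSystem handle
per cluster by passing each cluster's URI to FileSystem.get(), so each
write goes through its own DFSClient. A rough sketch:

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class TwoClusterWrite {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();

      // FileSystem.get() caches per (scheme, authority, user), so these
      // are two independent handles, each talking to its own NameNode.
      FileSystem fsA = FileSystem.get(URI.create("hdfs://namenode-a:9000"), conf);
      FileSystem fsB = FileSystem.get(URI.create("hdfs://namenode-b:9000"), conf);

      // Write the same local file to both clusters; the bytes are the
      // same, but each cluster assigns its own block IDs.
      Path src = new Path("file:///tmp/job.split");
      Path dst = new Path("/user/hadoop/job.split");
      fsA.copyFromLocalFile(src, dst);
      fsB.copyFromLocalFile(src, dst);
    }
  }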

On Wed, Mar 27, 2013 at 9:34 PM, Pedro Sá da Costa <psdc1...@gmail.com> wrote:
> I can add this information taken from the datanode logs; it seems to be
> something related to blocks:
>
> nfoPort=50075, ipcPort=50020):Got exception while serving
> blk_-4664365259588027316_2050 to /XXX.XXX.XXX.123:
> java.io.IOException: Block blk_-4664365259588027316_2050 is not valid.
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:1072)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:1035)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:1045)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
>         at java.lang.Thread.run(Thread.java:662)
>
> 2013-03-27 15:44:54,965 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(XXX.XXX.XXX.123:50010,
> storageID=DS-595468034-XXX.XXX.XXX.123-50010-1364122596021, infoPort=50075,
> ipcPort=50020):DataXceiver
> java.io.IOException: Block blk_-4664365259588027316_2050 is not valid.
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:1072)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:1035)
>         at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:1045)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
>         at java.lang.Thread.run(Thread.java:662)
>
> I still have no idea why this error occurs, given that the 2 HDFS instances
> have the same data.
>
>
> On 27 March 2013 15:53, Pedro Sá da Costa <psdc1...@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm trying to make the same client talk to different HDFS and JT
>> instances that are in different Amazon EC2 sites. The error that I got
>> is:
>>
>>  java.io.IOException: Got error for OP_READ_BLOCK,
>> self=/XXX.XXX.XXX.123:44734,
>>
>> remote=ip-XXX-XXX-XXX-123.eu-west-1.compute.internal/XXX.XXX.XXX.123:50010,
>> for file
>>
>> ip-XXX-XXX-XXX-123.eu-west-1.compute.internal/XXX.XXX.XXX.123:50010:-4664365259588027316,
>> for block
>>    -4664365259588027316_2050
>>
>> Does this error mean that it wasn't possible to write to the remote host?
>>
>>
>>
>>
>>
>> On 27 March 2013 12:24, Harsh J <ha...@cloudera.com> wrote:
>>>
>>> You can try to take a jstack stack trace and see what it's hung on.
>>> I've only ever noticed a close() hang when the NN does not accept the
>>> complete-file call (due to minimum replication not being guaranteed),
>>> but given your changes (which I don't have details on yet) it could be
>>> something else as well. You're essentially trying to make the same
>>> client talk to two different FSes, I think (aside from the JT RPC).
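>>>
>>> (A minimal sketch, if attaching jstack from outside is inconvenient:
>>> the client can dump its own thread stacks with the standard
>>> Thread.getAllStackTraces() API while close() appears stuck. This is an
>>> illustration only, not code from this thread.)
>>>
>>>   import java.util.Map;
>>>
>>>   public class StackDumper {
>>>     // Print every live thread's stack trace to stderr; call this from
>>>     // a separate watchdog thread while out.close() is hanging.
>>>     public static void dumpAllStacks() {
>>>       for (Map.Entry<Thread, StackTraceElement[]> e
>>>           : Thread.getAllStackTraces().entrySet()) {
>>>         System.err.println("Thread: " + e.getKey().getName());
>>>         for (StackTraceElement frame : e.getValue()) {
>>>           System.err.println("    at " + frame);
>>>         }
>>>       }
>>>     }
>>>   }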
>>>
>>> On Wed, Mar 27, 2013 at 5:50 PM, Pedro Sá da Costa <psdc1...@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > I'm using the Hadoop 1.0.4 API to try to submit a job to a remote
>>> > JobTracker. I modified the JobClient to submit the same job to different
>>> > JTs. E.g., the JobClient is on my PC and it tries to submit the same job
>>> > to 2 JTs at different Amazon EC2 sites. When I'm launching the job, in
>>> > the setup phase, the JobClient tries to submit the split file info to the
>>> > remote JT. This is the method of the JobClient where I have the problem:
>>> >
>>> >
>>> >   public static void createSplitFiles(Path jobSubmitDir,
>>> >       Configuration conf, FileSystem fs,
>>> >       org.apache.hadoop.mapred.InputSplit[] splits)
>>> >       throws IOException {
>>> >     FSDataOutputStream out = createFile(fs,
>>> >         JobSubmissionFiles.getJobSplitFile(jobSubmitDir), conf);
>>> >     SplitMetaInfo[] info = writeOldSplits(splits, out, conf);
>>> >     out.close();
>>> >     writeJobSplitMetaInfo(fs,
>>> >         JobSubmissionFiles.getJobSplitMetaFile(jobSubmitDir),
>>> >         new FsPermission(JobSubmissionFiles.JOB_FILE_PERMISSION),
>>> >         splitVersion, info);
>>> >   }
>>> >
>>> > 1 - The FSDataOutputStream hangs on the out.close() instruction. Why does
>>> > it hang? What should I do to solve this?
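>>> >
>>> > (Illustrative sketch only, not part of the modified JobClient: a small
>>> > sanity check that logs which NameNode a given FileSystem handle actually
>>> > points at, so a hang in out.close() can be matched against the right
>>> > cluster's logs. The class and method names are made up for the example.)
>>> >
>>> >   import org.apache.hadoop.conf.Configuration;
>>> >   import org.apache.hadoop.fs.FileSystem;
>>> >
>>> >   public class FsSanityCheck {
>>> >     // Log the URI backing this FileSystem and the client-side default
>>> >     // FS, so each submission can be tied to the intended cluster.
>>> >     public static void logTarget(String label, FileSystem fs,
>>> >         Configuration conf) {
>>> >       System.err.println(label + ": fs.getUri()=" + fs.getUri()
>>> >           + ", fs.default.name=" + conf.get("fs.default.name"));
>>> >     }
>>> >   }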
>>> >
>>> >
>>> > --
>>> > Best regards,
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>>
>>
>> --
>> Best regards,
>
>
>
>
> --
> Best regards,



-- 
Harsh J
