[jira] [Commented] (HADOOP-9184) Some reducers failing to write final output file to s3.

2013-07-28 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722150#comment-13722150
 ] 

Joydeep Sen Sarma commented on HADOOP-9184:
---

Unfortunately, the patch doesn't fix this issue completely. As mentioned above, 
the initial listing of the task output directory may lie - and we are seeing 
this in testing.

We suspect this line of attack is ultimately futile: it's hard for 
FileOutputCommitter to know exactly how many files the RecordWriter has 
produced. Most likely, writing directly to the final output location in S3 
(exploiting the atomicity of the S3 PUT operation) and avoiding the rename 
step entirely is the only long-term fix.
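
To make the "write directly and skip the rename" idea concrete, here is a 
minimal sketch of such a committer against the new mapreduce API. Everything 
here is illustrative, not part of any attached patch: the class name is 
invented, and the hard parts (pointing the RecordWriter at the final 
location, and cleaning up failed or speculative attempts) are only flagged 
in comments.

{code}
import java.io.IOException;

import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

/**
 * Hypothetical sketch: a committer that never renames. The RecordWriter is
 * assumed to write each part file straight to its final S3 key; since an S3
 * PUT is atomic, the file is either fully visible or absent, and no
 * temporary directory ever needs to be listed or moved.
 */
public class DirectS3OutputCommitter extends OutputCommitter {

  @Override
  public void setupJob(JobContext context) throws IOException {
    // Nothing to stage: output goes directly to the final location.
  }

  @Override
  public void cleanupJob(JobContext context) throws IOException {
    // No temporary job directory to promote or delete.
  }

  @Override
  public void setupTask(TaskAttemptContext context) throws IOException {
    // No per-task temporary directory is created.
  }

  @Override
  public boolean needsTaskCommit(TaskAttemptContext context) throws IOException {
    // No rename step, hence no commit step that can be broken by an
    // eventually consistent directory listing.
    return false;
  }

  @Override
  public void commitTask(TaskAttemptContext context) throws IOException {
    // Intentionally empty: the PUT that closed each file already
    // published it atomically.
  }

  @Override
  public void abortTask(TaskAttemptContext context) throws IOException {
    // Caveat: with task retries or speculative execution, the output of
    // failed attempts must be deleted or safely overwritten here; that
    // bookkeeping is the real cost of skipping the rename and is omitted
    // from this sketch.
  }
}
{code}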

> Some reducers failing to write final output file to s3.
> ---
>
> Key: HADOOP-9184
> URL: https://issues.apache.org/jira/browse/HADOOP-9184
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 0.20.2
>Reporter: Jeremy Karn
> Attachments: example.pig, HADOOP-9184-branch-0.20.patch, 
> hadoop-9184.patch, task_log.txt
>
>
> We had a Hadoop job running 100 reducers, with most of the reducers 
> expected to write out an empty file. When the final output was to an S3 
> bucket, we found that sometimes we were missing a final part file.  
> This happened in approximately 1 job in 3 (so approximately 1 reducer out 
> of 300 failed to output its data properly). I've attached the Pig script 
> we were using to reproduce the bug.
> After an in-depth look and instrumenting the code, we traced the problem to 
> moveTaskOutputs in FileOutputCommitter.  
> The code there looked like:
> {code}
> if (fs.isFile(taskOutput)) {
>   … do stuff …   
> } else if (fs.getFileStatus(taskOutput).isDir()) {
>   … do stuff … 
> }
> {code}
> What we saw happening is that for the problem jobs neither branch was being 
> exercised.  I've attached the task log of our instrumented code.  In this 
> version we added an else statement that printed the line "THIS SEEMS LIKE 
> WE SHOULD NEVER GET HERE …".
> The root cause of this seems to be an eventual consistency issue with S3.  
> You can see in the log that the first time moveTaskOutputs is called it finds 
> that the taskOutput is a directory.  It goes into the isDir() branch and 
> successfully retrieves the list of files in that directory from S3 (in this 
> case just one file).  This triggers a recursive call to moveTaskOutputs for 
> the file found in the directory.  But in this pass through moveTaskOutputs the 
> temporary output file can't be found, resulting in both branches of the above 
> if statement being skipped and the temporary file never being moved to the 
> final output location.



[jira] [Commented] (HADOOP-9184) Some reducers failing to write final output file to s3.

2013-07-27 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721655#comment-13721655
 ] 

Joydeep Sen Sarma commented on HADOOP-9184:
---

One question: the patch seems to assert that if a path was listed initially 
(in the parent moveTaskOutputs() call), then it must be listable in the child 
call (if it is neither a directory nor a file, it is effectively not 
listable). What I don't understand is what guarantees that the FileSystem 
accurately lists the child paths in the parent call in the first place, 
assuming these failures are caused by eventual consistency in S3.

On a different note, changing the moveTaskOutputs signature to carry a 
FileStatus (of the output path) instead of the Path argument removes the need 
to throw an exception at all: the parent call just passes the status down to 
the child, so the two can never disagree. A rough sketch of this follows.
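
To make that suggestion concrete, here is what carrying the FileStatus down 
the recursion could look like. This is an illustration against no particular 
branch, not the attached patch: the extra taskOutputRoot parameter and the 
getFinalPath call only approximate the real private helpers in 
FileOutputCommitter, and the fragment is meant to live inside that class.

{code}
// Sketch only: carry the FileStatus obtained from the parent's listing,
// so the child never has to re-resolve the path against an eventually
// consistent store.
private void moveTaskOutputs(TaskAttemptContext context,
                             FileSystem outputFs,
                             Path jobOutputDir,
                             FileStatus taskOutput,
                             Path taskOutputRoot) throws IOException {
  if (!taskOutput.isDir()) {
    // Plain file: compute its final location and rename it into place.
    Path finalPath =
        getFinalPath(jobOutputDir, taskOutput.getPath(), taskOutputRoot);
    if (!outputFs.rename(taskOutput.getPath(), finalPath)) {
      throw new IOException("Failed to move " + taskOutput.getPath()
          + " to " + finalPath);
    }
  } else {
    // Directory: list it exactly once and hand each child's FileStatus
    // down, so parent and child can never disagree about what exists.
    for (FileStatus child : outputFs.listStatus(taskOutput.getPath())) {
      moveTaskOutputs(context, outputFs, jobOutputDir, child, taskOutputRoot);
    }
  }
}
{code}

The point is that each directory is listed exactly once and every recursive 
call works off the FileStatus objects that single listing returned - which, 
per the question above, still leaves the accuracy of that first listing as 
the open risk.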



[jira] [Commented] (HADOOP-6837) Support for LZMA compression

2013-06-18 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687660#comment-13687660
 ] 

Joydeep Sen Sarma commented on HADOOP-6837:
---

Yes - the fb-hadoop tree has a working implementation; most of the original 
code came from Baidu.

We tried to convert many petabytes to LZMA (switching from gzip-compressed 
RCFile to LZMA-compressed). Aside from speed issues (writes are very slow 
despite our best efforts to tune the various LZMA settings directly in code), 
the problem is that we saw rare corruptions every once in a while. These 
didn't seem to have anything to do with Hadoop code but with the LZMA codec 
itself: certain blocks would simply be unreadable. We had to abandon the 
conversion project at that point.

My gut feeling is that for small-scale uses, the LZMA support as implemented 
in fb-hadoop-20 works.

Across petabytes of data - where every RCFile block (1 MB) holds multiple 
compressed streams (one per column), so we were literally opening and closing 
billions of compressed streams - there are latent bugs in LZMA that were well 
beyond our capability to debug, let alone reproduce accurately.

We never had the same issues with gzip, so the problem cannot lie in Hadoop 
components like HDFS.

> Support for LZMA compression
> 
>
> Key: HADOOP-6837
> URL: https://issues.apache.org/jira/browse/HADOOP-6837
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: io
>Reporter: Nicholas Carlini
>Assignee: Nicholas Carlini
> Attachments: HADOOP-6837-lzma-1-20100722.non-trivial.pseudo-patch, 
> HADOOP-6837-lzma-1-20100722.patch, HADOOP-6837-lzma-2-20100806.patch, 
> HADOOP-6837-lzma-3-20100809.patch, HADOOP-6837-lzma-4-20100811.patch, 
> HADOOP-6837-lzma-c-20100719.patch, HADOOP-6837-lzma-java-20100623.patch
>
>
> Add support for LZMA (http://www.7-zip.org/sdk.html) compression, which 
> generally achieves higher compression ratios than both gzip and bzip2.



[jira] Commented: (HADOOP-7020) establish a "Powered by Hadoop" logo

2010-11-05 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928938#action_12928938
 ] 

Joydeep Sen Sarma commented on HADOOP-7020:
---

"Powered by Hadoop" would be well served by a logo of an elephant carrying or 
pulling something. Just an idea.

> establish a "Powered by Hadoop" logo
> 
>
> Key: HADOOP-7020
> URL: https://issues.apache.org/jira/browse/HADOOP-7020
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: site
>Reporter: Doug Cutting
>Assignee: Doug Cutting
> Fix For: site
>
> Attachments: powered-by-hadoop-small.png, powered-by-hadoop.png
>
>
> We should agree on a Powered By Hadoop logo, as suggested in:
> http://www.apache.org/foundation/marks/pmcs#poweredby




[jira] Commented: (HADOOP-6837) Support for LZMA compression

2010-09-28 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915938#action_12915938
 ] 

Joydeep Sen Sarma commented on HADOOP-6837:
---

Thanks to everyone for getting LZMA into Hadoop - it looks very promising.

I have tried applying the latest patch to both hadoop-0.20 (the 
Yahoo/Facebook branch) and common trunk. In both cases, when I run TestCodec 
after compiling the native codec, I get a SIGSEGV:

[junit] Running org.apache.hadoop.io.compress.TestCodec
[junit] #
[junit] # An unexpected error has been detected by Java Runtime Environment:
[junit] #
[junit] #  SIGSEGV (0xb) at pc=0x2aaad5215659, pid=16028, tid=1076017472
[junit] #
[junit] # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b23 mixed mode 
linux-amd64)
[junit] # Problematic frame:
[junit] # C  [libhadoop.so.1.0.0+0x5659]  thisRead+0x49
[junit] #

Separate from this, I had a question about tuning the compression level. In 
my testing on internal data using the lzma utility built from the SDK, I 
found a combination of options (-a0 -mfhc4 -d24 -fbxxx) that offered a more 
suitable compromise between compression ratio and CPU than the default. 
Eyeing the 'level'-based normalization in the patch, it seems I won't quite 
be able to achieve the settings I want by specifying a level alone, so being 
able to configure these options separately would be very useful. An 
illustration of what that might look like follows.
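
For illustration, per-option knobs could look something like the snippet 
below. The property names are hypothetical - invented for this example, not 
defined by the patch - the point is just that raw encoder parameters, unlike 
a single normalized level, can express combinations like the one above.

{code}
import org.apache.hadoop.conf.Configuration;

public class LzmaTuningExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Hypothetical property names, shown only to illustrate per-option
    // tuning; the current patch normalizes these through a single level.
    conf.setInt("io.compress.lzma.algorithm", 0);      // like -a0 (fast mode)
    conf.set("io.compress.lzma.matchfinder", "hc4");   // like -mfhc4
    conf.setInt("io.compress.lzma.dictsize.log2", 24); // like -d24 (16 MB dictionary)
    // The fast-bytes option (-fb) would be set the same way; the original
    // comment leaves its value elided, so none is suggested here.
  }
}
{code}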
