[jira] Commented: (MAPREDUCE-1788) o.a.h.mapreduce.Job shouldn't make a copy of the JobConf

2011-01-12 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981155#action_12981155
 ] 

Arun C Murthy commented on MAPREDUCE-1788:
--

Downgraded; I'm not sure this is critical right away. Also, I vaguely remember 
Owen having some reservations about this.

> o.a.h.mapreduce.Job shouldn't make a copy of the JobConf
> 
>
> Key: MAPREDUCE-1788
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1788
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.21.0
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>
> Having o.a.h.mapreduce.Job make a copy of the passed-in JobConf has several 
> issues: modifications made by various pieces, such as InputSplit, are not 
> reflected back to the caller, which causes problems for frameworks built on top.
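
The copy problem described above can be illustrated with a minimal sketch. It assumes a hypothetical Conf class and JobSketch wrapper (neither is the Hadoop JobConf or Job API, and the key name is made up); it only shows why a write made through a copied configuration never reaches the object the caller still holds.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfCopyDemo {
    /** Hypothetical stand-in for a configuration object; not the Hadoop JobConf API. */
    static final class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        /** Copy constructor: later writes to the copy are invisible to the original. */
        Conf(Conf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    /** Hypothetical job wrapper that either copies or shares the caller's Conf. */
    static final class JobSketch {
        private final Conf conf;

        JobSketch(Conf callerConf, boolean copy) {
            this.conf = copy ? new Conf(callerConf) : callerConf;
        }

        /** Simulates a framework piece (such as split computation) writing to the job's conf. */
        void computeSplits() {
            conf.set("example.split.count", "4"); // made-up key, for illustration only
        }
    }

    public static void main(String[] args) {
        Conf copied = new Conf();
        new JobSketch(copied, true).computeSplits();
        System.out.println("copy:   " + copied.get("example.split.count")); // write is lost

        Conf shared = new Conf();
        new JobSketch(shared, false).computeSplits();
        System.out.println("shared: " + shared.get("example.split.count")); // write is visible
    }
}
```

A framework built on top that inspects its own configuration object after the job has run would see the value only in the shared case.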

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1788) o.a.h.mapreduce.Job shouldn't make a copy of the JobConf

2011-01-12 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-1788:
-

Priority: Major  (was: Blocker)

> o.a.h.mapreduce.Job shouldn't make a copy of the JobConf
> 
>
> Key: MAPREDUCE-1788
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1788
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.21.0
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>
> Having o.a.h.mapreduce.Job make a copy of the passed-in JobConf has several 
> issues: modifications made by various pieces, such as InputSplit, are not 
> reflected back to the caller, which causes problems for frameworks built on top.




[jira] Updated: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat

2011-01-12 Thread Ahmed Radwan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated MAPREDUCE-2254:


Attachment: (was: HADOOP-7096.patch)

> Allow setting of end-of-record delimiter for TextInputFormat
> 
>
> Key: MAPREDUCE-2254
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Radwan
> Attachments: MAPREDUCE-2245.patch
>
>
> It will be useful to allow setting the end-of-record delimiter for 
> TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as 
> the only possible record delimiters. This is a problem if users have embedded 
> newlines in their data fields (which is pretty common). This is also a 
> problem for other tools using this TextInputFormat (See for example: 
> https://issues.apache.org/jira/browse/PIG-836 and 
> https://issues.cloudera.org/browse/SQOOP-136).
> I have written a patch to address this issue. It allows users to 
> specify any custom end-of-record delimiter using a newly added configuration 
> property. For backward compatibility, if this new configuration property is 
> absent, the previous delimiters are used unchanged (i.e., '\n', '\r' or 
> '\r\n').
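
The delimiter behavior described above can be sketched roughly as follows. This is not the attached patch: splitRecords is a hypothetical helper, and a null delimiter stands in for "the configuration property is absent". It only shows the intended semantics, namely that a custom delimiter keeps embedded newlines inside a record while the default still splits on '\n', '\r' or '\r\n'.

```java
import java.util.ArrayList;
import java.util.List;

final class RecordSplitDemo {
    /**
     * Splits text into records. A null delimiter models the absent property and
     * falls back to treating "\r\n", "\n" and "\r" as record terminators.
     */
    static List<String> splitRecords(String text, String delimiter) {
        List<String> records = new ArrayList<>();
        if (delimiter == null) {
            // Default behavior: "\r\n" is listed first so it counts as one terminator.
            for (String record : text.split("\r\n|\n|\r")) {
                records.add(record);
            }
            return records;
        }
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        int start = 0;
        int idx;
        while ((idx = text.indexOf(delimiter, start)) >= 0) {
            records.add(text.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < text.length()) {
            records.add(text.substring(start)); // last record without a trailing delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "id=1\nnote=a\nb|id=2";
        System.out.println(splitRecords(data, "|").size());  // records keep embedded newlines
        System.out.println(splitRecords(data, null).size()); // default splits on every newline
    }
}
```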




[jira] Commented: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat

2011-01-12 Thread Ahmed Radwan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981114#action_12981114
 ] 

Ahmed Radwan commented on MAPREDUCE-2254:
-

I have attached the patches. The review board has a problem with uploading git 
diff files.

> Allow setting of end-of-record delimiter for TextInputFormat
> 
>
> Key: MAPREDUCE-2254
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Radwan
> Attachments: HADOOP-7096.patch, MAPREDUCE-2245.patch
>
>
> It will be useful to allow setting the end-of-record delimiter for 
> TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as 
> the only possible record delimiters. This is a problem if users have embedded 
> newlines in their data fields (which is pretty common). This is also a 
> problem for other tools using this TextInputFormat (See for example: 
> https://issues.apache.org/jira/browse/PIG-836 and 
> https://issues.cloudera.org/browse/SQOOP-136).
> I have written a patch to address this issue. It allows users to 
> specify any custom end-of-record delimiter using a newly added configuration 
> property. For backward compatibility, if this new configuration property is 
> absent, the previous delimiters are used unchanged (i.e., '\n', '\r' or 
> '\r\n').




[jira] Updated: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat

2011-01-12 Thread Ahmed Radwan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated MAPREDUCE-2254:


Attachment: (was: 2.patch)

> Allow setting of end-of-record delimiter for TextInputFormat
> 
>
> Key: MAPREDUCE-2254
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Radwan
> Attachments: HADOOP-7096.patch, MAPREDUCE-2245.patch
>
>
> It will be useful to allow setting the end-of-record delimiter for 
> TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as 
> the only possible record delimiters. This is a problem if users have embedded 
> newlines in their data fields (which is pretty common). This is also a 
> problem for other tools using this TextInputFormat (See for example: 
> https://issues.apache.org/jira/browse/PIG-836 and 
> https://issues.cloudera.org/browse/SQOOP-136).
> I have written a patch to address this issue. It allows users to 
> specify any custom end-of-record delimiter using a newly added configuration 
> property. For backward compatibility, if this new configuration property is 
> absent, the previous delimiters are used unchanged (i.e., '\n', '\r' or 
> '\r\n').




[jira] Updated: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat

2011-01-12 Thread Ahmed Radwan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated MAPREDUCE-2254:


Attachment: (was: 1.patch)

> Allow setting of end-of-record delimiter for TextInputFormat
> 
>
> Key: MAPREDUCE-2254
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Radwan
> Attachments: HADOOP-7096.patch, MAPREDUCE-2245.patch
>
>
> It will be useful to allow setting the end-of-record delimiter for 
> TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as 
> the only possible record delimiters. This is a problem if users have embedded 
> newlines in their data fields (which is pretty common). This is also a 
> problem for other tools using this TextInputFormat (See for example: 
> https://issues.apache.org/jira/browse/PIG-836 and 
> https://issues.cloudera.org/browse/SQOOP-136).
> I have written a patch to address this issue. It allows users to 
> specify any custom end-of-record delimiter using a newly added configuration 
> property. For backward compatibility, if this new configuration property is 
> absent, the previous delimiters are used unchanged (i.e., '\n', '\r' or 
> '\r\n').




[jira] Updated: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat

2011-01-12 Thread Ahmed Radwan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated MAPREDUCE-2254:


Attachment: HADOOP-7096.patch
MAPREDUCE-2245.patch

The updated patches.
Added a new test case, and used --no-prefix when generating the git diff files.

> Allow setting of end-of-record delimiter for TextInputFormat
> 
>
> Key: MAPREDUCE-2254
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Radwan
> Attachments: HADOOP-7096.patch, MAPREDUCE-2245.patch
>
>
> It will be useful to allow setting the end-of-record delimiter for 
> TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as 
> the only possible record delimiters. This is a problem if users have embedded 
> newlines in their data fields (which is pretty common). This is also a 
> problem for other tools using this TextInputFormat (See for example: 
> https://issues.apache.org/jira/browse/PIG-836 and 
> https://issues.cloudera.org/browse/SQOOP-136).
> I have written a patch to address this issue. It allows users to 
> specify any custom end-of-record delimiter using a newly added configuration 
> property. For backward compatibility, if this new configuration property is 
> absent, the previous delimiters are used unchanged (i.e., '\n', '\r' or 
> '\r\n').




[jira] Commented: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat

2011-01-12 Thread Ahmed Radwan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981099#action_12981099
 ] 

Ahmed Radwan commented on MAPREDUCE-2254:
-

Uploaded the patch to review board for easier review. 
https://reviews.apache.org/r/293/
https://reviews.apache.org/r/312/

I have also added a new test case to the patch.

> Allow setting of end-of-record delimiter for TextInputFormat
> 
>
> Key: MAPREDUCE-2254
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Radwan
> Attachments: 1.patch, 2.patch
>
>
> It will be useful to allow setting the end-of-record delimiter for 
> TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as 
> the only possible record delimiters. This is a problem if users have embedded 
> newlines in their data fields (which is pretty common). This is also a 
> problem for other tools using this TextInputFormat (See for example: 
> https://issues.apache.org/jira/browse/PIG-836 and 
> https://issues.cloudera.org/browse/SQOOP-136).
> I have written a patch to address this issue. It allows users to 
> specify any custom end-of-record delimiter using a newly added configuration 
> property. For backward compatibility, if this new configuration property is 
> absent, the previous delimiters are used unchanged (i.e., '\n', '\r' or 
> '\r\n').




[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.

2011-01-12 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2250:
---

Attachment: MAPREDUCE-2250.1.patch

Update after svn up

> Fix logging in raid code.
> -
>
> Key: MAPREDUCE-2250
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
>Priority: Trivial
> Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.patch
>
>
> There are quite a few error messages being logged with a log level of info. 
> That should be fixed to help debugging.




[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.

2011-01-12 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2250:
---

Hadoop Flags: [Reviewed]
  Status: Patch Available  (was: Open)

> Fix logging in raid code.
> -
>
> Key: MAPREDUCE-2250
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
>Priority: Trivial
> Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.patch
>
>
> There are quite a few error messages being logged with a log level of info. 
> That should be fixed to help debugging.




[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.

2011-01-12 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2250:
---

Status: Open  (was: Patch Available)

> Fix logging in raid code.
> -
>
> Key: MAPREDUCE-2250
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
>Priority: Trivial
> Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.patch
>
>
> There are quite a few error messages being logged with a log level of info. 
> That should be fixed to help debugging.




[jira] Commented: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block

2011-01-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981065#action_12981065
 ] 

Hudson commented on MAPREDUCE-2248:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #577 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/577/])
MAPREDUCE-2248. DistributedRaidFileSystem should unraid only the corrupt 
block
(Ramkumar Vadali via schen)


> DistributedRaidFileSystem should unraid only the corrupt block
> --
>
> Key: MAPREDUCE-2248
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2248.1.patch, MAPREDUCE-2248.patch
>
>
> DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. 
> It is better to unraid just the corrupt block and use the rest of the file as 
> normal. This becomes really important when we have terabyte-sized files.
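
As a rough illustration of repairing a single block instead of the whole file, here is a minimal sketch. It uses plain XOR parity over equally sized in-memory blocks as a stand-in; the class and method names are hypothetical and this is not the DistributedRaidFileSystem code. The point is that only the corrupt block is recomputed, while the surviving blocks are read as they are.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class XorBlockRepairDemo {
    /** Computes an XOR parity block over equally sized data blocks. */
    static byte[] parity(byte[][] blocks) {
        byte[] p = new byte[blocks[0].length];
        for (byte[] block : blocks) {
            for (int i = 0; i < p.length; i++) {
                p[i] ^= block[i];
            }
        }
        return p;
    }

    /** Rebuilds only the block at index corrupt from the parity and the surviving blocks. */
    static byte[] rebuild(byte[][] blocks, byte[] parity, int corrupt) {
        byte[] out = parity.clone();
        for (int j = 0; j < blocks.length; j++) {
            if (j == corrupt) {
                continue; // never read the damaged block
            }
            for (int i = 0; i < out.length; i++) {
                out[i] ^= blocks[j][i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] blocks = {
            "aaaa".getBytes(StandardCharsets.US_ASCII),
            "bbbb".getBytes(StandardCharsets.US_ASCII),
            "cccc".getBytes(StandardCharsets.US_ASCII),
        };
        byte[] p = parity(blocks);
        Arrays.fill(blocks[1], (byte) 0); // simulate corruption of one block
        byte[] repaired = rebuild(blocks, p, 1);
        System.out.println(new String(repaired, StandardCharsets.US_ASCII));
    }
}
```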




[jira] Created: (MAPREDUCE-2260) Remove auto-generated native build files

2011-01-12 Thread Roman Shaposhnik (JIRA)
Remove auto-generated native build files


 Key: MAPREDUCE-2260
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2260
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Reporter: Roman Shaposhnik


The repo currently includes the automake and autoconf generated files for the 
native build. Per discussion on HADOOP-6421 let's remove them and use the 
host's automake and autoconf. We should also do this for libhdfs and fuse-dfs. 




[jira] Created: (MAPREDUCE-2259) Hadoop Streaming JAR location might be updated

2011-01-12 Thread minoru nishikubo (JIRA)
Hadoop Streaming JAR location might be updated
--

 Key: MAPREDUCE-2259
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2259
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: documentation
 Environment: N/A
Reporter: minoru nishikubo
Priority: Trivial


The examples in docs/streaming.html refer to
$HADOOP_HOME/hadoop-streaming.jar
This could be updated to
$HADOOP_HOME/contrib/streaming/hadoop-$HADOOP-VERSION-streaming.jar
to help readers who cannot find the streaming archive.




[jira] Updated: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block

2011-01-12 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-2248:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thanks Ram.

> DistributedRaidFileSystem should unraid only the corrupt block
> --
>
> Key: MAPREDUCE-2248
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2248.1.patch, MAPREDUCE-2248.patch
>
>
> DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. 
> It is better to unraid just the corrupt block and use the rest of the file as 
> normal. This becomes really important when we have terabyte-sized files.




[jira] Commented: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block

2011-01-12 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980951#action_12980951
 ] 

Ramkumar Vadali commented on MAPREDUCE-2248:


TEST RESULTS
{code}
test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hdfs.TestRaidDfs
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 524.787 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRaid
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 154.653 sec
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 14, Failures: 0, Errors: 0, Time elapsed: 944.872 sec
[junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 13.241 sec
[junit] Running org.apache.hadoop.raid.TestErasureCodes
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 17.78 sec
[junit] Running org.apache.hadoop.raid.TestGaloisField
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.293 sec
[junit] Running org.apache.hadoop.raid.TestHarIndexParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.036 sec
[junit] Running org.apache.hadoop.raid.TestRaidFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 15.007 sec
[junit] Running org.apache.hadoop.raid.TestRaidHar
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 178.351 sec
[junit] Running org.apache.hadoop.raid.TestRaidNode
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 646.931 sec
[junit] Running org.apache.hadoop.raid.TestRaidPurge
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 253.727 sec
[junit] Running org.apache.hadoop.raid.TestRaidShell
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 21.994 sec
[junit] Running org.apache.hadoop.raid.TestRaidShellFsck
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 270.783 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonDecoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 25.14 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonEncoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.769 sec
{code}

{code}

 [exec] 
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 4 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] +1 system test framework.  The patch passed system test framework compile.
 [exec] 
 [exec] ======================================================================
 [exec] Finished build.
 [exec] ======================================================================
{code}


> DistributedRaidFileSystem should unraid only the corrupt block
> --
>
> Key: MAPREDUCE-2248
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2248.1.patch, MAPREDUCE-2248.patch
>
>
> DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. 
> It is better to unraid just the corrupt block and use the rest of the file as 
> normal. This becomes really important when we have terabyte-sized files.




[jira] Updated: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block

2011-01-12 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-2248:
--

Fix Version/s: 0.23.0
Affects Version/s: 0.23.0
 Hadoop Flags: [Reviewed]
   Status: Patch Available  (was: Open)

> DistributedRaidFileSystem should unraid only the corrupt block
> --
>
> Key: MAPREDUCE-2248
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2248.1.patch, MAPREDUCE-2248.patch
>
>
> DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. 
> It is better to unraid just the corrupt block and use the rest of the file as 
> normal. This becomes really important when we have terabyte-sized files.




[jira] Updated: (MAPREDUCE-1478) Separate the mapred.lib and mapreduce.lib classes to a different jar and include the user jar ahead of the lib jar.

2011-01-12 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1478:
-

Attachment: MAPREDUCE-1478.patch
move.sh

Here's a first cut at implementing this. The part for including the 
user's classpath first is covered in MAPREDUCE-1938. This patch creates a new 
source tree under src/lib for the MapReduce libraries, and the build creates a 
separate library jar. It compiles, but I haven't done any testing yet. A number 
of changes are needed to remove core's dependency on the 
libraries. Most are small, like removing dependencies on constants or 
configuration names, but here are some of the other ones I made:

* Introduce InputSplitCallback so MapTask doesn't have a dependency on 
FileSplit. Make mapred.FileSplit implement this interface so it can modify 
JobConf before the mapper is run.
* Task depends on FileOutputCommitter. Push this code down into 
FileOutputCommitter implementations of OutputCommitter#setupTask.
* MapTask, Task depend on the public WrappedMapper, WrappedReducer classes. 
These need to be constructed reflectively or have private duplicates made (I 
did the latter in this patch).
* org.apache.hadoop.mapreduce.util.ConfigUtil reflectively calls the new 
org.apache.hadoop.mapreduce.lib.ConfigUtil so that the deprecated keys for the 
libraries are added.
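
The reflective call mentioned in the last bullet can be sketched roughly like this. It is not code from the patch: callIfPresent, LibConfigUtil and addDeprecatedKeys are hypothetical names chosen for illustration. The idea is that the core side refers to the library class only by name, so it compiles and runs whether or not the library jar is on the classpath.

```java
import java.lang.reflect.Method;

final class ReflectiveCallDemo {
    /** Stand-in for a class that would live in the separate library jar (hypothetical). */
    public static final class LibConfigUtil {
        static int calls = 0;

        public static void addDeprecatedKeys() {
            calls++;
        }
    }

    /**
     * Invokes a public static no-argument method by class and method name.
     * Returns false when the class is not on the classpath, so the caller
     * works with or without the library jar.
     */
    static boolean callIfPresent(String className, String methodName) {
        try {
            Method method = Class.forName(className).getMethod(methodName);
            method.invoke(null);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // library jar absent: nothing to register
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Could not invoke " + className + "#" + methodName, e);
        }
    }

    public static void main(String[] args) {
        String present = ReflectiveCallDemo.class.getName() + "$LibConfigUtil";
        System.out.println(callIfPresent(present, "addDeprecatedKeys"));
        System.out.println(callIfPresent("no.such.LibraryClass", "addDeprecatedKeys"));
    }
}
```

The trade-off is that a misspelled class or method name is only caught at run time, which is one reason the bullet on WrappedMapper and WrappedReducer mentions private duplicates as an alternative.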

You need to run the move.sh script before applying the patch.


> Separate the mapred.lib and mapreduce.lib classes to a different jar and 
> include the user jar ahead of the lib jar.
> ---
>
> Key: MAPREDUCE-1478
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1478
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Reporter: Owen O'Malley
> Attachments: MAPREDUCE-1478.patch, move.sh
>
>
> Currently the user can't include updated library jars as part of their job. 
> By pulling out the lib classes we can include the classes (e.g. 
> TextInputFormat) in the user's jar and get their version rather than the 
> system-installed one.




[jira] Commented: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

2011-01-12 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980885#action_12980885
 ] 

Todd Lipcon commented on MAPREDUCE-2258:


Hong: I agree LzoCodec is preferable to LzopCodec for use in intermediate 
compression, but I think the bug you referenced is no longer the case. 
LzopCodecs can now be pooled properly with Chris's patch you referenced plus 
changes on the lzo side.

Just to clarify, you agree this code in IFile is wrong and should be fixed, 
right?

> IFile reader closes stream and compressor in wrong order
> 
>
> Key: MAPREDUCE-2258
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2258
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.20.4, 0.22.0
>Reporter: Todd Lipcon
> Fix For: 0.22.0
>
>
> In IFile.Reader.close(), we return the decompressor to the pool and then call 
> close() on the input stream. This is backwards and causes a rare race in the 
> case of LzopCodec, since LzopInputStream makes a few calls on the 
> decompressor object inside close(). If another thread pulls the decompressor 
> out of the pool and starts to use it in the meantime, the first thread's 
> close() will cause the second thread to potentially miss pieces of data.
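
The ordering issue described above can be sketched as follows. The Pool, Stream, Decompressor and Reader classes are hypothetical stand-ins, not Hadoop's CodecPool or IFile.Reader; the sketch only records the order of calls to show the safe sequence: close the stream (which still touches the decompressor) first, and hand the decompressor back to the pool afterwards.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class CloseOrderDemo {
    /** Records the order of calls so the demo can show what happened. */
    static final List<String> EVENTS = new ArrayList<>();

    /** Hypothetical pooled resource. */
    static final class Decompressor {
        void reset() { EVENTS.add("decompressor.reset"); }
    }

    /** Hypothetical pool; once an object is given back, another thread may borrow it. */
    static final class Pool {
        private final Deque<Decompressor> free = new ArrayDeque<>();

        synchronized Decompressor borrow() {
            Decompressor d = free.poll();
            return d != null ? d : new Decompressor();
        }

        synchronized void giveBack(Decompressor d) {
            EVENTS.add("pool.giveBack");
            free.push(d);
        }
    }

    /** Hypothetical stream whose close() still calls into its decompressor. */
    static final class Stream {
        private final Decompressor decompressor;

        Stream(Decompressor decompressor) { this.decompressor = decompressor; }

        void close() {
            decompressor.reset();
            EVENTS.add("stream.close");
        }
    }

    static final class Reader {
        private final Pool pool;
        private final Stream in;
        private Decompressor decompressor;

        Reader(Pool pool) {
            this.pool = pool;
            this.decompressor = pool.borrow();
            this.in = new Stream(decompressor);
        }

        void close() {
            // Finish with the stream while this reader still owns the decompressor...
            in.close();
            // ...and only then make the decompressor available to other threads.
            if (decompressor != null) {
                pool.giveBack(decompressor);
                decompressor = null;
            }
        }
    }

    public static void main(String[] args) {
        new Reader(new Pool()).close();
        System.out.println(EVENTS);
    }
}
```

With the two steps in close() swapped, decompressor.reset would run after pool.giveBack, which is the window in which a second thread could already be using the same object.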




[jira] Commented: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

2011-01-12 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980882#action_12980882
 ] 

Hong Tang commented on MAPREDUCE-2258:
--

This pattern would in general affect all CompressionCodecs and is similar to a 
bug I filed earlier: HADOOP-4195.

On the other hand, as explained by Chris D in HADOOP-4162, LzopCodec cannot be 
safely reused in Hadoop, so the problem you described should not actually 
happen. In fact, repeatedly getting LzopCodec from CodecPool is likely to get 
you into OOM.

> IFile reader closes stream and compressor in wrong order
> 
>
> Key: MAPREDUCE-2258
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2258
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.20.4, 0.22.0
>Reporter: Todd Lipcon
> Fix For: 0.22.0
>
>
> In IFile.Reader.close(), we return the decompressor to the pool and then call 
> close() on the input stream. This is backwards and causes a rare race in the 
> case of LzopCodec, since LzopInputStream makes a few calls on the 
> decompressor object inside close(). If another thread pulls the decompressor 
> out of the pool and starts to use it in the meantime, the first thread's 
> close() will cause the second thread to potentially miss pieces of data.




[jira] Commented: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

2011-01-12 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980871#action_12980871
 ] 

Todd Lipcon commented on MAPREDUCE-2258:


The following is a unit test I wrote on the hadoop-lzo side that mimics the 
behavior of IFile.Reader.close():

https://github.com/toddlipcon/hadoop-lzo/commit/a5af3b93f52f55828dfc05e7503d38383eec9dc5

It fails reliably since some threads only manage to read part of the data in 
the file.

> IFile reader closes stream and compressor in wrong order
> 
>
> Key: MAPREDUCE-2258
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2258
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.20.4, 0.22.0
>Reporter: Todd Lipcon
> Fix For: 0.22.0
>
>
> In IFile.Reader.close(), we return the decompressor to the pool and then call 
> close() on the input stream. This is backwards and causes a rare race in the 
> case of LzopCodec, since LzopInputStream makes a few calls on the 
> decompressor object inside close(). If another thread pulls the decompressor 
> out of the pool and starts to use it in the meantime, the first thread's 
> close() will cause the second thread to potentially miss pieces of data.




[jira] Created: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order

2011-01-12 Thread Todd Lipcon (JIRA)
IFile reader closes stream and compressor in wrong order


 Key: MAPREDUCE-2258
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2258
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 0.20.4, 0.22.0
Reporter: Todd Lipcon
 Fix For: 0.22.0


In IFile.Reader.close(), we return the decompressor to the pool and then call 
close() on the input stream. This is backwards and causes a rare race in the 
case of LzopCodec, since LzopInputStream makes a few calls on the decompressor 
object inside close(). If another thread pulls the decompressor out of the pool 
and starts to use it in the meantime, the first thread's close() will cause the 
second thread to potentially miss pieces of data.
