[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777832#action_12777832
 ] 

Hadoop QA commented on MAPREDUCE-1176:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12424931/MAPREDUCE-1176-v2.patch
  against trunk revision 836063.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/244/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/244/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/244/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/244/console

This message is automatically generated.

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1, 0.20.2
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch
>
>
> Hello,
> I would like to contribute the following two classes for incorporation into 
> the mapreduce.lib.input package. These two classes can be used when you need 
> to read data from files containing fixed length (fixed width) records. Such 
> files have no CR/LF (or any combination thereof), no delimiters etc, but each 
> record is a fixed length, and extra data is padded with spaces. The data is 
> one gigantic line within a file.
> Two classes are provided: FixedLengthInputFormat and its 
> corresponding FixedLengthRecordReader. When creating a job that specifies 
> this input format, the job must have the 
> "mapreduce.input.fixedlengthinputformat.record.length" property set as follows:
>   myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
> OR
>   myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, [myFixedRecordLength]);
> This input format overrides computeSplitSize() to ensure that 
> InputSplits do not contain any partial records, since with fixed-length 
> records there is no way to determine where a record begins if a split were 
> to start mid-record. Each 
> InputSplit passed to the FixedLengthRecordReader will start at the beginning 
> of a record, and the last byte in the InputSplit will be the last byte of a 
> record. The override of computeSplitSize() delegates to FileInputFormat's 
> compute method, and then rounds the returned split size down to a multiple 
> of the record length: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) 
> * fixedRecordLength)
> This suite of fixed-length input format classes does not support compressed 
> files. 
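
The split-size adjustment described above can be sketched in plain Java. This is only an illustration of the arithmetic, not code from the attached patch: the class name, the method name, and the fallback to a single record when the proposed split is smaller than one record are all assumptions made for the example.

```java
// Sketch only: plain Java with no Hadoop dependency. The class and method
// names are hypothetical and are not taken from MAPREDUCE-1176-v2.patch.
final class SplitSizeSketch {

  /**
   * Rounds a proposed split size down to a whole number of records, so that
   * every split starts and ends on a record boundary.
   */
  static long alignToRecordLength(long proposedSplitSize, int recordLength) {
    if (recordLength <= 0) {
      throw new IllegalArgumentException(
          "record length must be positive: " + recordLength);
    }
    // Integer division truncates, which matches Math.floor for
    // non-negative operands.
    long aligned = (proposedSplitSize / recordLength) * recordLength;
    // Assumption for this sketch: never return 0. If the proposed size is
    // smaller than one record, fall back to exactly one record.
    return Math.max(aligned, recordLength);
  }

  public static void main(String[] args) {
    // A 64 MiB proposed split with 100-byte records.
    System.out.println(alignToRecordLength(64L * 1024 * 1024, 100)); // prints 67108800
  }
}
```

Using long integer division avoids the floating-point round trip that Math.floor on a double would involve, which matters for very large files.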

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-1176:
---

Status: Open  (was: Patch Available)

toggling patch status (this kicks the qa bot to rerun the checks)

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1, 0.20.2
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch
>
>



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-1176:
---

Status: Patch Available  (was: Open)

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1, 0.20.2
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch
>
>



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: MAPREDUCE-1176-v2.patch

Updated version of patch

- Moved from 4 spaces to 2 spaces
- Fixed long lines
- Getting rid of java file attachments
- removed 0.20.1 comment and other old comments

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1, 0.20.2
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch
>
>



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: (was: FixedLengthRecordReader.java)

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1, 0.20.2
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch
>
>



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: (was: FixedLengthInputFormat.java)

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1, 0.20.2
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch
>
>



[jira] Commented: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files

2009-11-13 Thread Jothi Padmanabhan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777809#action_12777809
 ] 

Jothi Padmanabhan commented on MAPREDUCE-1140:
--

In {{getLocalCache}}, if the file is already present, only the reference count 
is incremented, so no IOException can be thrown. If the file is not present, 
{{cachedArchives.put}} is done only after the necessary processing (read: 
methods that can throw IOExceptions) is complete. So I do not think this patch 
will lead to any inconsistency.

I do agree, however, that it is a good idea to refactor the test case to 
avoid duplication.
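
The ordering argument above can be illustrated with a small sketch. This is not the Hadoop distributed-cache code: only the names getLocalCache and cachedArchives come from the comment, while the Localizer interface, the Entry class, releaseCache, and refCount are hypothetical and exist only to make the example self-contained.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Sketch only: a simplified stand-in, not the actual tasktracker code.
final class RefCountedCacheSketch {

  /** Hypothetical hook for the work that may fail while localizing a file. */
  interface Localizer {
    String localize(String key) throws IOException;
  }

  private static final class Entry {
    final String localPath;
    int refCount;

    Entry(String localPath) {
      this.localPath = localPath;
    }
  }

  private final Map<String, Entry> cachedArchives = new HashMap<>();

  synchronized String getLocalCache(String key, Localizer localizer)
      throws IOException {
    Entry entry = cachedArchives.get(key);
    if (entry == null) {
      // Everything that can throw runs before the map is modified, so a
      // failure here leaves no half-initialized entry behind.
      String path = localizer.localize(key);
      entry = new Entry(path);
      cachedArchives.put(key, entry);
    }
    // Already-present case: only the count changes, nothing can throw.
    entry.refCount++;
    return entry.localPath;
  }

  synchronized void releaseCache(String key) {
    Entry entry = cachedArchives.get(key);
    if (entry == null || entry.refCount == 0) {
      // Refuse to go negative instead of silently decrementing.
      throw new IllegalStateException("release without matching get: " + key);
    }
    entry.refCount--;
  }

  synchronized int refCount(String key) {
    Entry entry = cachedArchives.get(key);
    return entry == null ? 0 : entry.refCount;
  }
}
```

Holding the lock while localizing is a simplification for the sketch; real code would likely avoid blocking other callers during a long download.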

> Per cache-file refcount can become negative when tasks release 
> distributed-cache files
> --
>
> Key: MAPREDUCE-1140
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.2, 0.21.0, 0.22.0
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1140-1.txt, patch-1140-ydist.txt, patch-1140.txt
>
>





[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777802#action_12777802
 ] 

BitsOfInfo commented on MAPREDUCE-1176:
---

I followed the instructions listed @ 
http://wiki.apache.org/hadoop/HowToContribute  

"Finally, patches should be attached to an issue report in Jira via the Attach 
File link on the issue's Jira. When you believe that your patch is ready to be 
committed, select the Submit Patch link on the issue's Jira. "

So are you saying to delete the 2 *.java files and only upload the .patch?

The *.patch file does contain a unit test in it so I am not sure why the 
comment above reported no tests were included. I ran this patch file against a 
clean trunk copy locally on my test machine and also verified it was ok through 
the "test-patch" task on the contribute how-to page.

I'll remove the 4-space indents. When looking through other sources, I found 
the 2 spaces and tons of wrapping unreadable.

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1, 0.20.2
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: FixedLengthInputFormat.java, 
> FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch
>
>



[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1298#action_1298
 ] 

Todd Lipcon commented on MAPREDUCE-1176:


Hi,

- Please *only* upload the patch to Hudson. Otherwise the QA bot gets confused 
and tries to apply your .java files as a patch.
- Also, the coding style guidelines for Hadoop call for an indentation level of 
2 spaces. It looks like your patch is full of tabs. There are a few other style 
violations. The coding style is http://java.sun.com/docs/codeconv/ with the 
change of 2 spaces instead of 4. It's probably easier to look through other 
parts of the Hadoop codebase and simply follow their example.
- There's a comment referring to the 0.20.1 code. Since this patch is slated 
for trunk, not 0.20.1, please remove that.
- There are some other bits of commented-out code. These are a no-no - either 
the code works and is important, in which case it should be there, or it's not 
important (or broken) and it shouldn't.

Thanks again for contributing to Hadoop! The review process can take a while 
but it's important to maintain style consistency across the codebase.



> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1, 0.20.2
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: FixedLengthInputFormat.java, 
> FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch
>
>



[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1295#action_1295
 ] 

Hadoop QA commented on MAPREDUCE-1176:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12424924/FixedLengthRecordReader.java
  against trunk revision 836063.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/243/console

This message is automatically generated.

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1, 0.20.2
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: FixedLengthInputFormat.java, 
> FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch
>
>



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Affects Version/s: 0.20.2
 Release Note: 
Addition of FixedLengthInputFormat and FixedLengthRecordReader in the 
org.apache.hadoop.mapreduce.lib.input package. These two classes can be used 
when you need to read data from files containing fixed length (fixed width) 
records. Such files have no CR/LF (or any combination thereof), no delimiters 
etc, but each record is a fixed length, and extra data is padded with spaces. 
The data is one gigantic line within a file. When creating a job that specifies 
this input format, the job must have the 
"mapreduce.input.fixedlengthinputformat.record.length" property set as follows: 
myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
 

Please see javadoc for more details.
   Status: Patch Available  (was: Open)

Attached is a patch file which adds FixedLengthInputFormat, 
FixedLengthRecordReader and a unit test for the new input format. This patch 
was made against the trunk as of 11/13.
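
To make the file layout in the release note concrete, here is a minimal sketch of reading fixed-length records from a stream in plain Java. It is not the FixedLengthRecordReader from the patch: the class name, method name, the choice of US-ASCII, and the decision to reject a trailing partial record are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only: plain Java with no Hadoop dependency; names are hypothetical.
final class FixedLengthReadSketch {

  /** Reads consecutive records of exactly recordLength bytes until EOF. */
  static List<String> readRecords(InputStream in, int recordLength)
      throws IOException {
    if (recordLength <= 0) {
      throw new IllegalArgumentException(
          "record length must be positive: " + recordLength);
    }
    List<String> records = new ArrayList<>();
    byte[] buffer = new byte[recordLength];
    while (true) {
      // read() may return fewer bytes than requested, so keep reading
      // until the buffer is full or the stream ends.
      int filled = 0;
      while (filled < recordLength) {
        int n = in.read(buffer, filled, recordLength - filled);
        if (n < 0) {
          break;
        }
        filled += n;
      }
      if (filled == 0) {
        return records; // clean end of input on a record boundary
      }
      if (filled < recordLength) {
        // Assumption for this sketch: a trailing partial record is an error.
        throw new IOException("partial record of " + filled + " bytes at end of input");
      }
      // Padding spaces are kept; callers can trim them if they wish.
      records.add(new String(buffer, 0, recordLength, StandardCharsets.US_ASCII));
    }
  }

  public static void main(String[] args) throws IOException {
    // Three 5-byte records on "one gigantic line", padded with spaces.
    byte[] data = "AAA  BBBB CCCCC".getBytes(StandardCharsets.US_ASCII);
    for (String record : readRecords(new ByteArrayInputStream(data), 5)) {
      System.out.println("[" + record + "]");
    }
  }
}
```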

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1, 0.20.2
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: FixedLengthInputFormat.java, 
> FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch
>
>



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: FixedLengthInputFormat.java

Re-attached FixedLengthInputFormat source file, to contain same version as in 
patch file

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: FixedLengthInputFormat.java, 
> FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch
>
>



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: (was: FixedLengthRecordReader.java)

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: FixedLengthInputFormat.java, 
> FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch
>
>



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: FixedLengthRecordReader.java

Re-attached the source so that it matches the version in the patch file.

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: FixedLengthInputFormat.java, 
> FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch
>
>



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: (was: FixedLengthInputFormat.java)

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch
>
>



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-13 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: MAPREDUCE-1176-v1.patch

Attached is a patch file which adds FixedLengthInputFormat, 
FixedLengthRecordReader and a unit test for the new input format. This patch 
was made against the trunk as of 11/13.

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch
>
>



[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1232#action_1232
 ] 

Hadoop QA commented on MAPREDUCE-967:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424909/mapreduce-967.txt
  against trunk revision 835968.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/139/testReport/
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/139/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/139/console

This message is automatically generated.

> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
> mapreduce-967.txt, mapreduce-967.txt
>
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall clock and in unnecessary 
> heavy disk utilization). This cost can be easily avoided




[jira] Commented: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)

2009-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1229#action_1229
 ] 

Hadoop QA commented on MAPREDUCE-961:
-

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12424888/MAPREDUCE-961-v4.patch
  against trunk revision 835968.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/242/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/242/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/242/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/242/console

This message is automatically generated.

> ResourceAwareLoadManager to dynamically decide new tasks based on current 
> CPU/memory load on TaskTracker(s)
> ---
>
> Key: MAPREDUCE-961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-961
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch, 
> MAPREDUCE-961-v3.patch, MAPREDUCE-961-v4.patch
>
>
> Design and develop a ResourceAwareLoadManager for the FairShare scheduler that 
> dynamically decides how many maps/reduces to run on a particular machine 
> based on the CPU/Memory/diskIO/network usage on that machine.  The amount of 
> resources currently used on each task tracker is being fed into the 
> ResourceAwareLoadManager in real-time via an entity that is external to 
> Hadoop.




[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-11-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1223#action_1223
 ] 

Todd Lipcon commented on MAPREDUCE-967:
---

Forgot to note: this will probably fail Hudson, since it requires the common 
jar built from HADOOP-6346, which adds new functions to support regexes in 
Configuration.

> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
> mapreduce-967.txt, mapreduce-967.txt
>
>



[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-11-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-967:
--

Attachment: mapreduce-967.txt

New patch for the new version of HADOOP-6346. This one does *not* move the 
RunJar class to mapreduce, since we determined over in that issue that it isn't 
the best course of action.

One question for the reviewer: the constant for the new configuration key is in 
JobContext, whereas the default is in JobConf. I was following some other 
examples from the code, but it seems a little bit messy here. Where are the 
right places to add new configuration parameters that work in both APIs?

> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
> mapreduce-967.txt, mapreduce-967.txt
>
>



[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-11-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-967:
--

Status: Patch Available  (was: Open)

> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
> mapreduce-967.txt, mapreduce-967.txt
>
>



[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-11-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-967:
--

Status: Open  (was: Patch Available)

> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
> mapreduce-967.txt
>
>



[jira] Created: (MAPREDUCE-1214) Add support for counters in Hadoop Local Mode

2009-11-13 Thread Ankit Modi (JIRA)
Add support for counters in Hadoop Local Mode
-

 Key: MAPREDUCE-1214
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1214
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ankit Modi


Currently there is no support for counters (records and bytes written) in 
Hadoop Local Mode.

Pig needs to provide counters to the user when running in Hadoop Local Mode.




[jira] Commented: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)

2009-11-13 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777690#action_12777690
 ] 

Scott Chen commented on MAPREDUCE-961:
--

I will write an overall design document and post it here soon.

> ResourceAwareLoadManager to dynamically decide new tasks based on current 
> CPU/memory load on TaskTracker(s)
> ---
>
> Key: MAPREDUCE-961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-961
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch, 
> MAPREDUCE-961-v3.patch, MAPREDUCE-961-v4.patch
>
>



[jira] Commented: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)

2009-11-13 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777644#action_12777644
 ] 

Scott Chen commented on MAPREDUCE-961:
--

Thanks for the comment, Arun. I have changed the patch a lot following the 
suggestions from Matei and Vinod. The latest patch is totally different from the 
first one. I am sorry about the confusion.

The following is the design:
1. We obtain the available memory on the TT using MemoryCalculatorPlugin. 
Originally this class calculated only the total memory; we made a slight change 
so that it also computes the available memory.
2. The information is reported back to the JT with TaskTrackerStatus.ResourceStatus.
3. In MemBasedLoadManager, we look at the available memory on the TT, the maximum 
memory per task (from jobConf) and a configured reserved memory on the TT. If 
(available memory - task memory > reserved memory), we return true, which allows 
the scheduler to launch the task.

The initial idea also included using the memory usage of a job as collected in 
the cluster. Right now we only use the value obtained from jobConf. After 
MAPREDUCE-220 is done, we can use the task memory estimated from the previous 
tasks.
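
The admission check in step 3 can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code in the patch: the class MemBasedLoadCheckSketch, the method canLaunchTask and its parameter names are hypothetical, and all values are assumed to be expressed in the same unit (bytes here).

```java
// Illustrative sketch only; names are hypothetical and this is not the MemBasedLoadManager API.
final class MemBasedLoadCheckSketch {

    /**
     * Returns true when launching a task of the given size would still leave
     * more than the reserved amount of memory free on the TaskTracker, i.e.
     * (available memory - task memory > reserved memory).
     */
    static boolean canLaunchTask(long availableMemoryBytes,
                                 long taskMemoryBytes,
                                 long reservedMemoryBytes) {
        return availableMemoryBytes - taskMemoryBytes > reservedMemoryBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long available = 4096 * mb; // free memory reported by the TaskTracker
        long perTask = 1024 * mb;   // maximum memory per task, as configured for the job
        long reserved = 2048 * mb;  // memory the TaskTracker should keep free
        System.out.println("launch allowed: " + canLaunchTask(available, perTask, reserved));
    }
}
```

The strict inequality means a task is rejected when it would leave exactly the reserved amount free, which matches the condition as written in the comment.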

> ResourceAwareLoadManager to dynamically decide new tasks based on current 
> CPU/memory load on TaskTracker(s)
> ---
>
> Key: MAPREDUCE-961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-961
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch, 
> MAPREDUCE-961-v3.patch, MAPREDUCE-961-v4.patch
>
>



[jira] Updated: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)

2009-11-13 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-961:
-

Status: Patch Available  (was: Open)

> ResourceAwareLoadManager to dynamically decide new tasks based on current 
> CPU/memory load on TaskTracker(s)
> ---
>
> Key: MAPREDUCE-961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-961
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch, 
> MAPREDUCE-961-v3.patch, MAPREDUCE-961-v4.patch
>
>



[jira] Commented: (MAPREDUCE-1189) Reduce ivy console output to ovservable level

2009-11-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777625#action_12777625
 ] 

Hudson commented on MAPREDUCE-1189:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #117 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/117/])
. Reduce ivy console output to ovservable level. Contributed by Konstantin 
Boudnik


> Reduce ivy console output to ovservable level
> -
>
> Key: MAPREDUCE-1189
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1189
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.22.0
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1189.patch
>
>
> It is very hard to see what's going on in the build because ivy is literally 
> flooding the console with nonsensical messages...




[jira] Updated: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)

2009-11-13 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-961:
-

Attachment: MAPREDUCE-961-v4.patch

> ResourceAwareLoadManager to dynamically decide new tasks based on current 
> CPU/memory load on TaskTracker(s)
> ---
>
> Key: MAPREDUCE-961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-961
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch, 
> MAPREDUCE-961-v3.patch, MAPREDUCE-961-v4.patch
>
>



[jira] Updated: (MAPREDUCE-1189) Reduce ivy console output to ovservable level

2009-11-13 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated MAPREDUCE-1189:
--

   Resolution: Fixed
Fix Version/s: 0.22.0
   Status: Resolved  (was: Patch Available)

I've just committed this.

> Reduce ivy console output to ovservable level
> -
>
> Key: MAPREDUCE-1189
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1189
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.22.0
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1189.patch
>
>



[jira] Commented: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)

2009-11-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777620#action_12777620
 ] 

Arun C Murthy commented on MAPREDUCE-961:
-

I'm looking at all the comments here and I cannot find a single, coherent 
design for this feature - a major one. Can you please put up a design?

I'd first like to understand/debate the design before I look at the patch.


> ResourceAwareLoadManager to dynamically decide new tasks based on current 
> CPU/memory load on TaskTracker(s)
> ---
>
> Key: MAPREDUCE-961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-961
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch, 
> MAPREDUCE-961-v3.patch
>
>
> Design and develop a ResourceAwareLoadManager for the FairShare scheduler that 
> dynamically decides how many maps/reduces to run on a particular machine 
> based on the CPU/Memory/diskIO/network usage in that machine.  The amount of 
> resources currently used on each task tracker is being fed into the 
> ResourceAwareLoadManager in real-time via an entity that is external to 
> Hadoop.




[jira] Updated: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)

2009-11-13 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-961:
-

Attachment: MAPREDUCE-961-v4.patch

Fixed the failed test in TestTTMemoryReporting. This test runs on Linux 
only. Previously I was running the tests on a Mac, so I did not catch this failure.

> ResourceAwareLoadManager to dynamically decide new tasks based on current 
> CPU/memory load on TaskTracker(s)
> ---
>
> Key: MAPREDUCE-961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-961
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch, 
> MAPREDUCE-961-v3.patch
>
>
> Design and develop a ResourceAwareLoadManager for the FairShare scheduler that 
> dynamically decides how many maps/reduces to run on a particular machine 
> based on the CPU/Memory/diskIO/network usage in that machine.  The amount of 
> resources currently used on each task tracker is being fed into the 
> ResourceAwareLoadManager in real-time via an entity that is external to 
> Hadoop.




[jira] Updated: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)

2009-11-13 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-961:
-

Attachment: (was: MAPREDUCE-961-v4.patch)

> ResourceAwareLoadManager to dynamically decide new tasks based on current 
> CPU/memory load on TaskTracker(s)
> ---
>
> Key: MAPREDUCE-961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-961
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch, 
> MAPREDUCE-961-v3.patch
>
>
> Design and develop a ResourceAwareLoadManager for the FairShare scheduler that 
> dynamically decides how many maps/reduces to run on a particular machine 
> based on the CPU/Memory/diskIO/network usage in that machine.  The amount of 
> resources currently used on each task tracker is being fed into the 
> ResourceAwareLoadManager in real-time via an entity that is external to 
> Hadoop.




[jira] Commented: (MAPREDUCE-1203) DBOutputFormat: add batch size support for JDBC and recieve DBWritable object in value not in key

2009-11-13 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777516#action_12777516
 ] 

Enis Soztutar commented on MAPREDUCE-1203:
--

Yes, unfortunately, JDBC does not specify the behavior for this. As far as I 
remember, execute batch returns the number of documents that it is able to 
send. 

Another use case that I want at some point is to enable processing even after 
some records fail. Logically, failing to send a few records should be 
acceptable. Considering this use case and yours, implementing different failure 
strategies makes sense. I guess we could first try execute batch, then fall 
back to execute update one by one. 

Alternatively, we could write a wrapper Input format, which limits the number 
of records in the map tasks so that there is less to send at each task. 
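
A minimal sketch of the fallback strategy described above: try the whole batch first and, if it fails, resend the records one at a time and collect the ones that still fail. RecordSink, sendBatch, sendOne and BatchWithFallback are hypothetical names used for illustration and are not part of DBOutputFormat; in a real writer they would presumably wrap PreparedStatement.executeBatch() and executeUpdate(). The sketch also assumes that a failed batch is rolled back before the fallback starts, since otherwise rows could be written twice.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class BatchWithFallback {

  /** Hypothetical abstraction over the JDBC calls (assumption, not Hadoop API). */
  public interface RecordSink<T> {
    /** Sends all records in one batch, e.g. addBatch() followed by executeBatch(). */
    void sendBatch(List<T> records) throws SQLException;

    /** Sends a single record, e.g. executeUpdate(). */
    void sendOne(T record) throws SQLException;
  }

  /**
   * Writes the records, preferring a single batch.
   * Returns the records that could not be written even one by one.
   */
  public static <T> List<T> write(RecordSink<T> sink, List<T> records) {
    List<T> failed = new ArrayList<T>();
    try {
      sink.sendBatch(records);
      return failed; // whole batch went through, nothing failed
    } catch (SQLException batchFailure) {
      // Assumed: the batch was rolled back, so resending cannot duplicate rows.
    }
    for (T record : records) {
      try {
        sink.sendOne(record);
      } catch (SQLException recordFailure) {
        failed.add(record); // tolerate individual failures and keep going
      }
    }
    return failed;
  }
}
```

The caller decides what to do with the returned list, for example fail the task only when it grows beyond some threshold.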

> DBOutputFormat: add batch size support for JDBC and recieve  DBWritable 
> object in value not in key
> --
>
> Key: MAPREDUCE-1203
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1203
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Alexander Schwid
>Assignee: Aaron Kimball
>Priority: Minor
> Attachments: HADOOP-4331.patch, patch.txt
>
>
> package mapred.lib.db
> added batch size support for JDBC in DBOutputFormat 
> receive DBWritable object in value not in key in DBOutputFormat




[jira] Commented: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files

2009-11-13 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777476#action_12777476
 ] 

Hemanth Yamijala commented on MAPREDUCE-1140:
-

Some comments on the test case:

- Can we refactor TestTrackerDistributedCacheManager and its subclass for 
LinuxTaskController by introducing an API like canRun in the base class? Let 
it return true by default; it can be overridden in 
TestTrackerDistributedCacheManagerWithLinuxTaskController to return false if 
the platform is not Linux. This way, I suppose we don't have to keep 
overriding every test case in the base class every time we make a change.

- I would enhance testReferenceCount to add 3 files instead of 2: the first 
file should be localized correctly, the second should fail after the reference 
count is incremented, and the third should not be localized at all. Then we can 
verify that the reference count for all files is 0.
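
A minimal sketch of the canRun idea from the first point, assuming the hook simply guards each test body. The class names CacheManagerTestBase and LinuxOnlyCacheManagerTest are hypothetical stand-ins for the real test classes, and the os.name check is only one possible way to detect Linux.

```java
import java.util.Locale;

class CacheManagerTestBase {

  /** Hook for subclasses: return false to skip tests on unsupported platforms. */
  protected boolean canRun() {
    return true;
  }

  /** Returns true if the test body actually ran, false if it was skipped. */
  public boolean testReferenceCount() {
    if (!canRun()) {
      return false; // skipped; the subclass does not need to override the test
    }
    // ... the actual test body would go here ...
    return true;
  }
}

class LinuxOnlyCacheManagerTest extends CacheManagerTestBase {

  /** Only this one method is overridden; every inherited test honours it. */
  @Override
  protected boolean canRun() {
    String os = System.getProperty("os.name").toLowerCase(Locale.ROOT);
    return os.startsWith("linux");
  }
}
```

With this shape, adding a new test to the base class needs no matching override in the Linux-specific subclass.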

 

> Per cache-file refcount can become negative when tasks release 
> distributed-cache files
> --
>
> Key: MAPREDUCE-1140
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.2, 0.21.0, 0.22.0
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1140-1.txt, patch-1140-ydist.txt, patch-1140.txt
>
>





[jira] Commented: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files

2009-11-13 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777472#action_12777472
 ] 

Hemanth Yamijala commented on MAPREDUCE-1140:
-

bq. This is done, because getLocalCache increments referenceCount first and 
then localizes. Reference count should be decremented for the one just failed 
also. So, it should be added to the list before the getLocalCache call.

Umm. But (at least theoretically), it is still possible that a call to 
getLocalCache fails before referenceCount is incremented. For example, makeRelative 
throws IOException; so does getLocalCacheForWrite. Hence, we still have a 
situation where we record a file as being localized (by storing it in 
localizedCacheFiles), but the reference count is not actually incremented. And 
releaseCache would still have the bug this JIRA is talking about.

One more point I am slightly uncomfortable about is the duplication of state 
because of the new list localizedCacheFiles. 

Here's an alternate proposal:

- Modify CacheFile to have a boolean, isLocalized. By default, this is 
false. It will be set to true if distributedCacheManager.getLocalCache 
returns successfully.
- To handle the case you have mentioned above, where a failure can happen after 
referenceCount is incremented in getLocalCache, I would suggest we catch 
exceptions inside getLocalCache and, on an exception, decrement the 
referenceCount and re-throw the exception. This seems right to me - because if 
getLocalCache doesn't complete, shouldn't we be consistent by decrementing 
the reference count?

Would this work?
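
A minimal sketch of the alternate proposal, under the assumption that the reference count and the isLocalized flag can be modelled on one small object. CacheEntry, RefCountingCache and Localizer are hypothetical names for illustration; the real code keeps this state in the distributed cache manager and in CacheFile.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

class CacheEntry {
  /** Stands in for the per-file referenceCount kept by the cache manager. */
  final AtomicInteger refCount = new AtomicInteger();
  /** Stands in for the proposed isLocalized flag; false by default. */
  boolean localized;
}

class RefCountingCache {

  /** Hypothetical stand-in for the work that can fail during localization. */
  interface Localizer {
    void localize() throws IOException;
  }

  static void getLocalCache(CacheEntry entry, Localizer localizer) throws IOException {
    entry.refCount.incrementAndGet();
    try {
      localizer.localize();
      entry.localized = true; // set only when localization returned successfully
    } catch (IOException e) {
      entry.refCount.decrementAndGet(); // undo the increment, then re-throw
      throw e;
    }
  }

  static void releaseCache(CacheEntry entry) {
    // Release only what was really localized, so the count cannot go negative.
    if (entry.localized) {
      entry.refCount.decrementAndGet();
      entry.localized = false;
    }
  }
}
```

Because a failed getLocalCache leaves both the count and the flag untouched, releaseCache becomes a no-op for that file, whether the failure happened before or after the increment.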

> Per cache-file refcount can become negative when tasks release 
> distributed-cache files
> --
>
> Key: MAPREDUCE-1140
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.2, 0.21.0, 0.22.0
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1140-1.txt, patch-1140-ydist.txt, patch-1140.txt
>
>





[jira] Commented: (MAPREDUCE-1213) TaskTrackers restart is very slow because ti deletes distributed cache directory synchronously

2009-11-13 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777470#action_12777470
 ] 

dhruba borthakur commented on MAPREDUCE-1213:
-

One option is to rename the currentworkdir to a temporary location and then 
delete these files asynchronously via the technique used in HDFS-611.
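
A minimal sketch of the rename-then-delete idea, assuming the directory and its temporary sibling live on the same file system so the rename is cheap. It uses java.nio.file rather than Hadoop's FileUtil, and AsyncDirCleaner and the ".deleting." suffix are hypothetical names; it is not the HDFS-611 implementation.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

class AsyncDirCleaner {

  /**
   * Moves dir aside, recreates it empty, and deletes the old contents on a
   * background thread. Returns that thread so callers may join it if needed.
   */
  static Thread renameAndDeleteAsync(Path dir) throws IOException {
    final Path doomed =
        dir.resolveSibling(dir.getFileName() + ".deleting." + System.nanoTime());
    Files.move(dir, doomed);      // rename: fast, independent of the file count
    Files.createDirectories(dir); // a fresh, empty directory is usable at once
    Thread cleaner = new Thread(new Runnable() {
      public void run() {
        try {
          deleteRecursively(doomed);
        } catch (IOException e) {
          // Leftovers stay under the temporary name and can be retried later.
        }
      }
    }, "cache-cleanup");
    cleaner.setDaemon(true);
    cleaner.start();
    return cleaner;
  }

  /** Deletes files first, then each directory once it is empty. */
  static void deleteRecursively(Path root) throws IOException {
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult postVisitDirectory(Path d, IOException exc)
          throws IOException {
        if (exc != null) {
          throw exc;
        }
        Files.delete(d);
        return FileVisitResult.CONTINUE;
      }
    });
  }
}
```

The startup path then pays only for one rename and one mkdir, while the slow recursive delete runs off the critical path.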

> TaskTrackers restart is very slow because ti deletes distributed cache 
> directory synchronously
> --
>
> Key: MAPREDUCE-1213
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.1
>Reporter: dhruba borthakur
>
> We are seeing that when we restart a tasktracker, it tries to recursively 
> delete all the files in the distributed cache. It invokes 
> FileUtil.fullyDelete(), which is very slow. This means that the 
> TaskTracker cannot join the cluster for an extended period of time (up to 2 
> hours for us). The problem is acute if the number of files in a distributed 
> cache is a few thousand.




[jira] Commented: (MAPREDUCE-915) For secure environments, the Map/Reduce debug script must be run as the user.

2009-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777468#action_12777468
 ] 

Hadoop QA commented on MAPREDUCE-915:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424820/915-4.patch
  against trunk revision 835237.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/138/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/138/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/138/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/138/console

This message is automatically generated.

> For secure environments, the Map/Reduce debug script must be run as the user.
> -
>
> Key: MAPREDUCE-915
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-915
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security, tasktracker
>Affects Versions: 0.21.0
>Reporter: Hemanth Yamijala
>Assignee: Devaraj Das
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: 915-4.patch, 915.1.patch, 915.2.patch, 915.patch
>
>
> The Taskcontroller model allows admins to set up a cluster configuration that 
> runs tasks as users. The debug scripts of Map/Reduce, provided by the 
> configuration options mapred.map.task.debug.script and 
> mapred.reduce.task.debug.script, need to be run as the user as well in such 
> environments, rather than as the tasktracker user.




[jira] Created: (MAPREDUCE-1213) TaskTrackers restart is very slow because ti deletes distributed cache directory synchronously

2009-11-13 Thread dhruba borthakur (JIRA)
TaskTrackers restart is very slow because ti deletes distributed cache 
directory synchronously
--

 Key: MAPREDUCE-1213
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur


We are seeing that when we restart a tasktracker, it tries to recursively 
delete all the files in the distributed cache. It invokes FileUtil.fullyDelete(), 
which is very slow. This means that the TaskTracker cannot join the 
cluster for an extended period of time (up to 2 hours for us). The problem is 
acute if the number of files in a distributed cache is a few thousand.




[jira] Commented: (MAPREDUCE-754) NPE in expiry thread when a TT is lost

2009-11-13 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777434#action_12777434
 ] 

Hemanth Yamijala commented on MAPREDUCE-754:


Some more comments:

- It would be useful to add a javadoc for getNumberOfUniqueHosts, along with a 
reason why blacklisted hosts must be excluded from this count. Please remember 
our offline discussion where we spoke about why this number must be excluded 
when scheduling.

Other comments on test cases:
- In TestLostTracker, please separate case 3 into a separate test case. It is 
generally good unit testing practice to test separate conditions in separate 
tests.
- We can assert some state after case 2 and case 3 in addition to just making 
sure method calls succeed. For example, in the case of blacklisting, we can check 
that the number of active hosts is decremented by the right value (we are 
changing that API as well, so this will be a good check). Likewise, we can also 
check that a host is blacklisted or a job is finished, etc.
- Please create the hosts.exclude file in a folder relative to TEST_DIR.
- The testcase testBlacklistedNodeDecommissioning can blacklist a node by 
global blacklisting rather than through the health check script, which is 
slightly more complicated. One reason for doing so is that we can do this 
without having to wait for blacklisting to happen asynchronously.
- Common code in this class related to global blacklisting because of job 
failures as well as refresh of hosts can also be refactored into separate 
utility methods and reused.
- Instead of checking if the decommissioned tracker is not present in the list 
of trackers, since we are starting with only one tracker, we can explicitly 
check that the number of trackers in jt.taskTrackers is 0.
- Some additional tests that I can suggest
-- Blacklist + decommission when there are multiple trackers per host
-- Have a cluster with 3 trackers, blacklist one of them, decommission 2 of 
them, and make sure the active, decommissioned and blacklisted counts all match.

> NPE in expiry thread when a TT is lost
> --
>
> Key: MAPREDUCE-754
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-754
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Ramya R
>Assignee: Amar Kamat
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: mapreduce-754-v1.1.patch, mapreduce-754-v1.2.patch, 
> mapreduce-754-wip.patch
>
>
> NullPointerException is obtained in Tracker Expiry Thread. Below is the 
> exception obtained in the JT logs 
> {noformat}
> ERROR org.apache.hadoop.mapred.JobTracker: Tracker Expiry Thread got 
> exception: java.lang.NullPointerException
> at 
> org.apache.hadoop.mapred.JobTracker.updateTaskTrackerStatus(JobTracker.java:2971)
> at org.apache.hadoop.mapred.JobTracker.access$300(JobTracker.java:104)
> at 
> org.apache.hadoop.mapred.JobTracker$ExpireTrackers.run(JobTracker.java:381)
> at java.lang.Thread.run(Thread.java:619)
> {noformat}
> The steps to reproduce this issue are:
> * Blacklist a TT. 
> * Restart it. 
> * The above exception is obtained when the first instance of TT is marked as 
> lost.
> However the above exception does not break any functionality.




[jira] Commented: (MAPREDUCE-1119) When tasks fail to report status, show tasks's stack dump before killing

2009-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777433#action_12777433
 ] 

Hadoop QA commented on MAPREDUCE-1119:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12424806/MAPREDUCE-1119.5.patch
  against trunk revision 835237.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/241/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/241/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/241/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/241/console

This message is automatically generated.

> When tasks fail to report status, show tasks's stack dump before killing
> 
>
> Key: MAPREDUCE-1119
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1119
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-1119.2.patch, MAPREDUCE-1119.3.patch, 
> MAPREDUCE-1119.4.patch, MAPREDUCE-1119.5.patch, MAPREDUCE-1119.patch
>
>
> When the TT kills tasks that haven't reported status, it should somehow 
> gather a stack dump for the task. This could be done either by sending a 
> SIGQUIT (so the dump ends up in stdout) or perhaps something like JDI to 
> gather the stack directly from Java. This may be somewhat tricky since the 
> child may be running as another user (so the SIGQUIT would have to go through 
> LinuxTaskController). This feature would make debugging these kinds of 
> failures much easier, especially if we could somehow get it into the 
> TaskDiagnostic message
