[jira] Commented: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878507#action_12878507 ] Hadoop QA commented on MAPREDUCE-1853:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12446976/cache-task-attempts.diff against trunk revision 953976.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/241/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/241/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/241/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/241/console
This message is automatically generated.
MultipleOutputs does not cache TaskAttemptContext - Key: MAPREDUCE-1853 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.21.0 Environment: OSX 10.6 java6 Reporter: Torsten Curdt Priority: Critical Fix For: 0.22.0 Attachments: cache-task-attempts.diff In MultipleOutputs there is
{code}
private TaskAttemptContext getContext(String nameOutput) throws IOException {
  // The following trick leverages the instantiation of a record writer via
  // the job thus supporting arbitrary output formats.
  Job job = new Job(context.getConfiguration());
  job.setOutputFormatClass(getNamedOutputFormatClass(context, nameOutput));
  job.setOutputKeyClass(getNamedOutputKeyClass(context, nameOutput));
  job.setOutputValueClass(getNamedOutputValueClass(context, nameOutput));
  TaskAttemptContext taskContext = new TaskAttemptContextImpl(
      job.getConfiguration(), context.getTaskAttemptID());
  return taskContext;
}
{code}
so for every reduce call it creates a new Job instance, which creates a new LocalJobRunner. That does not sound like a good idea. You end up with a flood of "jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized" messages. This should probably also be added to 0.22. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
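The attached patch isn't reproduced here, but the usual fix for this pattern is to build each per-named-output context once and reuse it on subsequent calls. A minimal, Hadoop-free sketch of that caching pattern (the class and its names are illustrative, not the actual MultipleOutputs code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: a lazy per-key cache, standing in for a
// Map<String, TaskAttemptContext> keyed by named output. The expensive
// factory (here, whatever builds the context) runs once per key.
class ContextCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> factory;

    ContextCache(Function<K, V> factory) {
        this.factory = factory;
    }

    V get(K key) {
        // computeIfAbsent invokes the factory only on the first lookup,
        // so repeated reduce() calls reuse the same instance.
        return cache.computeIfAbsent(key, factory);
    }
}
```

With this shape, getContext(nameOutput) would hit the factory once per named output instead of constructing a new Job (and LocalJobRunner) on every reduce call.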
[jira] Commented: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878509#action_12878509 ] Amareshwari Sriramadasu commented on MAPREDUCE-1853: Test failure is due to MAPREDUCE-1834.
[jira] Commented: (MAPREDUCE-1857) Remove unused stream.numinputspecs configuration
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878508#action_12878508 ] Hadoop QA commented on MAPREDUCE-1857:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12446851/patch-1857.txt against trunk revision 953976.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/568/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/568/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/568/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/568/console
This message is automatically generated.
Remove unused stream.numinputspecs configuration Key: MAPREDUCE-1857 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1857 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Priority: Trivial Fix For: 0.22.0 Attachments: patch-1857.txt The configuration stream.numinputspecs is just set and not read anywhere. It can be removed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1857) Remove unused stream.numinputspecs configuration
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878510#action_12878510 ] Amareshwari Sriramadasu commented on MAPREDUCE-1857: Test failure is due to MAPREDUCE-1834. bq. -1 tests included. The patch removes unused code. So, no tests are added.
[jira] Created: (MAPREDUCE-1863) [Rumen] Null failedMapAttemptCDFs in job traces generated by Rumen
[Rumen] Null failedMapAttemptCDFs in job traces generated by Rumen -- Key: MAPREDUCE-1863 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1863 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Amar Kamat Assignee: Amar Kamat All the traces generated by Rumen for jobs having failed task attempts have a null value for failedMapAttemptCDFs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1863) [Rumen] Null failedMapAttemptCDFs in job traces generated by Rumen
[ https://issues.apache.org/jira/browse/MAPREDUCE-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amar Kamat updated MAPREDUCE-1863: Fix Version/s: 0.22.0 Affects Version/s: 0.22.0
[jira] Commented: (MAPREDUCE-1122) streaming with custom input format does not support the new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878515#action_12878515 ] Amareshwari Sriramadasu commented on MAPREDUCE-1122: Users can specify the Mapper/Reducer to be a Java Mapper/Reducer or a command. They can also specify the input format, output format and partitioner for their streaming job. The tables below summarize the mapper or reducer in use when streaming supports both the old and new API. Note: in the tables below, NS stands for 'Not specified'.

*Table 1* Mapper-in-use for a given spec, when num reducers = 0:
||Mapper || InputFormat || OutputFormat || Valid conf?|| Mapper-in-use ||
|Command|NS|NS|Yes|New|
|Command|Old|NS|Yes|Old|
|Command|Old|Old|Yes|Old|
|Command|Old|New|{color:red}No{color}|
|Command|New|NS|Yes|New|
|Command|New|Old|{color:red}No{color}|
|Command|New|New|Yes|New|
|Old|NS|NS|Yes|Old|
|Old|NS|Old|Yes|Old|
|Old|Old|NS|Yes|Old|
|Old|Old|Old|Yes|Old|
|Old|-|New|{color:red}No{color}|
|Old|New|-|{color:red}No{color}|
|New|NS|NS|Yes|New|
|New|NS|New|Yes|New|
|New|New|NS|Yes|New|
|New|New|New|Yes|New|
|New|-|Old|{color:red}No{color}|
|New|Old|-|{color:red}No{color}|

*Table 2* Mapper-in-use for a given spec, when num reducers != 0:
||Mapper || InputFormat || Partitioner || Valid conf?|| Mapper-in-use ||
|Command|NS|NS|Yes|New|
|Command|Old|NS|Yes|Old|
|Command|Old|Old|Yes|Old|
|Command|Old|New|{color:red}No{color}|
|Command|New|NS|Yes|New|
|Command|New|Old|{color:red}No{color}|
|Command|New|New|Yes|New|
|Old|NS|NS|Yes|Old|
|Old|NS|Old|Yes|Old|
|Old|Old|NS|Yes|Old|
|Old|Old|Old|Yes|Old|
|Old|New|-|{color:red}No{color}|
|Old|-|New|{color:red}No{color}|
|New|NS|NS|Yes|New|
|New|NS|New|Yes|New|
|New|New|NS|Yes|New|
|New|New|New|Yes|New|
|New|Old|-|{color:red}No{color}|
|New|-|Old|{color:red}No{color}|

*Table 3* Reducer-in-use for a given spec:
||Reducer || OutputFormat || Valid conf?|| Reducer-in-use ||
|Command|NS|Yes|New|
|Command|Old|Yes|Old|
|Command|New|Yes|New|
|Old|NS|Yes|Old|
|New|NS|Yes|New|
|Old|Old|Yes|Old|
|New|New|Yes|New|
|Old|New|{color:red}No{color}|
|New|Old|{color:red}No{color}|

streaming with custom input format does not support the new API --- Key: MAPREDUCE-1122 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.20.1 Environment: any OS Reporter: Keith Jackson When trying to implement a custom input format for use with streaming, I have found that streaming does not support the new API, org.apache.hadoop.mapreduce.InputFormat, but requires the old API, org.apache.hadoop.mapred.InputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
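Table 3 above is small enough to express as a resolution function. The following is a hypothetical sketch, not the actual streaming code: a command reducer follows the output format (defaulting to the new API), while a Java reducer must match the output format or the configuration is invalid.

```java
// Illustrative stand-in for the streaming reducer-API resolution in Table 3.
class ReducerApiResolver {
    enum Spec { COMMAND, OLD, NEW, NS }

    static Spec resolve(Spec reducer, Spec outputFormat) {
        if (reducer == Spec.NS) {
            // Table 3 assumes a reducer is specified.
            throw new IllegalArgumentException("reducer must be specified");
        }
        if (reducer == Spec.COMMAND) {
            // A command reducer can use either API; the output format decides,
            // defaulting to the new API when it is not specified.
            return outputFormat == Spec.OLD ? Spec.OLD : Spec.NEW;
        }
        if (outputFormat != Spec.NS && outputFormat != reducer) {
            // e.g. old-API reducer with new-API output format: invalid conf
            throw new IllegalArgumentException("Invalid conf: " + reducer
                + " reducer with " + outputFormat + " output format");
        }
        return reducer;
    }
}
```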
[jira] Commented: (MAPREDUCE-1851) Document configuration parameters in streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878530#action_12878530 ] Ravi Gummadi commented on MAPREDUCE-1851: It seems stream.jobLog_ is not documented anywhere and does not seem useful. We can remove it altogether, maybe in a separate JIRA. So let us not document it here? stream.addenvironment seems to be an internal property and is not intended for hadoop streaming users. Let us not document it. We can add the config property stream.stderr.reporter.prefix with the default value reporter:. This would need changes to the questions/answers related to updating status and updating counters in the FAQ? Document configuration parameters in streaming -- Key: MAPREDUCE-1851 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1851 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/streaming, documentation Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.22.0 Attachments: patch-1851.txt There are several streaming options such as stream.map.output.field.separator, stream.num.map.output.key.fields, stream.map.input.field.separator, stream.reduce.input.field.separator, stream.map.input.ignoreKey, stream.non.zero.exit.is.failure etc. which are spread everywhere. These should be documented in a single place with description and default value. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1857) Remove unused stream.numinputspecs configuration
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878532#action_12878532 ] Ravi Gummadi commented on MAPREDUCE-1857: There seems to be another config property, stream.debug, that is seen only in some unit tests. I don't know if it was added for some purpose earlier, but it doesn't seem to be used anywhere in the source code. So can we remove that also in this patch itself?
[jira] Updated: (MAPREDUCE-1857) Remove unused stream.numinputspecs configuration
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1857: Status: Open (was: Patch Available)
[jira] Updated: (MAPREDUCE-1857) Remove unused streaming configuration from src
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1857: Summary: Remove unused streaming configuration from src (was: Remove unused stream.numinputspecs configuration)
[jira] Commented: (MAPREDUCE-1851) Document configuration parameters in streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878537#action_12878537 ] Ravi Gummadi commented on MAPREDUCE-1851: We could also specify, for the 4 properties stream.map.input, stream.map.output, stream.reduce.input and stream.reduce.output, that these take the values given with -D only if the -io identifier is not used. In other words, should we say that the -io identifier will replace these 4 properties with the identifier?
[jira] Updated: (MAPREDUCE-1857) Remove unused streaming configuration from src
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1857: Attachment: patch-1857-1.txt The patch removes stream.debug from testcases and the unused stream.recordreader.compression from TestGZipInput. It also removes commented lines in some testcases.
[jira] Updated: (MAPREDUCE-1857) Remove unused streaming configuration from src
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1857: Status: Patch Available (was: Open)
[jira] Commented: (MAPREDUCE-1857) Remove unused streaming configuration from src
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878544#action_12878544 ] Amareshwari Sriramadasu commented on MAPREDUCE-1857: bq. unused stream.recordreader.compression from TestGZipInput. When the test was added, this configuration was read by StreamLineRecordReader. Now, that RecordReader no longer exists.
[jira] Commented: (MAPREDUCE-1857) Remove unused streaming configuration from src
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878546#action_12878546 ] Ravi Gummadi commented on MAPREDUCE-1857: Patch looks good. +1
[jira] Commented: (MAPREDUCE-1829) JobInProgress.findSpeculativeTask should use min() to find the candidate instead of sort()
[ https://issues.apache.org/jira/browse/MAPREDUCE-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878659#action_12878659 ] Scott Chen commented on MAPREDUCE-1829: Thanks for your help, Vinod and Ravi :) JobInProgress.findSpeculativeTask should use min() to find the candidate instead of sort() -- Key: MAPREDUCE-1829 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1829 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-1829-20100610.txt, MAPREDUCE-1829.txt findSpeculativeTask needs only one candidate to speculate, so it does not need to sort the whole list. Sorting may look OK, but someone can still submit big jobs with small slow-task thresholds; in that case, the sorting becomes expensive. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
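The change described in the summary is easy to sketch: when only one candidate is needed, a single O(n) Collections.min() pass replaces an O(n log n) sort. The Task class and the progress-rate criterion below are illustrative stand-ins, not the real JobInProgress code:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SpeculativeCandidate {
    // Illustrative stand-in for a running task attempt.
    static final class Task {
        final String id;
        final double progressRate;
        Task(String id, double progressRate) {
            this.id = id;
            this.progressRate = progressRate;
        }
    }

    // One O(n) scan picks the slowest task (lowest progress rate);
    // no need to sort the whole list just to take its first element.
    static Task findCandidate(List<Task> running) {
        return Collections.min(running,
            Comparator.comparingDouble((Task t) -> t.progressRate));
    }
}
```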
[jira] Commented: (MAPREDUCE-1854) [herriot] Automate health script system test
[ https://issues.apache.org/jira/browse/MAPREDUCE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878719#action_12878719 ] Konstantin Boudnik commented on MAPREDUCE-1854: bq. The tasktrackerStatus is a writable object, shouldn't the inner class of a writable object be public for others to use? You might be right. However, this field has package-private access, and I believe this has been done for a reason. I am not enough of an expert on MR's internals to tell you one way or another. However, from the common standpoint, such widening of permissions isn't advisable. bq. My real intention of having an abstract parent class is to have common functionality that can be shared. If in the future we see that the number of such shared functions is growing and it becomes useful to move them all to a common parent, we might do just that. However, two functions don't look like a good justification to me. Thanks for the explanations on the Common classes' modifications. They all make sense. I guess these changes will have to end up in a separate JIRA though. What about getting rid of the script wrappers for the ssh functionality and sleep? [herriot] Automate health script system test Key: MAPREDUCE-1854 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1854 Project: Hadoop Map/Reduce Issue Type: New Feature Components: test Environment: Herriot framework Reporter: Balaji Rajagopalan Assignee: Balaji Rajagopalan Attachments: health_script_5.txt Original Estimate: 120h Remaining Estimate: 120h There are three scenarios: 1. Induce an error from the health script and verify that the task tracker is blacklisted. 2. Make the health script time out and verify that the task tracker is blacklisted. 3. Make an error in the health script path and make sure the task tracker stays healthy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1774) Large-scale Automated Framework
[ https://issues.apache.org/jira/browse/MAPREDUCE-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated MAPREDUCE-1774: Attachment: MAPREDUCE-1774.patch - This patch addresses audit warnings caused by missing Apache license boilerplate in a couple of places. - Javac warnings are caused by using the deprecated {{JobConf}} and {{JobContext}} in two new classes from the {{testjar}} package. While this is a valid issue, I am not sure it has to be fought considering the 2K+ similar warnings all over the MR code. - Core test failures are old: they have been around for at least 6 days and this patch hasn't caused any new ones. The contrib test failure seems irrelevant (a Mumak testcase, {{TestSimulatorDeterministicReplay}}, has been timing out for over 10 days). Large-scale Automated Framework --- Key: MAPREDUCE-1774 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1774 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Konstantin Boudnik Assignee: Konstantin Boudnik Attachments: MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch This is the MapReduce part of HADOOP-6332. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1559) The DelegationTokenRenewal timer task should use the jobtracker's credentials to create the filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated MAPREDUCE-1559: Attachment: MR-1559.1.patch Patch for trunk uploaded. The DelegationTokenRenewal timer task should use the jobtracker's credentials to create the filesystem -- Key: MAPREDUCE-1559 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1559 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.22.0 Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.22.0 Attachments: MR-1559.1.patch, mr-1559.patch The submitJob RPC finally creates a timer task for renewing the delegation tokens of the submitting user. This timer task inherits the context of the RPC handler that runs in the context of the job submitting user, and when it tries to create a filesystem, the RPC client tries to use the user's credentials. This should instead use the JobTracker's credentials. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1559) The DelegationTokenRenewal timer task should use the jobtracker's credentials to create the filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated MAPREDUCE-1559: Status: Patch Available (was: Open)
[jira] Updated: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1848: Attachment: MAPREDUCE-1848-20100614.txt Put number of speculative, data local, rack local tasks in JobTracker metrics - Key: MAPREDUCE-1848 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1848 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-1848-20100614.txt It would be nice if we could collect this information in the JobTracker metrics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1848: Attachment: (was: MAPREDUCE-1848-20100614.txt)
[jira] Updated: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1848: Attachment: MAPREDUCE-1848-20100614.txt
[jira] Updated: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1848: Status: Patch Available (was: Open)
[jira] Commented: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878783#action_12878783 ] Scott Chen commented on MAPREDUCE-1848: --- Add four methods to JobTrackerInstrumentation to collect speculative, data local, and rack local tasks: {code} public void speculateMap(TaskAttemptID taskAttemptID) public void speculateReduce(TaskAttemptID taskAttemptID) public void launchDataLocalMap(TaskAttemptID taskAttemptID) public void launchRackLocalMap(TaskAttemptID taskAttemptID) {code} Put number of speculative, data local, rack local tasks in JobTracker metrics - Key: MAPREDUCE-1848 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1848 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-1848-20100614.txt It would be nice if we could collect this information in JobTracker metrics -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
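The four hooks above lend themselves to simple counters. A minimal sketch of that idea, assuming a plain counting subclass; the class name, counter fields, and the use of String attempt IDs in place of TaskAttemptID are all invented for illustration and are not the attached patch:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: each proposed instrumentation hook bumps a counter
// that a metrics-update thread could later publish as JobTracker metrics.
public class SpeculativeTaskCounters {
    private final AtomicLong speculativeMaps = new AtomicLong();
    private final AtomicLong speculativeReduces = new AtomicLong();
    private final AtomicLong dataLocalMaps = new AtomicLong();
    private final AtomicLong rackLocalMaps = new AtomicLong();

    // Method names mirror the four hooks quoted in the comment above.
    public void speculateMap(String taskAttemptId)       { speculativeMaps.incrementAndGet(); }
    public void speculateReduce(String taskAttemptId)    { speculativeReduces.incrementAndGet(); }
    public void launchDataLocalMap(String taskAttemptId) { dataLocalMaps.incrementAndGet(); }
    public void launchRackLocalMap(String taskAttemptId) { rackLocalMaps.incrementAndGet(); }

    public long getSpeculativeMaps()    { return speculativeMaps.get(); }
    public long getSpeculativeReduces() { return speculativeReduces.get(); }
    public long getDataLocalMaps()      { return dataLocalMaps.get(); }
    public long getRackLocalMaps()      { return rackLocalMaps.get(); }
}
```

AtomicLong keeps the hooks safe to call from concurrent scheduling paths without extra locking.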
[jira] Updated: (MAPREDUCE-1850) Include job submit host information (name and ip) in jobconf and jobdetails display
[ https://issues.apache.org/jira/browse/MAPREDUCE-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Ramachandran updated MAPREDUCE-1850: Attachment: mapred-1850.patch This is a forward port of a patch for an earlier release. It fixes deprecated APIs; for trunk, Job.java and Configuration still need to be fixed. Include job submit host information (name and ip) in jobconf and jobdetails display --- Key: MAPREDUCE-1850 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1850 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0 Reporter: Krishna Ramachandran Assignee: Krishna Ramachandran Attachments: mapred-1850.patch Enhancement to identify the source (submit host and ip) of a job request. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1831) Delete the co-located replicas when raiding file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1831: -- Status: Open (was: Patch Available) Delete the co-located replicas when raiding file Key: MAPREDUCE-1831 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1831 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-1831.20100610.txt, MAPREDUCE-1831.txt, MAPREDUCE-1831.v1.1.txt In raid, it is good to have the blocks on the same stripe located on different machines. This way, when one machine is down, it does not break two blocks on the stripe. By doing this, we can decrease the block error probability in raid from O(p^3) to O(p^4), which can be a huge improvement (where p is the replica missing probability). One way to do this is to add a new BlockPlacementPolicy that deletes replicas that are co-located. Then, when raiding the file, we can make the remaining replicas live on different machines. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1831) Delete the co-located replicas when raiding file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1831: -- Status: Patch Available (was: Open) Delete the co-located replicas when raiding file Key: MAPREDUCE-1831 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1831 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-1831.20100610.txt, MAPREDUCE-1831.txt, MAPREDUCE-1831.v1.1.txt In raid, it is good to have the blocks on the same stripe located on different machines. This way, when one machine is down, it does not break two blocks on the stripe. By doing this, we can decrease the block error probability in raid from O(p^3) to O(p^4), which can be a huge improvement (where p is the replica missing probability). One way to do this is to add a new BlockPlacementPolicy that deletes replicas that are co-located. Then, when raiding the file, we can make the remaining replicas live on different machines. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
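The O(p^3) to O(p^4) claim in the issue above can be checked numerically. A small illustration, where p = 0.01 is an assumed replica-missing probability chosen only for concreteness:

```java
// Numeric check of the claimed improvement: requiring one more independent
// replica failure shrinks the loss probability by roughly a factor of 1/p.
public class RaidLossOdds {
    public static void main(String[] args) {
        double p = 0.01; // assumed per-replica missing probability (illustrative)
        double before = Math.pow(p, 3); // O(p^3): co-located replicas, fewer independent failures needed
        double after  = Math.pow(p, 4); // O(p^4): replicas spread across machines
        System.out.printf("O(p^3) = %.2e, O(p^4) = %.2e, improvement = %.0fx%n",
                before, after, before / after);
    }
}
```

For p = 0.01 this is a factor-of-100 reduction in the block error probability, which matches the "huge improvement" the issue describes.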
[jira] Commented: (MAPREDUCE-647) Update the DistCp forrest doc to make it consistent with the latest changes (5472, 5620, 5762, 5826)
[ https://issues.apache.org/jira/browse/MAPREDUCE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878823#action_12878823 ] Rodrigo Schmidt commented on MAPREDUCE-647: --- Nicholas, would you mind reviewing this patch? Update the DistCp forrest doc to make it consistent with the latest changes (5472, 5620, 5762, 5826) Key: MAPREDUCE-647 URL: https://issues.apache.org/jira/browse/MAPREDUCE-647 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Attachments: MAPREDUCE-647.patch New features have been added to DistCp and the documentation must be updated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1548) Hadoop archives should be able to preserve times and other properties from original files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rodrigo Schmidt updated MAPREDUCE-1548: --- Status: Open (was: Patch Available) Hadoop archives should be able to preserve times and other properties from original files - Key: MAPREDUCE-1548 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1548 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Fix For: 0.22.0 Attachments: MAPREDUCE-1548.0.patch Files inside hadoop archives don't keep their original: - modification time - access time - permission - owner - group All such properties are currently taken from the file storing the archive index, not from the stored files. This doesn't look correct. It should be possible to preserve the original properties of the stored files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1548) Hadoop archives should be able to preserve times and other properties from original files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rodrigo Schmidt updated MAPREDUCE-1548: --- Status: Patch Available (was: Open) Hadoop archives should be able to preserve times and other properties from original files - Key: MAPREDUCE-1548 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1548 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Fix For: 0.22.0 Attachments: MAPREDUCE-1548.0.patch Files inside hadoop archives don't keep their original: - modification time - access time - permission - owner - group All such properties are currently taken from the file storing the archive index, not from the stored files. This doesn't look correct. It should be possible to preserve the original properties of the stored files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed
[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878824#action_12878824 ] Dick King commented on MAPREDUCE-323: - After some discussions, we've come to some decisions. 1: We'll store the completed jobs' history files in the DFS done history files tree, in the following fixed format: {{DONE/job-tracker-instance-ID/YYYY/MM/DD/987654/}} The job tracker instance ID includes both the job tracker machine name and the epoch time of the instance start. There won't be very many directories on this level. {{YYYY/MM/DD}} documents the date of completion [actually, the date that the history file is copied to DFS]. {{987654}} are the leading six digits of the job serial number, considered as a nine-digit integer. The leading zeros ARE included, so the directories can be enumerated correctly in lexicographical order. Therefore, no directory will have more than 2000 files, except in the unlikely case that there are more than 2 million jobs in one day. 2: We will modify the web application, {{jobhistory.jsp}}, in the following ways: 2a: We will decide how many jobs to filter based on the following criteria: 2a1: We stop at 11 tranches of serial numbers [the tenth boundary] or a day boundary, whichever comes first [but that page delivers buttons inviting you to ask for previous days, or more tranches]. Of course, as now, we stop at 100 items if we get that many items before crossing the directory boundary, but in the new code we will remember where to continue. However, in the new codebase we won't {{ls}} the files we don't present, improving the responsiveness accordingly. 2b: We will present the job history links, newest first.
2b1: To make this coherent, we will remember where we left off for pagination. To summarize how the code will work, the pagination controls will look like this: Available Jobs in History (displaying 100 jobs from 1 to 100) {{[show all] [show 1000 per page] [show entire day] [first page][last page]}} {{ golem-jt1.megacorp.com-2010-05-18 golem-jt1.megacorp.com-2010-04-18 }} [current JT instance, previous and/or following. This line of pagination controls is omitted if there is only one.] {{ newest 2010/06/14 2010/06/13 2010/06/12 2010/06/11 2010/06/10 oldest }} [current day, two days previous, two days succeeding -- only within the current JT instance] {{ oldest 1 2 3 4 5 next newest }} [directional words change when the search direction changes] 2c: There is a notion of search direction. Currently we display oldest first, but I'm thinking of changing that because I judge most recent first to be the better default, especially as uptimes increase as the product becomes more mature. What do you think? Users can change direction by going to the last page -- or the oldest/newest date -- or the oldest/newest task tracker. When you've done that, the navigation cursors change so you're going in the right direction. Improve the way job history files are managed - Key: MAPREDUCE-323 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.21.0, 0.22.0 Reporter: Amar Kamat Assignee: Dick King Priority: Critical Today all the jobhistory files are dumped in one _job-history_ folder. This can cause problems when there is a need to search the history folder (job-recovery etc). It would be nice if we group all the jobs under a _user_ folder. So all the jobs for user _amar_ will go in _history-folder/amar/_. Jobs can be categorized using various features like _jobid, date, jobname_ etc but using _username_ will make the search much more efficient and also will not result in a namespace explosion.
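The serial-number bucketing described in point 1 above (pad the serial to nine digits, keep the leading six, so lexicographic order matches numeric order and each bucket spans 1000 consecutive serials) can be sketched as follows; the class and method names are hypothetical, not part of the proposal:

```java
// Hypothetical helper illustrating the bucket-directory rule described above.
public class HistoryBucket {
    static String bucketFor(int jobSerial) {
        // Zero-pad to nine digits, then keep the leading six digits;
        // the retained leading zeros make lexicographic enumeration correct.
        return String.format("%09d", jobSerial).substring(0, 6);
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(42));        // "000000": serials 0-999 share a bucket
        System.out.println(bucketFor(987654321)); // "987654"
    }
}
```

With two files per job (history plus conf), a 1000-serial bucket holds at most 2000 files, matching the bound stated above.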
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-1073: - Status: Open (was: Patch Available) Assignee: Dick King Progress reported for pipes tasks is incorrect. --- Key: MAPREDUCE-1073 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073 Project: Hadoop Map/Reduce Issue Type: Bug Components: pipes Affects Versions: 0.20.1 Reporter: Sreekanth Ramakrishnan Assignee: Dick King Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch Currently in pipes, in {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader<K1, V1>, OutputCollector<K2, V2>, Reporter)}}, we do the following: {code} while (input.next(key, value)) { downlink.mapItem(key, value); if (skipping) { downlink.flush(); } } {code} This would result in consumption of all the records for the current task, taking task progress to 100% while the actual pipes application is still trailing behind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
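A toy illustration (plain Java, not Hadoop code; the record counts are invented) of why draining the reader in that loop misreports progress: the Java side measures how much input it has pushed downlink, while the external pipes process may have consumed far fewer records:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simulates the mismatch described above: the loop pushes every record
// without waiting for the external process, so input-based progress hits
// 100% while the application's real progress is still tiny.
public class PipesProgressSketch {
    public static void main(String[] args) {
        int totalRecords = 1000;
        Queue<Integer> downlinkBuffer = new ArrayDeque<>();
        int pushed = 0;
        for (int record = 0; record < totalRecords; record++) {
            downlinkBuffer.add(record); // analogous to downlink.mapItem(key, value)
            pushed++;
        }
        double reportedProgress = (double) pushed / totalRecords; // 1.0
        int processedByApp = 10; // the external process lags behind (invented value)
        double actualProgress = (double) processedByApp / totalRecords;
        System.out.printf("reported=%.2f actual=%.2f%n", reportedProgress, actualProgress);
    }
}
```

Progress derived from input consumption is only meaningful when the consumer is synchronous, which the pipes downlink is not.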
[jira] Updated: (MAPREDUCE-1778) CompletedJobStatusStore initialization should fail if {mapred.job.tracker.persist.jobstatus.dir} is unwritable
[ https://issues.apache.org/jira/browse/MAPREDUCE-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Ramachandran updated MAPREDUCE-1778: Attachment: mapred-1778.20S-1.patch revised 20S patch (git pull) after repo sync CompletedJobStatusStore initialization should fail if {mapred.job.tracker.persist.jobstatus.dir} is unwritable -- Key: MAPREDUCE-1778 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1778 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Reporter: Amar Kamat Assignee: Krishna Ramachandran Attachments: mapred-1778-1.patch, mapred-1778-2.patch, mapred-1778-3.patch, mapred-1778-4.patch, mapred-1778.20S-1.patch, mapred-1778.20S.patch, mapred-1778.patch If {mapred.job.tracker.persist.jobstatus.dir} points to an unwritable location or mkdir of {mapred.job.tracker.persist.jobstatus.dir} fails, then CompletedJobStatusStore silently ignores the failure and disables CompletedJobStatusStore. Ideally the JobTracker should bail out early indicating a misconfiguration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-469) Support concatenated gzip and bzip2 files
[ https://issues.apache.org/jira/browse/MAPREDUCE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Roelofs updated MAPREDUCE-469: --- Attachment: grr-hadoop-common.dif.20100614c grr-hadoop-mapreduce.dif.20100614c Almost-final gzip concatenation code (several style-related issues to deal with, but working code, both native and non-native, with no debug statements) and a halfway test case (need to get bzip2 half working). Summary: I implemented an Inflater-based Decompressor with manual gzip header/trailer parsing and CRC checks, and added new getRemaining() and resetPartially() methods to the interface. I also modified DecompressorStream to support concatenated streams (decompress() and getCompressedData() methods). For backward compatibility, the default behavior is unchanged; one needs to set the new io.compression.gzip.concat config option to true to turn it on. Since bzip2 apparently changed its behavior without such a setting, perhaps this is overkill... Anyway, this is against trunk (as of a week or two ago). I still need to check it against Yahoo's tree, deal with the FIXMEs, update my source tree(s), run test-patch, etc. Also, I haven't included the (binary) test files here; I'll do so in one of the next versions of the patch. Support concatenated gzip and bzip2 files - Key: MAPREDUCE-469 URL: https://issues.apache.org/jira/browse/MAPREDUCE-469 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Tom White Assignee: Greg Roelofs Attachments: grr-hadoop-common.dif.20100614c, grr-hadoop-mapreduce.dif.20100614c When running MapReduce with concatenated gzip files as input only the first part is read, which is confusing, to say the least. Concatenated gzip is described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage and in http://www.ietf.org/rfc/rfc1952.txt. 
(See original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
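What "concatenated gzip" means can be demonstrated with the JDK's own java.util.zip: two independently gzipped members glued together byte-for-byte form a valid .gz stream, and a conforming reader must decompress all members, not just the first. (Recent JDKs' GZIPInputStream does so; whether a given Hadoop codec does is exactly what this issue addresses. The class below is a standalone sketch, not Hadoop code.)

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Builds a two-member gzip stream and decompresses it in one pass.
public class ConcatGzipDemo {
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes("UTF-8")); // closing the stream writes the member trailer
        }
        return bos.toByteArray();
    }

    static String gunzipAll(byte[] data) throws IOException {
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream cat = new ByteArrayOutputStream();
        cat.write(gzip("hello "));
        cat.write(gzip("world"));
        // A reader that stops after the first member would print only "hello ".
        System.out.println(gunzipAll(cat.toByteArray()));
    }
}
```

This mirrors the behavior of `cat a.gz b.gz | gunzip`, which the gzip manual documents as valid usage.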
[jira] Commented: (MAPREDUCE-469) Support concatenated gzip and bzip2 files
[ https://issues.apache.org/jira/browse/MAPREDUCE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878848#action_12878848 ] David Ciemiewicz commented on MAPREDUCE-469: On vacation Mon-Wed Feb 15-17. Offsite Thu-Fri, Feb 18-19. Support concatenated gzip and bzip2 files - Key: MAPREDUCE-469 URL: https://issues.apache.org/jira/browse/MAPREDUCE-469 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Tom White Assignee: Greg Roelofs Attachments: grr-hadoop-common.dif.20100614c, grr-hadoop-mapreduce.dif.20100614c When running MapReduce with concatenated gzip files as input only the first part is read, which is confusing, to say the least. Concatenated gzip is described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1864) PipeMapRed.java has unintialized members log_ and LOGNAME
PipeMapRed.java has unintialized members log_ and LOGNAME -- Key: MAPREDUCE-1864 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1864 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Reporter: Amareshwari Sriramadasu Fix For: 0.22.0 PipeMapRed.java has members log_ and LOGNAME, which are never initialized yet are used for logging in several places. They should be removed, and PipeMapRed should use commons LogFactory and Log for logging. This would improve code maintainability. Also, as per [comment | https://issues.apache.org/jira/browse/MAPREDUCE-1851?focusedCommentId=12878530page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12878530], the stream.joblog_ configuration property can be removed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1864) PipeMapRed.java has uninitialized members log_ and LOGNAME
[ https://issues.apache.org/jira/browse/MAPREDUCE-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1864: --- Summary: PipeMapRed.java has uninitialized members log_ and LOGNAME (was: PipeMapRed.java has unintialized members log_ and LOGNAME ) PipeMapRed.java has uninitialized members log_ and LOGNAME --- Key: MAPREDUCE-1864 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1864 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Reporter: Amareshwari Sriramadasu Fix For: 0.22.0 PipeMapRed.java has members log_ and LOGNAME, which are never initialized yet are used for logging in several places. They should be removed, and PipeMapRed should use commons LogFactory and Log for logging. This would improve code maintainability. Also, as per [comment | https://issues.apache.org/jira/browse/MAPREDUCE-1851?focusedCommentId=12878530page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12878530], the stream.joblog_ configuration property can be removed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1765) Streaming doc - change StreamXmlRecord to StreamXmlRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1765: --- Status: Patch Available (was: Open) Assignee: Corinne Chandel Fix Version/s: 0.22.0 Streaming doc - change StreamXmlRecord to StreamXmlRecordReader Key: MAPREDUCE-1765 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1765 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming, documentation Reporter: Corinne Chandel Assignee: Corinne Chandel Priority: Minor Fix For: 0.22.0 Attachments: streaming-doc.patch Streaming doc - fix typo. CHANGE: hadoop jar hadoop-streaming.jar -inputreader StreamXmlRecord,begin=BEGIN_STRING,end=END_STRING . (rest of the command) TO THIS: hadoop jar hadoop-streaming.jar -inputreader StreamXmlRecordReader,begin=BEGIN_STRING,end=END_STRING . (rest of the command) Note: No new test code; changes to documentation only. See: Bugzilla Ticket 2520942 - XML Streaming -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.