[jira] Commented: (MAPREDUCE-118) Job.getJobID() will always return null

2010-05-17 Thread Sharad Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868095#action_12868095
 ] 

Sharad Agarwal commented on MAPREDUCE-118:
--

Should we override getJobID() in Job and do ensureState before calling 
super.getJobID()? That would give the user a consistent error message instead 
of returning null in some cases.
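A minimal, self-contained sketch of this proposal (illustrative only: the state names, the exception message, and the sample job id below are assumptions for the sketch, not the real org.apache.hadoop.mapreduce.Job code):

```java
// Sketch of making getJobID() fail fast via an ensureState() check
// instead of returning null before submission.
class Job {
    enum State { DEFINE, RUNNING }

    private State state = State.DEFINE;
    private String jobId;  // unknown until the JobTracker assigns one

    // Throws a consistent, descriptive error when the job is in the wrong state.
    private void ensureState(State required) {
        if (state != required) {
            throw new IllegalStateException(
                "Job in state " + state + " instead of " + required);
        }
    }

    // The proposed override: check the state first, then return the id.
    public String getJobID() {
        ensureState(State.RUNNING);
        return jobId;
    }

    // Submission is the point where the id actually becomes known.
    public void submit() {
        state = State.RUNNING;
        jobId = "job_201005171200_0001";  // placeholder id for the sketch
    }
}
```

With this shape, calling getJobID() before submit() fails with a descriptive IllegalStateException rather than silently returning null.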

 Job.getJobID() will always return null
 --

 Key: MAPREDUCE-118
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-118
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.20.1
Reporter: Amar Kamat
Assignee: Amareshwari Sriramadasu
Priority: Blocker
 Fix For: 0.21.0

 Attachments: patch-118-0.20-1.txt, patch-118-0.20.txt, 
 patch-118-0.21.txt, patch-118-1.txt, patch-118-2.txt, patch-118-3.txt, 
 patch-118-4.txt, patch-118.txt


 JobContext is used for a read-only view of the job's info. Hence all the read-only 
 fields in JobContext are set in the constructor. Job extends JobContext. When 
 a Job is created, the job id is not yet known, so there is no way to set the JobID 
 once the Job is created. The JobID is obtained only when the JobClient queries the 
 JobTracker for a job id, which happens later, i.e. upon job submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-118) Job.getJobID() will always return null

2010-05-17 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868096#action_12868096
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-118:
---

bq. Should we override getJobID() in Job and do ensureState before doing 
super.getJobID() ?
I had this in my earlier patch, but I have seen problems when the user calls 
getJobID() from his InputFormat.getSplits(JobContext) and the like, even though 
the JobID is available by that time.




[jira] Commented: (MAPREDUCE-1779) Should we provide a way to know JobTracker's memory info from client?

2010-05-17 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868099#action_12868099
 ] 

Amar Kamat commented on MAPREDUCE-1779:
---

Why would client applications need the JobTracker's memory information? I think 
the reason we added it to ClusterStatus was that it is maintained in one place 
and passed to the web UI for display. I don't think the JobTracker's memory 
information should be part of ClusterStatus. If at all some admins require it, 
it should be made available via MRAdmin. I don't see any reason why a client 
should be aware of the JobTracker's memory details.

 Should we provide a way to know JobTracker's memory info from client?
 -

 Key: MAPREDUCE-1779
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1779
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, jobtracker
Affects Versions: 0.21.0
Reporter: Amareshwari Sriramadasu
Priority: Blocker
 Fix For: 0.22.0


 In HADOOP-4435, in branch 0.20, the getClusterStatus() method returns the 
 JobTracker's used memory and total memory.
 But these details are missing from the new API (introduced in MAPREDUCE-777).
 If these details are needed only for the web UI, I don't think they are needed 
 by the client.
 So, should we provide a way to know the JobTracker's memory info from the client?
 If yes, an API should be added in org.apache.hadoop.mapreduce.Cluster for the 
 same.




[jira] Commented: (MAPREDUCE-1779) Should we provide a way to know JobTracker's memory info from client?

2010-05-17 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868104#action_12868104
 ] 

dhruba borthakur commented on MAPREDUCE-1779:
-

Two reasons:
1. Hive actually throttles new job-submissions if the heap memory on the JT 
exceeds a certain threshold. 
2. It is also needed to monitor the health of the JT.




[jira] Created: (MAPREDUCE-1793) Exception exculsion functionality is not working correctly.

2010-05-17 Thread Vinay Kumar Thota (JIRA)
Exception exculsion functionality is not working correctly.
---

 Key: MAPREDUCE-1793
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1793
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Vinay Kumar Thota
Assignee: Balaji Rajagopalan


Exception exclusion functionality is not working correctly, and because of that 
tests are failing by not matching the error count.
I debugged the issue and found that the problem is with the shell command 
generated in the getNumberOfMatchesInLogFile function.

Currently the shell command is built in the following way:

if (list != null) {
  for (int i = 0; i < list.length; ++i) {
    filePattern.append(" | grep -v " + list[i]);
  }
}
String[] cmd =
    new String[] {
        "bash",
        "-c",
        "grep -c "
            + pattern + " " + filePattern
            + " | awk -F: '{s+=$2} END {print s}'" };

However, the above command won't work correctly because it counts the 
exceptions in the file before excluding the known exceptions, so the error 
counts mismatch every time. The shell command should be built in the following 
way to work correctly:

if (list != null) {
  int index = 0;
  for (String excludeExp : list) {
    filePattern.append((++index < list.length ? " | grep -v " : " | grep -vc ")
        + excludeExp);
  }
}
String[] cmd =
    new String[] {
        "bash",
        "-c",
        "grep "
            + pattern + " " + filePattern
            + " | awk -F: '{s+=$2} END {print s}'" };
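To make the intent of the corrected loop explicit, here is a hypothetical, self-contained helper (buildCommand and its arguments are made up for this sketch; the real code lives in getNumberOfMatchesInLogFile): every known exception is filtered with grep -v, and only the final stage uses grep -vc, so the count is taken after all exclusions.

```java
// Illustration of the corrected command assembly: filter first, count last.
class GrepCommandSketch {
    static String buildCommand(String pattern, String files, String[] excludes) {
        StringBuilder cmd = new StringBuilder("grep " + pattern + " " + files);
        if (excludes != null) {
            for (int i = 0; i < excludes.length; i++) {
                // -v filters a known exception out; -vc on the last stage both
                // filters and counts, so counting happens after all exclusions.
                boolean last = (i == excludes.length - 1);
                cmd.append(last ? " | grep -vc " : " | grep -v ")
                   .append(excludes[i]);
            }
        }
        return cmd.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildCommand("ERROR", "logfiles*",
            new String[] { "IOException", "java.net.ConnectException" }));
        // grep ERROR logfiles* | grep -v IOException | grep -vc java.net.ConnectException
    }
}
```

Note the sketch deliberately omits the trailing awk aggregation; real code would still need it when grep -vc runs over multiple files and prints one file:count pair per file.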




[jira] Created: (MAPREDUCE-1794) Test the job status of lost task trackers before and after the timeout.

2010-05-17 Thread Vinay Kumar Thota (JIRA)
Test the job status of lost task trackers before and after the timeout.
---

 Key: MAPREDUCE-1794
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1794
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: test
Reporter: Vinay Kumar Thota
Assignee: Vinay Kumar Thota


This test covers the following scenarios.

1. Verify whether the job succeeds when the task tracker is lost and becomes 
alive again before the timeout.
2. Verify the job status and the killed attempts of a task, i.e. whether the job 
succeeded and the killed-attempt counts match, when the task trackers are lost 
and time out for all four attempts of a task.




[jira] Updated: (MAPREDUCE-1794) Test the job status of lost task trackers before and after the timeout.

2010-05-17 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1794:
-

Attachment: 1794_lost_tasktracker.patch

Please review the patch and give me your comments.




[jira] Commented: (MAPREDUCE-1779) Should we provide a way to know JobTracker's memory info from client?

2010-05-17 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868269#action_12868269
 ] 

Arun C Murthy commented on MAPREDUCE-1779:
--

bq. It is also needed to monitor the health of the JT. 

This is already available in JVM Metrics.

bq. Hive actually throttles new job-submissions if the heap memory on the JT 
exceeds a certain threshold. 

Again, I'd like to reiterate that this 'feature' causes *serious* performance 
issues on the JobTracker - JNI calls are *very* expensive, and a rogue client of 
this feature can easily cause *severe* harm to the JobTracker and hence the 
entire cluster.

Hive can use JVM Metrics for the same functionality, given that the data is 
already available there. Thus, I'm -1 on this feature.




[jira] Commented: (MAPREDUCE-1779) Should we provide a way to know JobTracker's memory info from client?

2010-05-17 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868302#action_12868302
 ] 

dhruba borthakur commented on MAPREDUCE-1779:
-

Sounds like a fine idea to me, if this data is already available via JVM 
Metrics.




[jira] Commented: (MAPREDUCE-1793) Exception exculsion functionality is not working correctly.

2010-05-17 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868367#action_12868367
 ] 

Konstantin Boudnik commented on MAPREDUCE-1793:
---

It isn't good style to paste the content of a patch into the description 
field.




[jira] Resolved: (MAPREDUCE-1779) Should we provide a way to know JobTracker's memory info from client?

2010-05-17 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy resolved MAPREDUCE-1779.
--

Resolution: Won't Fix

Thanks Dhruba. Closing as 'wontfix'.




[jira] Commented: (MAPREDUCE-115) Map tasks are receiving FileNotFound Exceptions for spill files on a regular basis and are getting killed

2010-05-17 Thread geoff hendrey (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868472#action_12868472
 ] 

geoff hendrey commented on MAPREDUCE-115:
-

Most of my mappers are dying with this error. I am using Hadoop 0.20.2. Any 
suggestions for a workaround?

2010-05-17 14:03:42,738 INFO org.apache.hadoop.mapred.Merger: Merging 22 sorted segments
2010-05-17 14:03:43,099 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.FileNotFoundException: /hive4/mapred/local/taskTracker/jobcache/job_201005141621_0137/attempt_201005141621_0137_m_00_0/output/spill15.out
    at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:167)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
    at org.apache.hadoop.mapred.Merger$Segment.init(Merger.java:205)
    at org.apache.hadoop.mapred.Merger$Segment.access$100(Merger.java:165)
    at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:418)
    at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
    at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1522)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)



 Map tasks are receiving FileNotFound Exceptions for spill files on a regular 
 basis and are getting killed
 -

 Key: MAPREDUCE-115
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-115
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jothi Padmanabhan

 The following is the log -- Map tasks are unable to locate the spill files 
 when they are doing the final merge (mergeParts). 
 java.io.FileNotFoundException: File /xxx/mapred-tt/mapred-local/taskTracker/jobcache/job_200808190959_0001/attempt_200808190959_0001_m_00_0/output/spill23.out does not exist.
     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
     at org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:682)
     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.getFileLength(ChecksumFileSystem.java:218)
     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.seek(ChecksumFileSystem.java:259)
     at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1102)
     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:769)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:255)
     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2208)




[jira] Created: (MAPREDUCE-1795) add error option if file-based record-readers fail to consume all input (e.g., concatenated gzip, bzip2)

2010-05-17 Thread Greg Roelofs (JIRA)
add error option if file-based record-readers fail to consume all input (e.g., 
concatenated gzip, bzip2)


 Key: MAPREDUCE-1795
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1795
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Greg Roelofs
Assignee: Ravi Gummadi


When running MapReduce with concatenated gzip files as input, only the first 
part is read, which is confusing, to say the least. Concatenated gzip is 
described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage 
and in http://www.ietf.org/rfc/rfc1952.txt. (See the original report at 
http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)
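For contrast, the gzip format itself makes concatenation legal, and a modern JDK's GZIPInputStream (Java 7 and later) does consume all members. A small self-contained sketch, not Hadoop code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ConcatGzipDemo {
    // Compress one text into a single gzip member.
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // Decompress everything the stream yields, across member boundaries.
    static String gunzipAll(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Two gzip members back to back, like `cat a.gz b.gz > c.gz`.
        ByteArrayOutputStream concat = new ByteArrayOutputStream();
        concat.write(gzipMember("first member\n"));
        concat.write(gzipMember("second member\n"));
        System.out.print(gunzipAll(concat.toByteArray()));
    }
}
```

Hadoop 0.20's gzip input path, by contrast, stops after the first member, which is exactly the silent truncation this issue's error option is meant to surface.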





[jira] Updated: (MAPREDUCE-1795) add error option if file-based record-readers fail to consume all input (e.g., concatenated gzip, bzip2)

2010-05-17 Thread Greg Roelofs (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Roelofs updated MAPREDUCE-1795:


 Original Estimate: 336h
Remaining Estimate: 336h
 Affects Version/s: 0.20.2
   Description: 
When running MapReduce with concatenated gzip files as input, only the first 
part ("member" in gzip spec parlance, http://www.ietf.org/rfc/rfc1952.txt) is 
read; the remainder is silently ignored.  As a first step toward fixing that, 
this issue will add a configurable option to throw an error in such cases.

MAPREDUCE-469 is the tracker for the more complete fix/feature, whenever that 
occurs.

  was:
When running MapReduce with concatenated gzip files as input only the first 
part is read, which is confusing, to say the least. Concatenated gzip is 
described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage 
and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at 
http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)






[jira] Assigned: (MAPREDUCE-1795) add error option if file-based record-readers fail to consume all input (e.g., concatenated gzip, bzip2)

2010-05-17 Thread Greg Roelofs (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Roelofs reassigned MAPREDUCE-1795:
---

Assignee: Greg Roelofs




[jira] Updated: (MAPREDUCE-1795) add error option if file-based record-readers fail to consume all input (e.g., concatenated gzip, bzip2)

2010-05-17 Thread Greg Roelofs (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Roelofs updated MAPREDUCE-1795:


 Original Estimate: (was: 336h)
Remaining Estimate: (was: 336h)
  Assignee: (was: Ravi Gummadi)
 Affects Version/s: (was: 0.20.2)




[jira] Created: (MAPREDUCE-1796) job tracker history viewer shows all recent jobs as being run at job tracker (re)start time

2010-05-17 Thread Ted Yu (JIRA)
job tracker history viewer shows all recent jobs as being run at job tracker 
(re)start time
---

 Key: MAPREDUCE-1796
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1796
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.2
Reporter: Ted Yu
 Fix For: 0.20.3


It has long been the behavior of the History viewer that it shows the
timestamp when the JobTracker restarted rather than the job start
time.




[jira] Resolved: (MAPREDUCE-1796) job tracker history viewer shows all recent jobs as being run at job tracker (re)start time

2010-05-17 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu resolved MAPREDUCE-1796.


Fix Version/s: (was: 0.20.3)
   Resolution: Duplicate

Duplicate of MAPREDUCE-1541




[jira] Commented: (MAPREDUCE-1793) Exception exculsion functionality is not working correctly.

2010-05-17 Thread Vinay Kumar Thota (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868526#action_12868526
 ] 

Vinay Kumar Thota commented on MAPREDUCE-1793:
--

My suggestion shows where exactly the code needs to change to resolve the 
issue; that is why I put that part of the code in the description field.

The pattern should always be ERROR, WARN, or FATAL, and we need to fetch the 
exceptions matching that pattern from the file. Once we have the exceptions, 
we exclude the known ones from the output list, and only then count what 
remains.

For example, with my above suggestion the shell command is generated like this:

grep ERROR logfiles* | grep -v IOException | grep -vc java.net.ConnectException 
| awk -F: '{s+=$2} END {print s}'

Here {{filePattern}} is logfiles* and {{pattern}} is ERROR.

I would say my suggestion is 100% correct and there is no fault in it.



[jira] Commented: (MAPREDUCE-1793) Exception exculsion functionality is not working correctly.

2010-05-17 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868535#action_12868535
 ] 

Konstantin Boudnik commented on MAPREDUCE-1793:
---

Oops, you are right. I misread the proposed fix. Sorry. I guess the absence 
of a patch is to blame ;-)

