[jira] Commented: (MAPREDUCE-1308) reduce tasks stall and are eventually killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791842#action_12791842 ] Arun C Murthy commented on MAPREDUCE-1308: -- bq. This is problematic because our scheduled Hadoop jobs now take an extra hour-and-a-half to run (6000 seconds). First up, this is a per-job config... why is mapred.task.timeout set to 6000s? The default value is 600s. Could you please check the reducer's syslog file to see if there are issues? reduce tasks stall and are eventually killed Key: MAPREDUCE-1308 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1308 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Environment: 20-node cluster, 8 cores per machine, 32GB memory, Fedora Linux Reporter: Brian Karlak We recently migrated our 0.19.2 cluster from Gentoo Linux to Fedora Linux. Everything was running smoothly before, but now about 5%-10% of our jobs have at least one reduce task that stalls out and is eventually killed with the message: Task attempt_200912102211_1648_r_09_0 failed to report status for 6003 seconds. Killing! The task is then re-launched and completes successfully, usually in a couple of minutes. This is problematic because our scheduled Hadoop jobs now take an extra hour-and-a-half to run (6000 seconds). There are no indications in the logs that anything is amiss. The task starts, a small amount of the copy/shuffle runs, and then nothing else is heard from the task until it is killed. I will attach the relevant parts of the TaskTracker logs in the comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
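For readers hitting the same symptom: the knob Arun refers to is set per job (or cluster-wide in mapred-site.xml), and the value is in milliseconds, so the 600s default is 600000. A sketch of the relevant property as it might appear in a configuration file of that era:

```xml
<property>
  <name>mapred.task.timeout</name>
  <!-- Milliseconds a task may go without reporting status before being
       killed. 600000 ms = 600 s is the default; the cluster in this report
       had it at 6000000 ms (6000 s), hence the hour-and-a-half stalls. -->
  <value>600000</value>
</property>
```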
[jira] Updated: (MAPREDUCE-1067) Default state of queues is undefined when unspecified
[ https://issues.apache.org/jira/browse/MAPREDUCE-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V.V.Chaitanya Krishna updated MAPREDUCE-1067: - Attachment: MAPREDUCE-1067-6.patch Uploading new patch with the above comments implemented. Default state of queues is undefined when unspecified - Key: MAPREDUCE-1067 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1067 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.21.0 Reporter: V.V.Chaitanya Krishna Assignee: V.V.Chaitanya Krishna Priority: Blocker Fix For: 0.21.0 Attachments: MAPREDUCE-1067-1.patch, MAPREDUCE-1067-2.patch, MAPREDUCE-1067-3.patch, MAPREDUCE-1067-4.patch, MAPREDUCE-1067-5.patch, MAPREDUCE-1067-6.patch Currently, if the state of a queue is not specified, it is being set to undefined state instead of running state. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1067) Default state of queues is undefined when unspecified
[ https://issues.apache.org/jira/browse/MAPREDUCE-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V.V.Chaitanya Krishna updated MAPREDUCE-1067: - Status: Patch Available (was: Open)
[jira] Commented: (MAPREDUCE-1174) Sqoop improperly handles table/column names which are reserved sql words
[ https://issues.apache.org/jira/browse/MAPREDUCE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791852#action_12791852 ] Hadoop QA commented on MAPREDUCE-1174: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428067/MAPREDUCE-1174.4.patch against trunk revision 891524. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/207/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/207/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/207/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/207/console This message is automatically generated. 
Sqoop improperly handles table/column names which are reserved sql words Key: MAPREDUCE-1174 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1174 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1174.2.patch, MAPREDUCE-1174.3.patch, MAPREDUCE-1174.4.patch, MAPREDUCE-1174.patch In some databases it is legal to name tables and columns with terms that overlap SQL reserved keywords (e.g., {{CREATE}}, {{table}}, etc.). In such cases, the database allows you to escape the table and column names. We should always escape table and column names when possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
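A minimal sketch of the escaping idea, not Sqoop's actual code: wrap every table or column identifier in ANSI-SQL double quotes, doubling any embedded quote, so reserved words like CREATE or TABLE become safe to use in generated SQL. The class and method names here are illustrative.

```java
// Hypothetical sketch of identifier escaping for generated SQL; not taken
// from the Sqoop patch itself.
public class IdentifierEscaper {
    /** Quote an identifier ANSI-style, doubling any embedded double quote. */
    public static String escape(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("CREATE")); // reserved word, now quoted
        System.out.println(escape("table"));
    }
}
```

Note that the quote character is database-specific: ANSI SQL and PostgreSQL use double quotes, while MySQL uses backticks, so a real implementation would pick the delimiter per connection type.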
[jira] Commented: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete
[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791855#action_12791855 ] Hadoop QA commented on MAPREDUCE-1305: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428234/MAPRED-1305.patch against trunk revision 891524. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/208/console This message is automatically generated. Massive performance problem with DistCp and -delete --- Key: MAPREDUCE-1305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 0.20.1 Reporter: Peter Romianowski Assignee: Peter Romianowski Attachments: MAPRED-1305.patch *First problem* In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when the path is all we need. The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write, which tries to retrieve file permissions by issuing an ls -ld on the path, which is painfully slow. Changed that to just serialize Path and not FileStatus. *Second problem* To delete the files we invoke the hadoop command line tool with option -rmr path -- again, once for each file. Changed that to dstfs.delete(path, true) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
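An illustrative sketch of the pattern behind the second fix, using java.nio rather than the Hadoop FileSystem API: delete through an in-process library call (the patch's dstfs.delete(path, true)) instead of forking one CLI process per file, which is what dominates the runtime. Plain "rm" stands in for the hadoop command-line tool here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Not DistCp code: a stand-alone demonstration of "API call vs. one
// fork/exec per file", the core of the MAPREDUCE-1305 fix.
public class DeleteSketch {

    // Slow variant: spawn a process per path, as the old code effectively did.
    public static void deleteViaShell(Path p) throws IOException, InterruptedException {
        new ProcessBuilder("rm", "-rf", p.toString()).start().waitFor();
    }

    // Fast variant: a single in-process call, no process creation at all.
    public static void deleteViaApi(Path p) throws IOException {
        Files.deleteIfExists(p);
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("distcp-sketch", ".tmp");
        deleteViaApi(tmp);
        System.out.println(Files.exists(tmp)); // the file is gone
    }
}
```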
[jira] Commented: (MAPREDUCE-1067) Default state of queues is undefined when unspecified
[ https://issues.apache.org/jira/browse/MAPREDUCE-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791868#action_12791868 ] rahul k singh commented on MAPREDUCE-1067: -- +1 with patch
[jira] Commented: (MAPREDUCE-118) Job.getJobID() will always return null
[ https://issues.apache.org/jira/browse/MAPREDUCE-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791873#action_12791873 ] Amareshwari Sriramadasu commented on MAPREDUCE-118: --- The proposal looks fine, but I found a small issue implementing it. In 0.21, ClientProtocol.getNewJobID() throws InterruptedException. The new Job constructors (introduced in 0.21) can be changed to throw InterruptedException, but the deprecated constructors cannot be changed. After discussing with Arun, one solution we could think of is to add a deprecated setJobID in JobContextImpl, which can be called from the deprecated constructors. We will remove the newly added method when we remove the deprecated constructors. Job.getJobID() will always return null -- Key: MAPREDUCE-118 URL: https://issues.apache.org/jira/browse/MAPREDUCE-118 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: Amar Kamat Assignee: Amareshwari Sriramadasu Priority: Blocker Fix For: 0.20.2 Attachments: patch-118-0.20.txt, patch-118-0.21.txt, patch-118.txt JobContext is used for a read-only view of a job's info. Hence all the read-only fields in JobContext are set in the constructor. Job extends JobContext. When a Job is created, the jobid is not known, and hence there is no way to set the JobID once the Job is created. The JobID is obtained only when the JobClient queries the JobTracker for a job-id, which happens later, i.e. upon job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
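A simplified sketch of the proposal in this comment; the names mirror the discussion, but the real classes carry far more state. The idea: keep JobContext effectively read-only, yet give the deprecated Job constructors, which cannot be changed to throw InterruptedException, a deprecated setter to fill in the id once it is known.

```java
// Illustrative stand-in for JobContextImpl, not the actual Hadoop class.
public class JobContextSketch {
    private String jobID; // null until the JobTracker assigns an id

    /** Only for the deprecated constructors; removed when they are. */
    @Deprecated
    public void setJobID(String id) { this.jobID = id; }

    public String getJobID() { return jobID; }
}
```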
[jira] Updated: (MAPREDUCE-896) Users can set non-writable permissions on temporary files for TT and can abuse disk usage.
[ https://issues.apache.org/jira/browse/MAPREDUCE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-896: --- Attachment: MR-896.v3.patch Attaching patch for trunk. Incorporated review comments. Fixed the issue of launching the task-controller when the path does not exist, similar to y896.2.1.fix.v2.patch. Added more testcases for the cases of (a) needCleanup being false and (b) jvmReuse. Users can set non-writable permissions on temporary files for TT and can abuse disk usage. -- Key: MAPREDUCE-896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-896 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.21.0 Reporter: Vinod K V Assignee: Ravi Gummadi Fix For: 0.21.0 Attachments: MR-896.patch, MR-896.v1.patch, MR-896.v2.patch, MR-896.v3.patch, y896.v1.patch, y896.v2.1.fix.patch, y896.v2.1.fix.v1.patch, y896.v2.1.fix.v2.patch, y896.v2.1.patch, y896.v2.patch As of now, irrespective of the TaskController in use, the TT itself does a full delete on local files created by itself or by job tasks. This step, depending upon the TT's umask and the permissions set on files by the user (e.g. in job-work/task-work or child.tmp directories), may not complete successfully. This leaves an opportunity for disk-space abuse, either accidental or intentional, by the TT or users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-913) TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks and hung TaskTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-913: -- Assignee: Amareshwari Sriramadasu Status: Patch Available (was: Open) TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks and hung TaskTracker Key: MAPREDUCE-913 URL: https://issues.apache.org/jira/browse/MAPREDUCE-913 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.20.1 Reporter: Vinod K V Assignee: Amareshwari Sriramadasu Priority: Blocker Fix For: 0.21.0 Attachments: mapreduce-913-1.patch, MAPREDUCE-913-20091119.1.txt, MAPREDUCE-913-20091119.2.txt, MAPREDUCE-913-20091120.1.txt, patch-913.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-913) TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks and hung TaskTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-913: -- Attachment: patch-913.txt Patch does the following: 1. Changed reportTaskFinished code to ensure the slot is always released, by calling releaseSlot in a finally block. 2. Undid the changes related to throwing an exception when arguments to the debug-script could not be constructed, since they were already initialized to empty Strings. 3. Modified the testcase to use the new API. bq. In test case can we verify the correct number of the map slot is actually reported back to JobTracker after the failing job completes, this would test the actual slot management. 4. Added asserts for slot management. Verified the test passes with the patch and fails without it. bq. Can we check if the workDir is non-null in the run-debug script and throw an exception if the same is null? Would prevent launch of task-controller code. If workDir is null or does not exist, the current code already throws an IOException. bq. Wouldn't it be much better that we add a check to figure out if the taskJVM was launched or not and then run debug script accordingly. This may need more discussion, since it changes the feature so that the debug script is launched only when the task JVM was launched properly.
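A stripped-down sketch of item 1 in the patch description: whatever reportTaskFinished does, the slot must come back, so the release happens in a finally block. The classes below are stand-ins, not the real TaskTracker/TaskRunner code.

```java
// Demonstrates the try/finally slot-release pattern from the patch notes.
public class SlotRelease {
    private int freeSlots;

    public void reportTaskFinished(Runnable reporter) {
        try {
            reporter.run(); // may throw, e.g. the NPE from this issue
        } finally {
            freeSlots++;    // slot released on both normal and error paths
        }
    }

    public int getFreeSlots() { return freeSlots; }
}
```

Without the finally block, an exception in the reporting path would leave the slot held forever, which is exactly the hung-TaskTracker symptom in the issue title.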
[jira] Commented: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8
[ https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791951#action_12791951 ] Hadoop QA commented on MAPREDUCE-1277: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428009/streaming-1277-new.patch against trunk revision 891524. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/210/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/210/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/210/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/210/console This message is automatically generated. 
Streaming job should support other characterset in user's stderr log, not only utf8 --- Key: MAPREDUCE-1277 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1277 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.21.0 Reporter: ZhuGuanyin Assignee: ZhuGuanyin Fix For: 0.21.0 Attachments: streaming-1277-new.patch, streaming-1277.patch The current implementation in streaming only supports utf8-encoded user stderr logs; it should be encoding-agnostic so that other charactersets are supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
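An illustrative sketch of the general approach (not the streaming patch itself): treat the child process's stderr as raw bytes and decode with a configurable charset instead of hard-wiring UTF-8, so logs written in e.g. GBK or ISO-8859-1 survive the round trip. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Decodes captured stderr bytes with a caller-chosen charset rather than
// assuming UTF-8, the core idea behind MAPREDUCE-1277.
public class StderrDecoder {
    public static String decode(byte[] rawStderr, String charsetName) {
        return new String(rawStderr, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // 0xE9 is 'e' with acute accent in ISO-8859-1; decoding it as UTF-8
        // would mangle it, decoding with the right charset does not.
        byte[] raw = {(byte) 0xE9};
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```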
[jira] Commented: (MAPREDUCE-1308) reduce tasks stall and are eventually killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791965#action_12791965 ] Brian Karlak commented on MAPREDUCE-1308: - Arun -- Thanks for the pointers. I'm not quite sure how mapred.task.timeout got set incorrectly -- I went through our local SVN repo, and it seems to have been set that way at our site since we were using 0.16.4 back in July 2008. Since it was never an issue until now, we never noticed, I guess. ;-) Parameter has been modified. I'll check the syslogs and report back in the next comment. Brian
[jira] Commented: (MAPREDUCE-1308) reduce tasks stall and are eventually killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791969#action_12791969 ] Brian Karlak commented on MAPREDUCE-1308: - I can find no indication of error in the syslog files. /var/log/messages has only syslog-ng and ntpd messages between 03:50 and 05:40.
[jira] Commented: (MAPREDUCE-1308) reduce tasks stall and are eventually killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791973#action_12791973 ] Brian Karlak commented on MAPREDUCE-1308: - Outside of the time period in question, I do see a few other worrisome log messages in the hadoop (not syslog) files. In the datanode logs, I see 5 messages (in a 24-hour period) like:

2009-12-17 10:14:13,565 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.29.2.67:50010, storageID=DS-739735928-172.29.2.67-50010-1259798617913, infoPort=50075, ipcPort=50020):Got exception while serving blk_786885296716083440_1313776 to /172.29.2.67:
java.io.IOException: Block blk_786885296716083440_1313776 is not valid.
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:731)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:719)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
    at java.lang.Thread.run(Thread.java:619)

2009-12-17 10:14:13,565 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.29.2.67:50010, storageID=DS-739735928-172.29.2.67-50010-1259798617913, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Block blk_786885296716083440_1313776 is not valid.
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:731)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:719)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
    at java.lang.Thread.run(Thread.java:619)
[jira] Commented: (MAPREDUCE-1308) reduce tasks stall and are eventually killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791974#action_12791974 ] Brian Karlak commented on MAPREDUCE-1308: - And I see one message for a SocketTimeout:

2009-12-17 06:26:20,082 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.29.2.67:50010, storageID=DS-739735928-172.29.2.67-50010-1259798617913, infoPort=50075, ipcPort=50020):Got exception while serving blk_-7712543153225807619_1300911 to /172.29.2.67:
java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.29.2.67:50010 remote=/172.29.2.67:41058]
    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
    at java.lang.Thread.run(Thread.java:619)

2009-12-17 06:26:20,082 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.29.2.67:50010, storageID=DS-739735928-172.29.2.67-50010-1259798617913, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.29.2.67:50010 remote=/172.29.2.67:41058]
    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
    at java.lang.Thread.run(Thread.java:619)
[jira] Commented: (MAPREDUCE-1308) reduce tasks stall and are eventually killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791978#action_12791978 ] Brian Karlak commented on MAPREDUCE-1308: - And in the tasktracker logs, around the same time but for a different task, I get errors like the one below. There are about 55 of these over a 24-hour period.

2009-12-17 03:56:14,843 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200912102211_1648_m_82_0,46) failed :
java.net.SocketException: Connection reset
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at org.mortbay.http.ChunkingOutputStream.bypassWrite(ChunkingOutputStream.java:151)
    at org.mortbay.http.BufferedOutputStream.write(BufferedOutputStream.java:139)
    at org.mortbay.http.HttpOutputStream.write(HttpOutputStream.java:423)
    at org.mortbay.jetty.servlet.ServletOut.write(ServletOut.java:54)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2919)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
    at org.mortbay.http.HttpServer.service(HttpServer.java:954)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

2009-12-17 03:56:14,844 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 172.29.2.67:50060, dest: 172.29.2.61:17116, bytes: 589824, op: MAPRED_SHUFFLE, cliID: attempt_200912102211_1648_m_82_0

2009-12-17 03:56:14,844 WARN /: /mapOutput?job=job_200912102211_1648map=attempt_200912102211_1648_m_82_0reduce=46:
java.lang.IllegalStateException: Committed
    at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
    at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2945)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
    at org.mortbay.http.HttpServer.service(HttpServer.java:954)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
[jira] Commented: (MAPREDUCE-698) Per-pool task limits for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792013#action_12792013 ] Matei Zaharia commented on MAPREDUCE-698: - I've looked at the patch more carefully, and it all looks good, except there seems to be a loop doing nothing in PoolManager:
{noformat}
+for(String pool : poolNamesInAllocFile) {
+}
{noformat}
I can remove this myself and commit the patch, unless there was a reason you had it there (and forgot to put in some code). Per-pool task limits for the fair scheduler --- Key: MAPREDUCE-698 URL: https://issues.apache.org/jira/browse/MAPREDUCE-698 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Reporter: Matei Zaharia Assignee: Kevin Peterson Fix For: 0.21.0 Attachments: MAPREDUCE-698-prelim.patch, mapreduce-698-trunk-3.patch, mapreduce-698-trunk-4.patch, mapreduce-698-trunk.patch, mapreduce-698-trunk.patch The fair scheduler could use a way to cap the share of a given pool similar to MAPREDUCE-532. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-698) Per-pool task limits for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792018#action_12792018 ] Kevin Peterson commented on MAPREDUCE-698: -- That loop was for checking that the allocations were consistent (min <= max). I moved the check into the loop where they are read, but it looks like I missed this bit. Per-pool task limits for the fair scheduler --- Key: MAPREDUCE-698 URL: https://issues.apache.org/jira/browse/MAPREDUCE-698 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Reporter: Matei Zaharia Assignee: Kevin Peterson Fix For: 0.21.0 Attachments: MAPREDUCE-698-prelim.patch, mapreduce-698-trunk-3.patch, mapreduce-698-trunk-4.patch, mapreduce-698-trunk.patch, mapreduce-698-trunk.patch The fair scheduler could use a way to cap the share of a given pool similar to MAPREDUCE-532. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
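For reference, the consistency check Kevin describes can be folded into the loop that reads the allocations. This is a hedged, self-contained sketch with hypothetical names (the actual PoolManager code and its parsed structures differ):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PoolAllocCheck {
    // Hypothetical consistency rule: a pool's minimum allocation
    // must not exceed its maximum (either may be unset).
    static boolean isConsistent(Integer min, Integer max) {
        return min == null || max == null || min <= max;
    }

    public static void main(String[] args) {
        // Stand-ins for per-pool allocations parsed from the alloc file.
        Map<String, Integer> minTasks = new LinkedHashMap<>();
        Map<String, Integer> maxTasks = new LinkedHashMap<>();
        minTasks.put("research", 10);
        maxTasks.put("research", 5);

        // Validate each pool as it is read, instead of in a separate
        // (previously empty) second loop over the pool names.
        for (String pool : minTasks.keySet()) {
            if (!isConsistent(minTasks.get(pool), maxTasks.get(pool))) {
                System.out.println("inconsistent pool: " + pool);
            }
        }
    }
}
```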
[jira] Updated: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete
[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Romianowski updated MAPREDUCE-1305: - Attachment: (was: MAPRED-1305.patch) Massive performance problem with DistCp and -delete --- Key: MAPREDUCE-1305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 0.20.1 Reporter: Peter Romianowski Assignee: Peter Romianowski *First problem* In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when the path is all we need. The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write, which tries to retrieve file permissions by issuing an ls -ld on the path, which is painfully slow. Changed that to just serialize Path and not FileStatus. *Second problem* To delete the files we invoke the hadoop command line tool with option -rmr path. Again, once for each file. Changed that to dstfs.delete(path, true) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete
[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Romianowski updated MAPREDUCE-1305: - Attachment: MAPREDUCE-1305.patch We do not even need the absolute path serialized; using NullWritable now. Patch is against trunk, rev 891812 Massive performance problem with DistCp and -delete --- Key: MAPREDUCE-1305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 0.20.1 Reporter: Peter Romianowski Assignee: Peter Romianowski Attachments: MAPREDUCE-1305.patch *First problem* In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when the path is all we need. The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write, which tries to retrieve file permissions by issuing an ls -ld on the path, which is painfully slow. Changed that to just serialize Path and not FileStatus. *Second problem* To delete the files we invoke the hadoop command line tool with option -rmr path. Again, once for each file. Changed that to dstfs.delete(path, true) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
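The second fix above replaces one forked `hadoop fs -rmr` process per file with a single in-process recursive delete (`dstfs.delete(path, true)` on Hadoop's FileSystem). As a stand-in illustration of the same idea using only `java.nio.file` (the real code talks to HDFS, not the local filesystem):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveDelete {
    // One in-process recursive delete, children before parents --
    // the local-filesystem analogue of dstfs.delete(path, true),
    // instead of forking a CLI tool once per file.
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> paths = walk.sorted(Comparator.reverseOrder())
                                   .collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("distcp-demo");
        Files.createFile(root.resolve("a"));
        Files.createDirectories(root.resolve("sub"));
        Files.createFile(root.resolve("sub").resolve("b"));
        deleteRecursively(root);
        System.out.println(Files.exists(root)); // false: everything removed
    }
}
```

The win is the same in both worlds: process-spawn (or RPC) overhead is paid once per tree rather than once per file.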
[jira] Updated: (MAPREDUCE-698) Per-pool task limits for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated MAPREDUCE-698: Status: Open (was: Patch Available) Per-pool task limits for the fair scheduler --- Key: MAPREDUCE-698 URL: https://issues.apache.org/jira/browse/MAPREDUCE-698 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Reporter: Matei Zaharia Assignee: Kevin Peterson Fix For: 0.21.0 Attachments: MAPREDUCE-698-prelim.patch, mapreduce-698-trunk-3.patch, mapreduce-698-trunk-4.patch, mapreduce-698-trunk-5.patch, mapreduce-698-trunk.patch, mapreduce-698-trunk.patch The fair scheduler could use a way to cap the share of a given pool similar to MAPREDUCE-532. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-698) Per-pool task limits for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated MAPREDUCE-698: Attachment: mapreduce-698-trunk-5.patch Here's the patch with the for loop removed. I'm going to run it through Hudson for good measure, but it seems to be working fine from my point of view, and the test failures in the previous run were unrelated. I'll commit it unless Hudson complains. Per-pool task limits for the fair scheduler --- Key: MAPREDUCE-698 URL: https://issues.apache.org/jira/browse/MAPREDUCE-698 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Reporter: Matei Zaharia Assignee: Kevin Peterson Fix For: 0.21.0 Attachments: MAPREDUCE-698-prelim.patch, mapreduce-698-trunk-3.patch, mapreduce-698-trunk-4.patch, mapreduce-698-trunk-5.patch, mapreduce-698-trunk.patch, mapreduce-698-trunk.patch The fair scheduler could use a way to cap the share of a given pool similar to MAPREDUCE-532. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-698) Per-pool task limits for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated MAPREDUCE-698: Fix Version/s: (was: 0.21.0) 0.22.0 Hadoop Flags: [Reviewed] Status: Patch Available (was: Open) Per-pool task limits for the fair scheduler --- Key: MAPREDUCE-698 URL: https://issues.apache.org/jira/browse/MAPREDUCE-698 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Reporter: Matei Zaharia Assignee: Kevin Peterson Fix For: 0.22.0 Attachments: MAPREDUCE-698-prelim.patch, mapreduce-698-trunk-3.patch, mapreduce-698-trunk-4.patch, mapreduce-698-trunk-5.patch, mapreduce-698-trunk.patch, mapreduce-698-trunk.patch The fair scheduler could use a way to cap the share of a given pool similar to MAPREDUCE-532. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1146) Sqoop dependencies break Eclipse build on Linux
[ https://issues.apache.org/jira/browse/MAPREDUCE-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-1146: - Resolution: Fixed Fix Version/s: 0.22.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I've just committed this. Thanks Aaron! (There were actually no release audit warnings introduced by this patch. Also, the test failures were unrelated.) Sqoop dependencies break Eclipse build on Linux --- Key: MAPREDUCE-1146 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1146 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Environment: Linux, Sun JDK6 Reporter: Konstantin Boudnik Assignee: Aaron Kimball Fix For: 0.22.0 Attachments: MAPREDUCE-1146.2.patch, MAPREDUCE-1146.3.patch, MAPREDUCE-1146.4.patch, MAPREDUCE-1146.patch Under Linux there's the error in the Eclipse Problems view: {noformat} - com.sun.tools cannot be resolved at line 166 of org.apache.hadoop.sqoop.orm.CompilationManager {noformat} The problem doesn't appear on MacOS though -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1174) Sqoop improperly handles table/column names which are reserved sql words
[ https://issues.apache.org/jira/browse/MAPREDUCE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792094#action_12792094 ] Aaron Kimball commented on MAPREDUCE-1174: -- The only test failures are unrelated (streaming). Sqoop improperly handles table/column names which are reserved sql words Key: MAPREDUCE-1174 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1174 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1174.2.patch, MAPREDUCE-1174.3.patch, MAPREDUCE-1174.4.patch, MAPREDUCE-1174.patch In some databases it is legal to name tables and columns with terms that overlap SQL reserved keywords (e.g., {{CREATE}}, {{table}}, etc.). In such cases, the database allows you to escape the table and column names. We should always escape table and column names when possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1250) Refactor job token to use a common token interface
[ https://issues.apache.org/jira/browse/MAPREDUCE-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792096#action_12792096 ] Kan Zhang commented on MAPREDUCE-1250: -- I think it may make more sense to store a Token in the Task, especially since it is Writable and can be easily serialized as part of the Task's write method. Currently, it only serves as a temporary in-memory cache for the SecretKey (to avoid converting from tokenPassword to SecretKey each time the token is used for Shuffle). The token itself is not intended to be serialized and sent along with the Task object. The passing of credentials for a Task is handled by way of the credential cache. If we're going to pass credentials along with Task objects, we need to make sure Task objects are handled properly. Since this is a re-factoring patch, I suggest we evaluate it as part of the credential cache work Boris is doing. Attaching a patch that addresses your other comments. Refactor job token to use a common token interface -- Key: MAPREDUCE-1250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security Reporter: Kan Zhang Assignee: Kan Zhang Attachments: m1250-09.patch The idea is to use a common token interface for both job token and delegation token (HADOOP-6373) so that the RPC layer that uses them doesn't have to differentiate them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
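The point that a Token "is Writable and can be easily serialized as part of the Task's write method" comes down to the usual Hadoop Writable pattern: length-prefixed fields written to a DataOutput and read back symmetrically. A minimal, self-contained sketch (hypothetical field names; the real Token class lives in Hadoop's security code and carries more fields):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class TokenSketch {
    byte[] identifier;
    byte[] password;

    TokenSketch(byte[] id, byte[] pw) { identifier = id; password = pw; }
    TokenSketch() {}

    // Writable-style serialization: a Task's own write() could
    // delegate here to carry the token along with the Task object.
    void write(DataOutput out) throws IOException {
        out.writeInt(identifier.length);
        out.write(identifier);
        out.writeInt(password.length);
        out.write(password);
    }

    void readFields(DataInput in) throws IOException {
        identifier = new byte[in.readInt()];
        in.readFully(identifier);
        password = new byte[in.readInt()];
        in.readFully(password);
    }

    public static void main(String[] args) throws IOException {
        TokenSketch t = new TokenSketch("job_42".getBytes(), "secret".getBytes());
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        t.write(new DataOutputStream(bos));
        TokenSketch copy = new TokenSketch();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        System.out.println(new String(copy.identifier)); // round-trips to "job_42"
    }
}
```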
[jira] Updated: (MAPREDUCE-1250) Refactor job token to use a common token interface
[ https://issues.apache.org/jira/browse/MAPREDUCE-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kan Zhang updated MAPREDUCE-1250: - Attachment: m1250-12.patch Refactor job token to use a common token interface -- Key: MAPREDUCE-1250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security Reporter: Kan Zhang Assignee: Kan Zhang Attachments: m1250-09.patch, m1250-12.patch The idea is to use a common token interface for both job token and delegation token (HADOOP-6373) so that the RPC layer that uses them don't have to differentiate them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792118#action_12792118 ] Hadoop QA commented on MAPREDUCE-1143: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12427899/MAPRED-1143-7.patch against trunk revision 891524. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/console This message is automatically generated. runningMapTasks counter is not properly decremented in case of failed Tasks. 
Key: MAPREDUCE-1143 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1143 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: rahul k singh Assignee: rahul k singh Priority: Blocker Fix For: 0.21.0 Attachments: MAPRED-1143-1.patch, MAPRED-1143-2.patch, MAPRED-1143-2.patch, MAPRED-1143-3.patch, MAPRED-1143-4.patch, MAPRED-1143-5.patch.txt, MAPRED-1143-6.patch, MAPRED-1143-7.patch, MAPRED-1143-ydist-1.patch, MAPRED-1143-ydist-2.patch, MAPRED-1143-ydist-3.patch, MAPRED-1143-ydist-4.patch, MAPRED-1143-ydist-5.patch, MAPRED-1143-ydist-6.patch, MAPRED-1143-ydist-7.patch, MAPRED-1143-ydist-8.patch.txt, MAPRED-1143-ydist-9.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)
[ https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-961: - Status: Open (was: Patch Available) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s) --- Key: MAPREDUCE-961 URL: https://issues.apache.org/jira/browse/MAPREDUCE-961 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Affects Versions: 0.22.0 Reporter: dhruba borthakur Assignee: Scott Chen Fix For: 0.22.0 Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch, MAPREDUCE-961-v3.patch, MAPREDUCE-961-v4.patch, ResourceScheduling.pdf Design and develop a ResourceAwareLoadManager for the FairShare scheduler that dynamically decides how many maps/reduces to run on a particular machine based on the CPU/Memory/diskIO/network usage in that machine. The amount of resources currently used on each task tracker is being fed into the ResourceAwareLoadManager in real-time via an entity that is external to Hadoop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
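The core decision such a load manager makes is simple to state: given the externally-reported utilization of a tracker, should another task be launched there? A purely hypothetical sketch of that decision shape (the real policy is described in the attached ResourceScheduling.pdf; thresholds and inputs here are invented for illustration):

```java
public class ResourceAwareSketch {
    // Hypothetical thresholds -- not from MAPREDUCE-961 itself.
    static final double CPU_LIMIT = 0.8;
    static final double MEM_LIMIT = 0.9;

    // Decide whether a tracker can take another task, given its
    // externally-reported CPU and memory utilization in [0.0, 1.0].
    static boolean canLaunchTask(double cpuUsage, double memUsage) {
        return cpuUsage < CPU_LIMIT && memUsage < MEM_LIMIT;
    }

    public static void main(String[] args) {
        System.out.println(canLaunchTask(0.5, 0.6));  // true: both under limit
        System.out.println(canLaunchTask(0.95, 0.6)); // false: CPU saturated
    }
}
```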
[jira] Updated: (MAPREDUCE-1174) Sqoop improperly handles table/column names which are reserved sql words
[ https://issues.apache.org/jira/browse/MAPREDUCE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-1174: - Resolution: Fixed Fix Version/s: 0.22.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I've just committed this. Thanks Aaron! Sqoop improperly handles table/column names which are reserved sql words Key: MAPREDUCE-1174 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1174 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Fix For: 0.22.0 Attachments: MAPREDUCE-1174.2.patch, MAPREDUCE-1174.3.patch, MAPREDUCE-1174.4.patch, MAPREDUCE-1174.patch In some databases it is legal to name tables and columns with terms that overlap SQL reserved keywords (e.g., {{CREATE}}, {{table}}, etc.). In such cases, the database allows you to escape the table and column names. We should always escape table and column names when possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1302: -- Status: Patch Available (was: Open) TrackerDistributedCacheManager can delete file asynchronously - Key: MAPREDUCE-1302 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.20.2, 0.21.0, 0.22.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to delete files from distributed cache asynchronously. That will help make task initialization faster, because task initialization calls the code that localizes files into the cache and may delete some other files. The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1302: -- Attachment: MAPREDUCE-1302.1.patch This patch is on top of MAPREDUCE-1213 which is already committed. TrackerDistributedCacheManager can delete file asynchronously - Key: MAPREDUCE-1302 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.20.2, 0.21.0, 0.22.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to delete files from distributed cache asynchronously. That will help make task initialization faster, because task initialization calls the code that localizes files into the cache and may delete some other files. The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
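The speedup described above comes from moving the blocking delete off the task-initialization path. One common pattern for this (assumed here for illustration; AsyncDiskService from MAPREDUCE-1213 is the actual mechanism) is to rename the path immediately so it vanishes from the cache's namespace, then delete it on a background thread:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncDeleteSketch {
    // Single background deletion thread, standing in for the
    // per-volume thread pools of AsyncDiskService.
    static final ExecutorService deleter = Executors.newSingleThreadExecutor();

    // Rename first so the path disappears immediately, then delete
    // in the background: the caller (task init) no longer waits on I/O.
    static Future<?> deleteAsync(Path p) throws IOException {
        Path tombstone = p.resolveSibling(p.getFileName() + ".toDelete");
        Files.move(p, tombstone);
        return deleter.submit(() -> {
            try {
                Files.delete(tombstone);
            } catch (IOException ignored) {
                // Best-effort cleanup; a real service would log this.
            }
        });
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("cache-entry", ".dat");
        Future<?> done = deleteAsync(f);
        System.out.println(Files.exists(f)); // false right after the rename
        done.get();
        deleter.shutdown();
    }
}
```

The rename is cheap and synchronous, so callers observe the file as gone at once even if the physical delete is still queued.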
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1302: -- Status: Open (was: Patch Available) TrackerDistributedCacheManager can delete file asynchronously - Key: MAPREDUCE-1302 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.20.2, 0.21.0, 0.22.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to delete files from distributed cache asynchronously. That will help make task initialization faster, because task initialization calls the code that localizes files into the cache and may delete some other files. The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1235) java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball reassigned MAPREDUCE-1235: Assignee: Aaron Kimball java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP. Key: MAPREDUCE-1235 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1235 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Affects Versions: 0.20.1 Environment: hadoop 0.20.1 sqoop ubuntu karmic mysql 4 Reporter: valentina kroshilina Assignee: Aaron Kimball Priority: Minor Original Estimate: 4h Remaining Estimate: 4h *Description*: java.io.IOException is thrown when trying to import a table to HDFS using Sqoop. Table has 0 value in a field of type datetime. *Full Exception*: java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP. *Original question*: http://getsatisfaction.com/cloudera/topics/cant_import_table?utm_content=reply_link&utm_medium=email&utm_source=reply_notification -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1295) We need a job trace manipulator to build gridmix runs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dick King updated MAPREDUCE-1295: - Status: Patch Available (was: Open) This patch fixes an applicability issue. We need a job trace manipulator to build gridmix runs. -- Key: MAPREDUCE-1295 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1295 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Dick King Assignee: Dick King Attachments: mapreduce-1295--2009-12-17.patch, mapreduce-1297--2009-12-14.patch Rumen produces job traces, which are JSON format files describing important aspects of all jobs that are run [successfully or not] on a hadoop map/reduce cluster. There are two packages under development that will consume these trace files and produce actions in that cluster or another cluster: gridmix3 [see jira MAPREDUCE-1124 ] and Mumak [a simulator -- see MAPREDUCE-728 ]. It would be useful to be able to do two things with job traces, so we can run experiments using these two tools: change the duration, and change the density. I would like to provide a folder, a tool that can wrap a long-duration execution trace to redistribute its jobs over a shorter interval, and also change the density by duplicating or culling away jobs from the folded combined job trace. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
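The "folder" described above, which wraps a long trace's jobs onto a shorter interval, can be sketched as a modulo fold of each job's submit time. This is only an illustration of the wrapping step; the real tool also adjusts density by duplicating or culling jobs:

```java
import java.util.Arrays;

public class TraceFoldSketch {
    // Wrap each job's submit time into [0, interval): a job from
    // hour 2 of a 6-hour trace lands inside a 1-hour folded run.
    static long[] fold(long[] submitTimes, long interval) {
        long[] folded = new long[submitTimes.length];
        for (int i = 0; i < submitTimes.length; i++) {
            folded[i] = submitTimes[i] % interval;
        }
        Arrays.sort(folded); // replay order within the folded run
        return folded;
    }

    public static void main(String[] args) {
        long[] t = {100, 3700, 7300}; // seconds into the original trace
        System.out.println(Arrays.toString(fold(t, 3600))); // [100, 100, 100]
    }
}
```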
[jira] Updated: (MAPREDUCE-1235) java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1235: - Status: Patch Available (was: Open) java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP. Key: MAPREDUCE-1235 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1235 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Affects Versions: 0.20.1 Environment: hadoop 0.20.1 sqoop ubuntu karmic mysql 4 Reporter: valentina kroshilina Assignee: Aaron Kimball Priority: Minor Attachments: MAPREDUCE-1235.patch Original Estimate: 4h Remaining Estimate: 4h *Description*: java.io.IOException is thrown when trying to import a table to HDFS using Sqoop. Table has 0 value in a field of type datetime. *Full Exception*: java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP. *Original question*: http://getsatisfaction.com/cloudera/topics/cant_import_table?utm_content=reply_link&utm_medium=email&utm_source=reply_notification -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1235) java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1235: - Attachment: MAPREDUCE-1235.patch Attaching patch to fix this issue. MySQL supports TIMESTAMP values of '0000-00-00 00:00:00', which is out-of-range for java.sql.Timestamp. MySQL allows various behaviors for handling this; the default used to be to convert this value to null; since MySQL 5 it now throws IOException when such a timestamp is retrieved. Sqoop now sets the default behavior to convert these values to null, since this is a reasonable data conversion given the imprecision available. Users can override this default by passing the {{zeroDateTimeBehavior=exception}} parameter in the connect string. java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP. Key: MAPREDUCE-1235 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1235 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Affects Versions: 0.20.1 Environment: hadoop 0.20.1 sqoop ubuntu karmic mysql 4 Reporter: valentina kroshilina Assignee: Aaron Kimball Priority: Minor Attachments: MAPREDUCE-1235.patch Original Estimate: 4h Remaining Estimate: 4h *Description*: java.io.IOException is thrown when trying to import a table to HDFS using Sqoop. Table has 0 value in a field of type datetime. *Full Exception*: java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP. *Original question*: http://getsatisfaction.com/cloudera/topics/cant_import_table?utm_content=reply_link&utm_medium=email&utm_source=reply_notification -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
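The override mentioned above is an ordinary key=value property on the MySQL JDBC connect string. A small sketch of appending it while preserving any properties already present (how Sqoop itself assembles the URL may differ):

```java
public class ZeroDateSketch {
    // Append zeroDateTimeBehavior to a JDBC connect string, using
    // '?' for the first property and '&' for subsequent ones.
    static String withZeroDateBehavior(String url, String behavior) {
        String sep = url.contains("?") ? "&" : "?";
        return url + sep + "zeroDateTimeBehavior=" + behavior;
    }

    public static void main(String[] args) {
        // Restore the throw-on-zero-date behavior instead of Sqoop's
        // convert-to-null default (host/db names are placeholders).
        System.out.println(withZeroDateBehavior(
            "jdbc:mysql://db.example.com/sales", "exception"));
    }
}
```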
[jira] Updated: (MAPREDUCE-1295) We need a job trace manipulator to build gridmix runs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dick King updated MAPREDUCE-1295: - Attachment: mapreduce-1295--2009-12-17.patch This patch applies on a direct download of Trunk, and replaces the previous patch We need a job trace manipulator to build gridmix runs. -- Key: MAPREDUCE-1295 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1295 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Dick King Assignee: Dick King Attachments: mapreduce-1295--2009-12-17.patch, mapreduce-1297--2009-12-14.patch Rumen produces job traces, which are JSON format files describing important aspects of all jobs that are run [successfully or not] on a hadoop map/reduce cluster. There are two packages under development that will consume these trace files and produce actions in that cluster or another cluster: gridmix3 [see jira MAPREDUCE-1124 ] and Mumak [a simulator -- see MAPREDUCE-728 ]. It would be useful to be able to do two things with job traces, so we can run experiments using these two tools: change the duration, and change the density. I would like to provide a folder, a tool that can wrap a long-duration execution trace to redistribute its jobs over a shorter interval, and also change the density by duplicating or culling away jobs from the folded combined job trace. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1235) java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792176#action_12792176 ] Todd Lipcon commented on MAPREDUCE-1235: patch looks good to me java.io.IOException: Cannot convert value '-00-00 00:00:00' from column 6 to TIMESTAMP. Key: MAPREDUCE-1235 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1235 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Affects Versions: 0.20.1 Environment: hadoop 0.20.1 sqoop ubuntu karmic mysql 4 Reporter: valentina kroshilina Assignee: Aaron Kimball Priority: Minor Attachments: MAPREDUCE-1235.patch Original Estimate: 4h Remaining Estimate: 4h *Description*: java.io.IOException is thrown when trying to import a table to HDFS using Sqoop. Table has 0 value in a field of type datetime. *Full Exception*: java.io.IOException: Cannot convert value '-00-00 00:00:00' from column 6 to TIMESTAMP. *Original question*: http://getsatisfaction.com/cloudera/topics/cant_import_table?utm_content=reply_linkutm_medium=emailutm_source=reply_notification -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1298) better access/organization of userlogs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Meng Mao updated MAPREDUCE-1298: Attachment: fido.py Attached is a script that illustrates a typical debugging approach. The script goes out to all the worker nodes and grabs any userlogs for attempts for a given job. If there were a page that brought all these userlogs together for a given job, this script wouldn't be necessary. better access/organization of userlogs -- Key: MAPREDUCE-1298 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1298 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Reporter: Meng Mao Priority: Minor Attachments: fido.py Right now, it is quite a chore to browse to all userlogs generated during a given map or reduce phase. It is quite easy to browse to a job and look at either the map or reduce tasks, like so: /jobtasks.jsp?jobid=job_myid&type=map&pagenum=1 /jobtasks.jsp?jobid=job_myid&type=reduce&pagenum=1 However, it is not easy to look at the stderr output across all the attempts. Currently, the best technique I know of is to browse into each task: /taskdetails.jsp?jobid=job_myid&tipid=task_taskid And from there, jump to the slave node's task log for that taskid: slavenode/tasklog?taskid=<attempt for the taskid>&all=true I'm not suggesting that there needs to be really sophisticated way to present all the task userlogs in one place, especially with the expected size of the logs. However, it would be nice to be presented with a list of URLs (that are clickable) to all the log files. From here, it would be easy to copy/paste that elsewhere, where I could wget the set of log files and grep through them. What has prevented me from scripting it is a foolproof way to branch down from a job id to all the constituent task ids and logs. One more thing -- the task detail page: /taskdetails.jsp?jobid=job_myid&tipid=task_taskid gives links to see 4kb, 8kb, and all logs. 
I think it'd be nice to be able to get a link to just the stdout, stderr, and syslog portions. Most of our debugging is done by examining all of the stderr logs. Maybe it's possible to request that via URL, but I haven't found out how to in the documentation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
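The URL-listing idea from the report can be sketched in a few lines. This is an illustrative sketch, not the attached fido.py: the tasktracker port (50060) and the `filter` parameter for selecting just the stderr portion are assumptions about the tasklog servlet, and the host names and attempt IDs are made up.

```python
# Build one clickable tasklog URL per attempt, as the reporter requests,
# so the list can be fed to wget and the results grepped.
# Assumptions: tasktracker HTTP port 50060; a "filter" query parameter
# that limits output to one log section (stdout/stderr/syslog).

def tasklog_urls(attempts, log_filter="stderr"):
    """attempts: iterable of (tracker_host, attempt_id) pairs."""
    urls = []
    for host, attempt_id in attempts:
        urls.append(
            "http://%s:50060/tasklog?taskid=%s&filter=%s&all=true"
            % (host, attempt_id, log_filter)
        )
    return urls

for url in tasklog_urls([
    ("worker01.example.com", "attempt_200912102211_0001_m_000000_0"),
    ("worker02.example.com", "attempt_200912102211_0001_r_000000_0"),
]):
    print(url)
```

The hard part, as the reporter notes, is enumerating the (host, attempt) pairs for a job in the first place; this only covers the formatting step.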
[jira] Updated: (MAPREDUCE-1083) Use the user-to-groups mapping service in the JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boris Shkolnik updated MAPREDUCE-1083: -- Attachment: MAPREDUCE-1083-2.patch added and fixed tests to support common changes Use the user-to-groups mapping service in the JobTracker - Key: MAPREDUCE-1083 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1083 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Reporter: Arun C Murthy Assignee: Boris Shkolnik Fix For: 0.22.0 Attachments: HADOOP-4656_mr.patch, MAPREDUCE-1083-2.patch HADOOP-4656 introduces a user-to-groups mapping service on the server-side. The JobTracker should use this to map users to their groups rather than relying on the information passed by the client. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats - Key: MAPREDUCE-1309 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Dick King There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of Hadoop Map/Reduce were the logs generated? The existing rumen only has a couple of answers to these questions. The new engine will handle three answers to the version question: 0.18, 0.20, and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange]. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
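The "two orthogonal questions" structure can be sketched as two independent registries combined at parser-selection time. This is an illustrative sketch of the proposed modularity, not rumen's actual code; the registry names and stub parsers are made up.

```python
# Version and packaging are orthogonal, so each gets its own registry.
# Adding a new log-format version (or a new packaging) means adding one
# entry, without touching the other axis.

VERSION_PARSERS = {
    "0.18": lambda line: ("v0.18", line),
    "0.20": lambda line: ("v0.20", line),
    "current": lambda line: ("current", line),
}

PACKAGINGS = ("separate-files", "concatenated-with-headers")

def pick_parser(version, packaging):
    # Validate the packaging axis first; a real engine would return a
    # reader object here instead of just checking membership.
    if packaging not in PACKAGINGS:
        raise ValueError("unknown packaging: %s" % packaging)
    try:
        return VERSION_PARSERS[version]
    except KeyError:
        raise ValueError("unknown log version: %s" % version)
```

With this shape, the 3 versions x 2 packagings the issue describes come from 5 registry entries rather than 6 hand-written combinations.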
[jira] Updated: (MAPREDUCE-181) Secure job submission
[ https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-181: -- Status: Open (was: Patch Available) Secure job submission -- Key: MAPREDUCE-181 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Amar Kamat Assignee: Devaraj Das Fix For: 0.22.0 Attachments: 181-1.patch, 181-2.patch, 181-3.patch, 181-3.patch, 181-4.patch, 181-5.1.patch, hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.32.patch, MAPRED-181-v3.8.patch Currently the jobclient accesses the {{mapred.system.dir}} to add job details. Hence the {{mapred.system.dir}} has the permissions of {{rwx-wx-wx}}. This could be a security loophole where the job files might get overwritten/tampered after the job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-181) Secure job submission
[ https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-181: -- Attachment: 181-5.1.patch Thanks for the review, Owen. This patch addresses the concerns. I also did one more change - the JobInProgress constructor now checks whether the username in the submitted jobconf is the same as the one obtained from the UGI, and if not, fails the job submission. Ideally, we should not use conf.getUser anywhere, but since it is used even in the TaskTracker code, I left it as it is and instead fail the job submission if the user strings from the two sources don't match. Secure job submission -- Key: MAPREDUCE-181 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Amar Kamat Assignee: Devaraj Das Fix For: 0.22.0 Attachments: 181-1.patch, 181-2.patch, 181-3.patch, 181-3.patch, 181-4.patch, 181-5.1.patch, hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.32.patch, MAPRED-181-v3.8.patch Currently the jobclient accesses the {{mapred.system.dir}} to add job details. Hence the {{mapred.system.dir}} has the permissions of {{rwx-wx-wx}}. This could be a security loophole where the job files might get overwritten/tampered after the job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
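The consistency check described in the comment - reject the job when the conf-recorded user differs from the authenticated one - can be sketched as follows. This is a hedged illustration of the idea, not the actual JobInProgress API; `submit_job`, the conf dict, and the key name are made up.

```python
# Fail job submission when the username in the submitted job conf does not
# match the user obtained from the trusted authentication source (the UGI
# in Hadoop's case). The conf value is client-supplied and therefore
# untrusted; the authenticated identity wins.

def submit_job(conf, authenticated_user):
    conf_user = conf.get("user.name")
    if conf_user != authenticated_user:
        raise PermissionError(
            "job conf user %r does not match authenticated user %r"
            % (conf_user, authenticated_user)
        )
    return "submitted as %s" % authenticated_user
```

The design point mirrors the comment: rather than purging every conf.getUser call site at once, keep reading the conf value but verify it once, up front, against the authenticated identity.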
[jira] Updated: (MAPREDUCE-181) Secure job submission
[ https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-181: -- Status: Patch Available (was: Open) Secure job submission -- Key: MAPREDUCE-181 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Amar Kamat Assignee: Devaraj Das Fix For: 0.22.0 Attachments: 181-1.patch, 181-2.patch, 181-3.patch, 181-3.patch, 181-4.patch, 181-5.1.patch, hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.32.patch, MAPRED-181-v3.8.patch Currently the jobclient accesses the {{mapred.system.dir}} to add job details. Hence the {{mapred.system.dir}} has the permissions of {{rwx-wx-wx}}. This could be a security loophole where the job files might get overwritten/tampered after the job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1067) Default state of queues is undefined when unspecified
[ https://issues.apache.org/jira/browse/MAPREDUCE-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792242#action_12792242 ] Hadoop QA commented on MAPREDUCE-1067: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428275/MAPREDUCE-1067-6.patch against trunk revision 891823.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/212/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/212/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/212/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/212/console
This message is automatically generated. 
Default state of queues is undefined when unspecified - Key: MAPREDUCE-1067 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1067 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.21.0 Reporter: V.V.Chaitanya Krishna Assignee: V.V.Chaitanya Krishna Priority: Blocker Fix For: 0.21.0 Attachments: MAPREDUCE-1067-1.patch, MAPREDUCE-1067-2.patch, MAPREDUCE-1067-3.patch, MAPREDUCE-1067-4.patch, MAPREDUCE-1067-5.patch, MAPREDUCE-1067-6.patch Currently, if the state of a queue is not specified, it is being set to undefined state instead of running state. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-181) Secure job submission
[ https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-181: -- Attachment: 181-5.1.patch Sorry, the last patch had a silly bug in the new checks i introduced. Secure job submission -- Key: MAPREDUCE-181 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Amar Kamat Assignee: Devaraj Das Fix For: 0.22.0 Attachments: 181-1.patch, 181-2.patch, 181-3.patch, 181-3.patch, 181-4.patch, 181-5.1.patch, 181-5.1.patch, hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.32.patch, MAPRED-181-v3.8.patch Currently the jobclient accesses the {{mapred.system.dir}} to add job details. Hence the {{mapred.system.dir}} has the permissions of {{rwx-wx-wx}}. This could be a security loophole where the job files might get overwritten/tampered after the job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1083) Use the user-to-groups mapping service in the JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-1083: --- Status: Patch Available (was: Open) Submitting the patch on behalf of Boris. Use the user-to-groups mapping service in the JobTracker - Key: MAPREDUCE-1083 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1083 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Reporter: Arun C Murthy Assignee: Boris Shkolnik Fix For: 0.22.0 Attachments: HADOOP-4656_mr.patch, MAPREDUCE-1083-2.patch, MAPREDUCE-1083-3.patch HADOOP-4656 introduces a user-to-groups mapping service on the server-side. The JobTracker should use this to map users to their groups rather than relying on the information passed by the client. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1311) TestStreamingExitStatus fails on hudson patch builds
[ https://issues.apache.org/jira/browse/MAPREDUCE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792330#action_12792330 ] Amareshwari Sriramadasu commented on MAPREDUCE-1311: The failure log for one of the builds is @ http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/203/testReport/org.apache.hadoop.streaming/TestStreamingExitStatus/testMapFailOk/ TestStreamingExitStatus fails on hudson patch builds Key: MAPREDUCE-1311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1311 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Amareshwari Sriramadasu TestStreamingExitStatus fails on hudson patch builds. The logs have the following error:
{noformat}
09/12/16 20:30:58 INFO fs.FSInputChecker: Found checksum error: b[0, 6]=68656c6c6f0a
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h3.grid.sp2.yahoo.net/trunk/build/contrib/streaming/test/data/input.txt at 0
    at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:278)
    at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:242)
    at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:190)
    at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:180)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:206)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:191)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:376)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:257)
09/12/16 20:30:58 INFO streaming.PipeMapRed: MRErrorThread done
{noformat}
The same test passes on my local machine. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1312) TestStreamingKeyValue fails on hudson patch builds
[ https://issues.apache.org/jira/browse/MAPREDUCE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792336#action_12792336 ] Amareshwari Sriramadasu commented on MAPREDUCE-1312: The same passes on my local machine. TestStreamingKeyValue fails on hudson patch builds -- Key: MAPREDUCE-1312 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1312 Project: Hadoop Map/Reduce Issue Type: Bug Components: build, test Reporter: Amareshwari Sriramadasu TestStreamingKeyValue fails on hudson patch builds with FileNotFoundException. The failure log from one of the builds is @ http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/203/testReport/org.apache.hadoop.streaming/TestStreamingKeyValue/testCommandLine/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1009) Forrest documentation needs to be updated to describe features provided for supporting hierarchical queues
[ https://issues.apache.org/jira/browse/MAPREDUCE-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Yamijala updated MAPREDUCE-1009: Attachment: MAPREDUCE-1009-20091217.txt I am attaching a new patch that makes some modifications:
- Added a new file build-utils.xml that moves the java5.check and forrest.check targets. Imported this into the main build.xml and build-contrib.xml, thereby removing the duplication of these targets in the earlier patch.
- Reorganized and edited the section on mapred-queues.xml in the cluster-setup documentation. Primarily, I tried to make the connection between queues and schedulers more explicit. I also tried to classify the various queue configurations a little more clearly - single queue setup, multiple single-level queue setup, and hierarchical queue setup - giving descriptions of each.
- Some other editorial changes, like scrubbing the example of hierarchical queue setup in mapred-queues.xml.template.
Vinod, can you quickly glance at these differences and see if you are comfortable with them? Forrest documentation needs to be updated to describe features provided for supporting hierarchical queues --- Key: MAPREDUCE-1009 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1009 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 0.21.0 Reporter: Hemanth Yamijala Assignee: Vinod K V Priority: Blocker Fix For: 0.21.0 Attachments: MAPREDUCE-1009-20091008.txt, MAPREDUCE-1009-20091116.txt, MAPREDUCE-1009-20091124.txt, MAPREDUCE-1009-20091211.txt, MAPREDUCE-1009-20091217.txt Forrest documentation must be updated to describe how to set up and use hierarchical queues in the framework and the capacity scheduler. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1312) TestStreamingKeyValue fails on hudson patch builds
TestStreamingKeyValue fails on hudson patch builds -- Key: MAPREDUCE-1312 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1312 Project: Hadoop Map/Reduce Issue Type: Bug Components: build, test Reporter: Amareshwari Sriramadasu TestStreamingKeyValue fails on hudson patch builds with FileNotFoundException. The failure log from one of the builds is @ http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/203/testReport/org.apache.hadoop.streaming/TestStreamingKeyValue/testCommandLine/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1313) NPE in FieldFormatter if escape character is set and field is null
[ https://issues.apache.org/jira/browse/MAPREDUCE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1313: - Attachment: MAPREDUCE-1313.patch Patch to fix this issue. NPE in FieldFormatter if escape character is set and field is null -- Key: MAPREDUCE-1313 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1313 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1313.patch Performing an import with the {{\-\-escaped-by}} character set on a table with a null field will cause a NullPointerException in FieldFormatter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1313) NPE in FieldFormatter if escape character is set and field is null
NPE in FieldFormatter if escape character is set and field is null -- Key: MAPREDUCE-1313 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1313 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1313.patch Performing an import with the {{\-\-escaped-by}} character set on a table with a null field will cause a NullPointerException in FieldFormatter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1313) NPE in FieldFormatter if escape character is set and field is null
[ https://issues.apache.org/jira/browse/MAPREDUCE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1313: - Status: Patch Available (was: Open) NPE in FieldFormatter if escape character is set and field is null -- Key: MAPREDUCE-1313 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1313 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1313.patch Performing an import with the {{\-\-escaped-by}} character set on a table with a null field will cause a NullPointerException in FieldFormatter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
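The class of bug MAPREDUCE-1313 describes - an escaping formatter that assumes the field is non-null - is easy to illustrate. The sketch below is a Python analogue of the failure mode and the null-guarded fix, not sqoop's actual FieldFormatter; the function name and parameters are made up.

```python
# An escaping formatter: iterating over the field's characters blows up when
# the field is a SQL NULL (None here; in Java, the same access pattern throws
# a NullPointerException). The guard at the top is the fix: handle null
# before any character-level work.

def escape_field(field, escaped_by="\\", needs_escape=(",", "\\")):
    if field is None:
        # Render SQL NULL as a fixed token instead of touching None.
        return "null"
    out = []
    for ch in field:
        if ch in needs_escape:
            out.append(escaped_by)  # prefix special chars with the escape
        out.append(ch)
    return "".join(out)
```

Without the `if field is None` guard, the `for ch in field` loop is exactly the kind of unconditional access that turns a nullable column into a crash during import.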
[jira] Commented: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792364#action_12792364 ] Hadoop QA commented on MAPREDUCE-1302: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428349/MAPREDUCE-1302.1.patch against trunk revision 891920.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/215/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/215/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/215/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/215/console
This message is automatically generated. 
TrackerDistributedCacheManager can delete file asynchronously - Key: MAPREDUCE-1302 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.20.2, 0.21.0, 0.22.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to delete files from distributed cache asynchronously. That will help make task initialization faster, because task initialization calls the code that localizes files into the cache and may delete some other files. The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
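The async-deletion idea in MAPREDUCE-1302 - move filesystem cleanup off the task-initialization path onto a background disk-service thread - can be sketched with a single-worker executor. This loosely mirrors the AsyncDiskService notion from MAPREDUCE-1213; the class and method names below are illustrative, not Hadoop's API.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

class AsyncDeleter:
    """Hand deletions to a background thread so the caller (here, the
    analogue of cache localization during task init) returns immediately
    instead of blocking on a possibly large recursive delete."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)

    def delete_async(self, path):
        # Submit and return at once; rmtree runs on the worker thread.
        return self._pool.submit(shutil.rmtree, path, ignore_errors=True)

    def shutdown(self):
        # Drain pending deletions before exiting.
        self._pool.shutdown(wait=True)

# Usage: delete a scratch directory without blocking the caller.
scratch = tempfile.mkdtemp()
deleter = AsyncDeleter()
deleter.delete_async(scratch)
deleter.shutdown()
removed = not os.path.exists(scratch)
```

A real cache manager would also need to rename the directory out of the cache's namespace before queueing the delete, so a concurrent localization never sees a half-deleted entry; that step is omitted here.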
[jira] Updated: (MAPREDUCE-1258) Fair scheduler event log not logging job info
[ https://issues.apache.org/jira/browse/MAPREDUCE-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated MAPREDUCE-1258: - Status: Patch Available (was: Open) Fair scheduler event log not logging job info - Key: MAPREDUCE-1258 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1258 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/fair-share Affects Versions: 0.21.0 Reporter: Matei Zaharia Assignee: Matei Zaharia Priority: Minor Attachments: mapreduce-1258-1.patch The MAPREDUCE-706 patch seems to have left an unfinished TODO in the Fair Scheduler - namely, in the dump() function for periodically dumping scheduler state to the event log, the part that dumps information about jobs is commented out. This makes the event log less useful than it was before. It should be fairly easy to update this part to use the new scheduler data structures (Schedulable etc) and print the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1258) Fair scheduler event log not logging job info
[ https://issues.apache.org/jira/browse/MAPREDUCE-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated MAPREDUCE-1258: - Attachment: mapreduce-1258-1.patch Here's a patch for this issue. I didn't include a unit test because it's a very simple fix. I'd appreciate it if someone could review it! Note that the code in the patch does not print deficits, unlike the previous code, because deficits were removed as part of MAPREDUCE-706. Fair scheduler event log not logging job info - Key: MAPREDUCE-1258 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1258 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/fair-share Affects Versions: 0.21.0 Reporter: Matei Zaharia Assignee: Matei Zaharia Priority: Minor Attachments: mapreduce-1258-1.patch The MAPREDUCE-706 patch seems to have left an unfinished TODO in the Fair Scheduler - namely, in the dump() function for periodically dumping scheduler state to the event log, the part that dumps information about jobs is commented out. This makes the event log less useful than it was before. It should be fairly easy to update this part to use the new scheduler data structures (Schedulable etc) and print the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
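The dump() fix described above amounts to: iterate the scheduler-side objects and emit one event-log line per job. The sketch below illustrates that shape only; the field names (`name`, `demand`, `running`) stand in for whatever the real Schedulable objects expose, and the dict/list types are placeholders for the scheduler's data structures and event log.

```python
# Emit one line per job from the scheduler's per-job objects. A real
# implementation would pull these values from Schedulable accessors and
# write to the fair scheduler's event log rather than a list.

def dump_jobs(schedulables, event_log):
    for s in schedulables:
        event_log.append(
            "JOB %s demand=%d running=%d" % (s["name"], s["demand"], s["running"])
        )

event_log = []
dump_jobs([{"name": "job_0001", "demand": 10, "running": 4}], event_log)
```

As the comment notes, a deficit field is deliberately absent: deficits were removed from the scheduler in MAPREDUCE-706, so the dump has nothing to print for them.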
[jira] Commented: (MAPREDUCE-1284) TestLocalizationWithLinuxTaskController fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792366#action_12792366 ] Hemanth Yamijala commented on MAPREDUCE-1284: - +1 for the patch. Given the nature of the fix, and the fact that it fixes a broken test (which I verified by running manually), I think there is no need for additional tests. I will commit this. TestLocalizationWithLinuxTaskController fails - Key: MAPREDUCE-1284 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1284 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker, test Affects Versions: 0.22.0 Reporter: Ravi Gummadi Assignee: Ravi Gummadi Fix For: 0.22.0 Attachments: MR-1284.patch With current trunk, the testcase TestLocalizationWithLinuxTaskController fails with an exit code of 139 from task-controller when doing INITIALIZE_USER -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1284) TestLocalizationWithLinuxTaskController fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Yamijala updated MAPREDUCE-1284: Resolution: Fixed Fix Version/s: (was: 0.22.0) 0.21.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I committed this to trunk and branch 0.21. Thanks, Ravi ! TestLocalizationWithLinuxTaskController fails - Key: MAPREDUCE-1284 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1284 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker, test Affects Versions: 0.22.0 Reporter: Ravi Gummadi Assignee: Ravi Gummadi Fix For: 0.21.0 Attachments: MR-1284.patch With current trunk, the testcase TestLocalizationWithLinuxTaskController fails with an exit code of 139 from task-controller when doing INITIALIZE_USER -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rahul k singh updated MAPREDUCE-1143: - Attachment: MAPRED-1143-v21.patch patch for 21 runningMapTasks counter is not properly decremented in case of failed Tasks. Key: MAPREDUCE-1143 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1143 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: rahul k singh Assignee: rahul k singh Priority: Blocker Fix For: 0.21.0 Attachments: MAPRED-1143-1.patch, MAPRED-1143-2.patch, MAPRED-1143-2.patch, MAPRED-1143-3.patch, MAPRED-1143-4.patch, MAPRED-1143-5.patch.txt, MAPRED-1143-6.patch, MAPRED-1143-7.patch, MAPRED-1143-v21.patch, MAPRED-1143-ydist-1.patch, MAPRED-1143-ydist-2.patch, MAPRED-1143-ydist-3.patch, MAPRED-1143-ydist-4.patch, MAPRED-1143-ydist-5.patch, MAPRED-1143-ydist-6.patch, MAPRED-1143-ydist-7.patch, MAPRED-1143-ydist-8.patch.txt, MAPRED-1143-ydist-9.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.