[jira] Created: (HADOOP-2647) dfs -put hangs
dfs -put hangs
--------------

Key: HADOOP-2647
URL: https://issues.apache.org/jira/browse/HADOOP-2647
Project: Hadoop
Issue Type: Bug
Components: dfs
Affects Versions: 0.15.1
Environment: LINUX
Reporter: lohit vijayarenu

We saw a case where dfs -put hung for over 20 hours while copying a 2GB file. When we looked at the stack trace of the process, the main thread was waiting for confirmation of complete status from the namenode. Only 4 blocks had been copied, and the 5th block was missing when we ran fsck on the partially transferred file. There are 2 problems we saw here:

1. The DFS client hung without a timeout when there was no response from the namenode.

2. In IOUtils::copyBytes(InputStream in, OutputStream out, int buffSize, boolean close), if there is an exception during the copy, it is not caught before out.close() is called, which is why we see a close call in the stack trace.

When we checked for block IDs in the namenode log, the block which was missing had only one response to the namenode instead of three. This close state, coupled with the namenode not reporting the error back, might have caused the whole process to hang. Opening this JIRA to see if we could add checks for the two problems mentioned above.
"main" prio=10 tid=0x0805a000 nid=0x5b53 waiting on condition [0xf7e64000..0xf7e65288]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1751)
        - locked <0x77d593a0> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:140)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:826)
        at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:114)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:1354)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1472)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
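The second problem above (an uncaught copy exception surfacing as a hanging close()) can be illustrated with a minimal, self-contained sketch. This is plain java.io with a hypothetical class name, not the actual Hadoop IOUtils code: closing in a finally block, and swallowing secondary close() failures, keeps the original copy exception visible instead of letting close() mask it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class SafeCopy {
    // Copy all bytes from in to out. Streams are closed in a finally block,
    // and a failure inside close() is suppressed so it cannot replace an
    // exception thrown during the copy itself.
    static long copyBytes(InputStream in, OutputStream out, int buffSize, boolean close)
            throws IOException {
        byte[] buf = new byte[buffSize];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
                total += n;
            }
        } finally {
            if (close) {
                try { out.close(); } catch (IOException ignored) { } // don't mask copy errors
                try { in.close(); } catch (IOException ignored) { }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = copyBytes(new ByteArrayInputStream(data), out, 4, true);
        System.out.println("copied " + n + " bytes"); // prints: copied 10 bytes
    }
}
```

This does not address the missing namenode timeout (problem 1), which needs a bounded wait on the client side rather than a change to the copy loop.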
[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227
[ https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558344#action_12558344 ]

lohit vijayarenu commented on HADOOP-2570:
------------------------------------------

I checked out trunk, applied this patch, and ran 'ant test'. Apart from these two:

org.apache.hadoop.hbase.TestMergeMeta
org.apache.hadoop.hbase.TestMergeTable

all tests passed.

> streaming jobs fail after HADOOP-2227
> -------------------------------------
>
> Key: HADOOP-2570
> URL: https://issues.apache.org/jira/browse/HADOOP-2570
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/streaming
> Affects Versions: 0.15.2
> Reporter: lohit vijayarenu
> Assignee: Amareshwari Sri Ramadasu
> Priority: Blocker
> Fix For: 0.15.3
>
> Attachments: HADOOP-2570_1_20080112.patch, patch-2570.txt
>
> HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed like this
> {code}
> File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work");
> {code}
> We should change this to get it working. Referring to the changes made in HADOOP-2227, I see that the APIs used there to construct the path are not public. And hard-coding the path in streaming does not look good. Thoughts?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2582) hadoop dfs -copyToLocal creates zero byte files when source file does not exist
[ https://issues.apache.org/jira/browse/HADOOP-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2582: - Status: Patch Available (was: Open) thanks raghu. making it PA > hadoop dfs -copyToLocal creates zero byte files, when source file does not > exists > -- > > Key: HADOOP-2582 > URL: https://issues.apache.org/jira/browse/HADOOP-2582 > Project: Hadoop > Issue Type: Bug > Components: dfs >Affects Versions: 0.15.2 >Reporter: lohit vijayarenu > Attachments: HADOOP_2582_1.patch, HADOOP_2582_2.patch > > > hadoop dfs -copyToLocal with an no existing source file creates a zero byte > destination file. It should throw an error message indicating the source file > does not exists. > {noformat} > [lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile > [lohit@ hadoop-trunk]$ ls -l nosuchfile > -rw-r--r-- 1 lohit users 0 Jan 11 21:58 nosuchfile > [lohit@ hadoop-trunk]$ > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2582) hadoop dfs -copyToLocal creates zero byte files when source file does not exist
[ https://issues.apache.org/jira/browse/HADOOP-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2582: - Attachment: HADOOP_2582_2.patch Thanks Raghu, I have attached another patch which fixes FileUtil. Now we catch both -get and -put errors. > hadoop dfs -copyToLocal creates zero byte files, when source file does not > exists > -- > > Key: HADOOP-2582 > URL: https://issues.apache.org/jira/browse/HADOOP-2582 > Project: Hadoop > Issue Type: Bug > Components: dfs >Affects Versions: 0.15.2 >Reporter: lohit vijayarenu > Attachments: HADOOP_2582_1.patch, HADOOP_2582_2.patch > > > hadoop dfs -copyToLocal with an no existing source file creates a zero byte > destination file. It should throw an error message indicating the source file > does not exists. > {noformat} > [lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile > [lohit@ hadoop-trunk]$ ls -l nosuchfile > -rw-r--r-- 1 lohit users 0 Jan 11 21:58 nosuchfile > [lohit@ hadoop-trunk]$ > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2582) hadoop dfs -copyToLocal creates zero byte files when source file does not exist
[ https://issues.apache.org/jira/browse/HADOOP-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2582: - Attachment: HADOOP_2582_1.patch Attached is a simple patch which checks for existence of source before initiating copy. Have updated TestDFSShell test case to check for this condition as well. > hadoop dfs -copyToLocal creates zero byte files, when source file does not > exists > -- > > Key: HADOOP-2582 > URL: https://issues.apache.org/jira/browse/HADOOP-2582 > Project: Hadoop > Issue Type: Bug > Components: dfs >Affects Versions: 0.15.2 >Reporter: lohit vijayarenu > Attachments: HADOOP_2582_1.patch > > > hadoop dfs -copyToLocal with an no existing source file creates a zero byte > destination file. It should throw an error message indicating the source file > does not exists. > {noformat} > [lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile > [lohit@ hadoop-trunk]$ ls -l nosuchfile > -rw-r--r-- 1 lohit users 0 Jan 11 21:58 nosuchfile > [lohit@ hadoop-trunk]$ > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227
[ https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558159#action_12558159 ] lohit vijayarenu commented on HADOOP-2570: -- testing the streaming job again. This patch solves the problem seen earlier. Thanks! > streaming jobs fail after HADOOP-2227 > - > > Key: HADOOP-2570 > URL: https://issues.apache.org/jira/browse/HADOOP-2570 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.15.2 >Reporter: lohit vijayarenu >Assignee: Amareshwari Sri Ramadasu >Priority: Blocker > Fix For: 0.15.3 > > Attachments: HADOOP-2570_1_20080112.patch, patch-2570.txt > > > HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed > like this > {code} > File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work"); > {code} > We should change this to get it working. Referring to the changes made in > HADOOP-2227, I see that the APIs used in there to construct the path are not > public. And hard coding the path in streaming does not look good. thought? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2582) hadoop dfs -copyToLocal creates zero byte files when source file does not exist
hadoop dfs -copyToLocal creates zero byte files when source file does not exist
-------------------------------------------------------------------------------

Key: HADOOP-2582
URL: https://issues.apache.org/jira/browse/HADOOP-2582
Project: Hadoop
Issue Type: Bug
Components: dfs
Affects Versions: 0.15.2
Reporter: lohit vijayarenu

hadoop dfs -copyToLocal with a non-existing source file creates a zero byte destination file. It should throw an error message indicating that the source file does not exist.

{noformat}
[lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile
[lohit@ hadoop-trunk]$ ls -l nosuchfile
-rw-r--r-- 1 lohit users 0 Jan 11 21:58 nosuchfile
[lohit@ hadoop-trunk]$
{noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
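The fix pattern the later patches describe (checking the source before initiating the copy) can be sketched in plain java.io. The class and method names here are hypothetical illustrations, not the actual FsShell/FileUtil code: the point is that the existence check must come before the destination is ever opened, since opening first is what leaves the zero-byte file behind.

```java
import java.io.File;
import java.io.FileNotFoundException;

public class GetGuard {
    // Fail fast on a missing source instead of creating an empty destination.
    static void copyToLocal(File src, File dst) throws FileNotFoundException {
        if (!src.exists()) {
            // Thrown before dst is opened, so no zero-byte file is created.
            throw new FileNotFoundException(src + ": No such file or directory");
        }
        // ... perform the actual byte copy here ...
    }
}
```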
[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227
[ https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557909#action_12557909 ] lohit vijayarenu commented on HADOOP-2570: -- I tested this patch and it works for my earlier failing streaming job. Thanks! > streaming jobs fail after HADOOP-2227 > - > > Key: HADOOP-2570 > URL: https://issues.apache.org/jira/browse/HADOOP-2570 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.15.2 >Reporter: lohit vijayarenu >Assignee: Amareshwari Sri Ramadasu >Priority: Blocker > Fix For: 0.15.3 > > Attachments: patch-2570.txt > > > HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed > like this > {code} > File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work"); > {code} > We should change this to get it working. Referring to the changes made in > HADOOP-2227, I see that the APIs used in there to construct the path are not > public. And hard coding the path in streaming does not look good. thought? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks
[ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557807#action_12557807 ] lohit vijayarenu commented on HADOOP-2116: -- Hi Amareshwari, I tested this patch against trunk for resolution of HADOOP-2570. This solves the problem mentioned in HADOOP-2570. Should this patch be marked to go in 0.15.3 ? Thanks, Lohit > Job.local.dir to be exposed to tasks > > > Key: HADOOP-2116 > URL: https://issues.apache.org/jira/browse/HADOOP-2116 > Project: Hadoop > Issue Type: Improvement > Components: mapred >Affects Versions: 0.14.3 > Environment: All >Reporter: Milind Bhandarkar >Assignee: Amareshwari Sri Ramadasu > Fix For: 0.16.0 > > Attachments: patch-2116.txt, patch-2116.txt > > > Currently, since all task cwds are created under a jobcache directory, users > that need a job-specific shared directory for use as scratch space, create > ../work. This is hacky, and will break when HADOOP-2115 is addressed. For > such jobs, hadoop mapred should expose job.local.dir via localized > configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227
[ https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557695#action_12557695 ]

lohit vijayarenu commented on HADOOP-2570:
------------------------------------------

The 2 places where the jobcache dir was used in streaming were to 'chmod' the executable and to look up this directory in PATH. Would it be OK to construct jobCacheDir as done in HADOOP-2227?

> streaming jobs fail after HADOOP-2227
> -------------------------------------
>
> Key: HADOOP-2570
> URL: https://issues.apache.org/jira/browse/HADOOP-2570
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/streaming
> Affects Versions: 0.15.2
> Reporter: lohit vijayarenu
> Assignee: Amareshwari Sri Ramadasu
> Fix For: 0.15.3
>
> HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed like this
> {code}
> File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work");
> {code}
> We should change this to get it working. Referring to the changes made in HADOOP-2227, I see that the APIs used there to construct the path are not public. And hard-coding the path in streaming does not look good. Thoughts?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2570) streaming jobs fail after HADOOP-2227
streaming jobs fail after HADOOP-2227
-------------------------------------

Key: HADOOP-2570
URL: https://issues.apache.org/jira/browse/HADOOP-2570
Project: Hadoop
Issue Type: Bug
Components: contrib/streaming
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
Fix For: 0.15.3

HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed like this

{code}
File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work");
{code}

We should change this to get it working. Referring to the changes made in HADOOP-2227, I see that the APIs used there to construct the path are not public. And hard-coding the path in streaming does not look good. Thoughts?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2427) Cleanup of mapred.local.dir after maptask is complete
Cleanup of mapred.local.dir after maptask is complete
-----------------------------------------------------

Key: HADOOP-2427
URL: https://issues.apache.org/jira/browse/HADOOP-2427
Project: Hadoop
Issue Type: Bug
Components: mapred
Affects Versions: 0.15.1
Reporter: lohit vijayarenu

I see that after a map task is complete, its working directory (mapred.local.dir)/taskTracker/jobcache// is not deleted until the job is complete. If map output files are stored in there, could they be created in a different directory and the working directory cleaned up after the map task is complete? One problem we are seeing is that if a map task creates temporary files, they accumulate and we may run out of disk space, thus failing the job. Relying on the user to clean up all the temp files created is error prone.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-2361) hadoop version wrong in 0.15.1
[ https://issues.apache.org/jira/browse/HADOOP-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu resolved HADOOP-2361. -- Resolution: Invalid > hadoop version wrong in 0.15.1 > -- > > Key: HADOOP-2361 > URL: https://issues.apache.org/jira/browse/HADOOP-2361 > Project: Hadoop > Issue Type: Bug > Components: build >Affects Versions: 0.15.1 >Reporter: lohit vijayarenu > > I downloaded 0.15.1 release, recompiled and executed ./bin/hadoop version. It > says 0.15.2-dev picking it from build.xml -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2361) hadoop version wrong in 0.15.1
[ https://issues.apache.org/jira/browse/HADOOP-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548875 ]

lohit vijayarenu commented on HADOOP-2361:
------------------------------------------

My bad, looks like I picked the 0.15 branch instead of the tag. Closing this as invalid.

> hadoop version wrong in 0.15.1
> ------------------------------
>
> Key: HADOOP-2361
> URL: https://issues.apache.org/jira/browse/HADOOP-2361
> Project: Hadoop
> Issue Type: Bug
> Components: build
> Affects Versions: 0.15.1
> Reporter: lohit vijayarenu
>
> I downloaded the 0.15.1 release, recompiled, and executed ./bin/hadoop version. It says 0.15.2-dev, picking it from build.xml

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2361) hadoop version wrong in 0.15.1
hadoop version wrong in 0.15.1 -- Key: HADOOP-2361 URL: https://issues.apache.org/jira/browse/HADOOP-2361 Project: Hadoop Issue Type: Bug Components: build Affects Versions: 0.15.1 Reporter: lohit vijayarenu I downloaded 0.15.1 release, recompiled and executed ./bin/hadoop version. It says 0.15.2-dev picking it from build.xml -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2302) Streaming should provide an option for numerical sort of keys
[ https://issues.apache.org/jira/browse/HADOOP-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2302: - Issue Type: Improvement (was: Bug) > Streaming should provide an option for numerical sort of keys > -- > > Key: HADOOP-2302 > URL: https://issues.apache.org/jira/browse/HADOOP-2302 > Project: Hadoop > Issue Type: Improvement > Components: contrib/streaming >Reporter: lohit vijayarenu > > It would be good to have an option for numerical sort of keys for streaming. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2302) Streaming should provide an option for numerical sort of keys
Streaming should provide an option for numerical sort of keys -- Key: HADOOP-2302 URL: https://issues.apache.org/jira/browse/HADOOP-2302 Project: Hadoop Issue Type: Bug Components: contrib/streaming Reporter: lohit vijayarenu It would be good to have an option for numerical sort of keys for streaming. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
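To make the request concrete, a numerical key sort comes down to comparing keys by numeric value rather than lexicographically (where "10" sorts before "9"). The sketch below is a hypothetical, plain-Java illustration of such a comparator, not the streaming option itself; in a real job the comparator would be plugged into the job configuration.

```java
import java.util.Arrays;
import java.util.Comparator;

public class NumericKeySort {
    // Compare string keys by their numeric value, so "10" sorts after "9"
    // instead of before it as it would under lexicographic ordering.
    static final Comparator<String> NUMERIC =
            Comparator.comparingDouble(Double::parseDouble);

    // Return a numerically sorted copy of the given keys.
    static String[] sorted(String[] keys) {
        String[] copy = keys.clone();
        Arrays.sort(copy, NUMERIC);
        return copy;
    }
}
```

For example, sorted(new String[]{"10", "9", "2"}) yields {"2", "9", "10"}, whereas a plain lexicographic sort would yield {"10", "2", "9"}.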
[jira] Updated: (HADOOP-2193) dfs rm and rmr commands differ from POSIX standards
[ https://issues.apache.org/jira/browse/HADOOP-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-2193:
-------------------------------------

Attachment: HADOOP-2193-1.patch

Attached is a simple patch. Since fs.delete does not propagate FileNotFoundException, I have added an fs.exists() check before we try to move to trash and attempt the delete. Updated the test case to verify this.

> dfs rm and rmr commands differ from POSIX standards
> ---------------------------------------------------
>
> Key: HADOOP-2193
> URL: https://issues.apache.org/jira/browse/HADOOP-2193
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.15.1, 0.16.0
> Reporter: Mukund Madhugiri
> Fix For: 0.16.0
>
> Attachments: HADOOP-2193-1.patch
>
> Assuming the dfs commands follow POSIX standards, there are some problems with the DFS rm and rmr commands. I compared the DFS output with that of RHEL 4u5.
> In both cases, if the file/directory does not exist, DFS will not give any indication to the user.
> 1. rm a file/directory that does not exist:
> Linux: rm: cannot remove `testarea/two': No such file or directory
> DFS: rm: /testarea/two
> 2. rmr a file/directory that does not exist:
> Linux: rm: cannot remove `testarea/two': No such file or directory
> DFS: rm: /testarea/two

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
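The approach the patch describes can be sketched with plain java.io as a stand-in for the FileSystem API (the class name is hypothetical): since delete() reports failure only as a boolean and does not say why, an explicit existence check lets the shell emit the POSIX-style diagnostic before attempting the delete.

```java
import java.io.File;
import java.io.FileNotFoundException;

public class RmGuard {
    // delete() cannot distinguish "missing" from other failures, so check
    // existence first and report in the POSIX style the issue asks for.
    static void rm(File f) throws FileNotFoundException {
        if (!f.exists()) {
            throw new FileNotFoundException(
                    "cannot remove `" + f.getPath() + "': No such file or directory");
        }
        f.delete();
    }
}
```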
[jira] Updated: (HADOOP-2190) dfs ls and lsr commands differ from POSIX standards
[ https://issues.apache.org/jira/browse/HADOOP-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-2190:
-------------------------------------

Attachment: HADOOP-2190-2.patch

Another patch, which now calls listStatus instead of the deprecated listPaths(Path). Updated the test case and made the changes suggested by Mukund.

> dfs ls and lsr commands differ from POSIX standards
> ---------------------------------------------------
>
> Key: HADOOP-2190
> URL: https://issues.apache.org/jira/browse/HADOOP-2190
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.15.1, 0.16.0
> Reporter: Mukund Madhugiri
> Fix For: 0.16.0
>
> Attachments: HADOOP-2190-1.patch, HADOOP-2190-2.patch
>
> Assuming the dfs commands follow POSIX standards, there are some problems with the DFS ls and lsr commands. I compared the DFS output with that of RHEL 4u5.
> 1. ls a directory when there are no files/directories in that directory:
> Linux: No output
> DFS: Found 0 items
> 2. ls a file/directory that does not exist:
> Linux: ls: /doesnotexist: No such file or directory
> DFS: Found 0 items
> 3. lsr a directory that does not exist:
> Linux: ls: /doesnotexist: No such file or directory
> DFS: No output

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2190) dfs ls and lsr commands differ from POSIX standards
[ https://issues.apache.org/jira/browse/HADOOP-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542979 ] lohit vijayarenu commented on HADOOP-2190: -- sure, if patch is approved I will make these changes while adding testcase. > dfs ls and lsr commands differ from POSIX standards > --- > > Key: HADOOP-2190 > URL: https://issues.apache.org/jira/browse/HADOOP-2190 > Project: Hadoop > Issue Type: Bug > Components: dfs >Affects Versions: 0.15.1, 0.16.0 >Reporter: Mukund Madhugiri > Fix For: 0.16.0 > > Attachments: HADOOP-2190-1.patch > > > Assuming the dfs commands follow POSIX standards, there are some problems > with the DFS ls and lsr commands. I compared the DFS output with that of > RHEL 4u5 > 1. ls a directory when there are no files/directories in that directory: > Linux: No output > DFS: Found 0 items > 2. ls a file/directory that does not exist: > Linux: ls: /doesnotexist: No such file or directory > DFS: Found 0 items > 3. lsr a directory that does not exist: > Linux: ls: /doesnotexist: No such file or directory > DFS: No output -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2213) Job submission gets Job tracker still initializing message while Namenode is in safemode
Job submission gets Job tracker still initializing message while Namenode is in safemode
----------------------------------------------------------------------------------------

Key: HADOOP-2213
URL: https://issues.apache.org/jira/browse/HADOOP-2213
Project: Hadoop
Issue Type: Bug
Components: dfs, mapred
Reporter: lohit vijayarenu
Priority: Minor

While the namenode is in safemode, if a user submits a job they receive a 'Job tracker still initializing' exception. It would be good if a more appropriate error message were thrown.

Job started: Thu Nov 15 23:15:39 UTC 2007
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTracker$IllegalStateException: Job tracker still initializing
        at org.apache.hadoop.mapred.JobTracker.ensureRunning(JobTracker.java:1505)
        at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:1513)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

        at org.apache.hadoop.ipc.Client.call(Client.java:482)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
        at $Proxy1.getNewJobId(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.getNewJobId(Unknown Source)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:452)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
        at org.apache.hadoop.examples.RandomWriter.run(RandomWriter.java:274)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.RandomWriter.main(RandomWriter.java:285)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:49)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2190) dfs ls and lsr commands differ from POSIX standards
[ https://issues.apache.org/jira/browse/HADOOP-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-2190:
-------------------------------------

Attachment: HADOOP-2190-1.patch

The attached patch throws an error for the ls, lsr, du, and dus commands for non-existing files. There were 2 changes: DistributedFileSystem.listPaths() was not returning null for non-existing files, and FileSystem.listPaths() was consuming the null returned from the underlying listPaths and returning a zero-length Path[] array. So I have made changes in a few places so that listPaths returns null when required. I ran the tests and they look good. If anyone could take a look at the changes and approve them, I will prepare test cases and update another patch. The issue reported was also seen on the local filesystem; fixing the second issue also takes care of the local filesystem. Below is the output after the patch:

{noformat}
ls/lsr command
[ hadoop-trunk]$ hadoop dfs -ls empty
Found 0 items
[ hadoop-trunk]$ hadoop dfs -ls nofile
ls: Could not get listing for nofile
[ hadoop-trunk]$ hadoop dfs -lsr nofile
lsr: Could not get listing for nofile

du command
[ hadoop-trunk]$ hadoop dfs -du empty
Found 0 items
[ hadoop-trunk]$ hadoop dfs -du nofile
du: Could not get listing for nofile
[ hadoop-trunk]$ hadoop dfs -dus empty
empty 0
[ hadoop-trunk]$ hadoop dfs -dus nofile
dus: dus: No match: nofile
[ hadoop-trunk]$
{noformat}

> dfs ls and lsr commands differ from POSIX standards
> ---------------------------------------------------
>
> Key: HADOOP-2190
> URL: https://issues.apache.org/jira/browse/HADOOP-2190
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.15.1, 0.16.0
> Reporter: Mukund Madhugiri
> Fix For: 0.16.0
>
> Attachments: HADOOP-2190-1.patch
>
> Assuming the dfs commands follow POSIX standards, there are some problems with the DFS ls and lsr commands. I compared the DFS output with that of RHEL 4u5.
> 1. ls a directory when there are no files/directories in that directory:
> Linux: No output
> DFS: Found 0 items
> 2. ls a file/directory that does not exist:
> Linux: ls: /doesnotexist: No such file or directory
> DFS: Found 0 items
> 3. lsr a directory that does not exist:
> Linux: ls: /doesnotexist: No such file or directory
> DFS: No output

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
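The null-versus-empty distinction at the heart of the patch has a direct analogue in plain java.io, where File.listFiles() likewise returns null for a path that cannot be listed and a zero-length array for an empty directory. The sketch below (hypothetical class name, not the Hadoop code) shows how preserving null rather than flattening it into an empty array lets the shell tell the two cases apart.

```java
import java.io.File;

public class ListingCheck {
    // Distinguish "empty directory" (zero-length array) from "path cannot be
    // listed" (null), mirroring what the patch does for listPaths.
    static String describe(File dir) {
        File[] entries = dir.listFiles(); // null when dir does not exist
        if (entries == null) {
            return "ls: Could not get listing for " + dir;
        }
        return "Found " + entries.length + " items";
    }
}
```

Collapsing null into new File[0] at any layer, as the old FileSystem.listPaths did, makes a missing path indistinguishable from an empty directory by the time the shell formats its output.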
[jira] Created: (HADOOP-2210) WebUI should also list current time
WebUI should also list current time --- Key: HADOOP-2210 URL: https://issues.apache.org/jira/browse/HADOOP-2210 Project: Hadoop Issue Type: Bug Reporter: lohit vijayarenu Priority: Minor It would be good if WebUI also listed current time (on all pages). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1722) Make streaming handle non-utf8 byte arrays
[ https://issues.apache.org/jira/browse/HADOOP-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542266 ]

lohit vijayarenu commented on HADOOP-1722:
------------------------------------------

bq. all other bytes (not characters!) including non-ascii and non-utf8 are passed literally through. Quoting is done on the stdin of the process and unquoting is done on the stdout of the process. This would make it very easy to write arbitrary binary values to the framework from streaming

+1. Would it be good to have an option which translates it, preserving the current behavior? It would be easier for a few map/reduce scripts if the framework translated it.

> Make streaming handle non-utf8 byte arrays
> ------------------------------------------
>
> Key: HADOOP-1722
> URL: https://issues.apache.org/jira/browse/HADOOP-1722
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/streaming
> Reporter: Runping Qi
> Assignee: Christopher Zimmerman
>
> Right now, the streaming framework expects the outputs of the stream process (mapper or reducer) to be line-oriented UTF-8 text. This limit makes it impossible to use those programs whose outputs may be non-UTF-8 (international encodings, or maybe even binary data). Streaming can overcome this limit by introducing a simple encoding protocol. For example, it can allow the mapper/reducer to hex-encode its keys/values, and the framework decodes them on the Java side. This way, as long as the mapper/reducer executables follow this encoding protocol, they can output arbitrary byte arrays and the streaming framework can handle them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
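The hex-encoding protocol the issue proposes is simple enough to sketch directly. The class below is a hypothetical illustration of the Java-side codec, not part of Hadoop: a mapper/reducer would emit hex strings, which are safe line-oriented UTF-8, and the framework would decode them back to arbitrary bytes.

```java
public class HexCodec {
    // Encode arbitrary bytes as lowercase hex, which is always valid
    // line-oriented UTF-8 and so survives the streaming text protocol.
    static String encode(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode a hex string produced by encode() back to the original bytes.
    static byte[] decode(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

The cost is doubling the data size on the wire; a framework option to enable this only for binary jobs, as the comment suggests, would keep the current pass-through behavior for text jobs.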
[jira] Updated: (HADOOP-2181) Input Split details for maps should be logged
[ https://issues.apache.org/jira/browse/HADOOP-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2181: - Component/s: mapred > Input Split details for maps should be logged > - > > Key: HADOOP-2181 > URL: https://issues.apache.org/jira/browse/HADOOP-2181 > Project: Hadoop > Issue Type: Improvement > Components: mapred >Reporter: lohit vijayarenu >Priority: Minor > > It would be nice if Input split details are logged someplace. This might help > debugging failed map tasks -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2181) Input Split details for maps should be logged
Input Split details for maps should be logged - Key: HADOOP-2181 URL: https://issues.apache.org/jira/browse/HADOOP-2181 Project: Hadoop Issue Type: Improvement Reporter: lohit vijayarenu Priority: Minor It would be nice if Input split details are logged someplace. This might help debugging failed map tasks -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2151) FileSystem.globPaths does not validate the returned list of Paths
[ https://issues.apache.org/jira/browse/HADOOP-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-2151:
-------------------------------------

Status: Patch Available  (was: Open)

Thanks for reviewing, Raghu. Making this Patch Available.

> FileSystem.globPaths does not validate the returned list of Paths
> -----------------------------------------------------------------
>
> Key: HADOOP-2151
> URL: https://issues.apache.org/jira/browse/HADOOP-2151
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.15.0, 0.14.3
> Reporter: lohit vijayarenu
> Fix For: 0.16.0
>
> Attachments: HADOOP-2151-2.patch, HADOOP-2151-3.patch, HADOOP-2151.patch
>
> FileSystem.globPaths does not validate the returned list of Paths. Here is an example. Consider a directory structure like
> /user/foo/DIR1/FILE1
> /user/foo/DIR2
> Now if we pass an input path like "/user/foo/*/FILE1" to FileSystem.globPaths(), it returns 2 entries as shown below:
> /user/foo/DIR1/FILE1
> /user/foo/DIR2/FILE1
> Should globPaths validate this and return only valid Paths? This behavior was caught in FileSystem.validateInput(), where an IOException is thrown while processing such a directory structure.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
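The validation step the issue asks for amounts to filtering the expanded candidate paths by existence before returning them. The sketch below uses java.io.File as a stand-in for Hadoop's Path/FileSystem (class and method names are hypothetical), so /user/foo/DIR2/FILE1 would be dropped from the example above because it does not actually exist.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class GlobValidate {
    // After expanding a glob pattern into candidate paths, keep only the
    // paths that actually exist, mirroring the validation globPaths lacked.
    static List<File> validate(List<File> candidates) {
        List<File> valid = new ArrayList<>();
        for (File f : candidates) {
            if (f.exists()) {
                valid.add(f);
            }
        }
        return valid;
    }
}
```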
[jira] Commented: (HADOOP-1952) Streaming does not handle invalid -inputformat (typo by users for example)
[ https://issues.apache.org/jira/browse/HADOOP-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540775 ] lohit vijayarenu commented on HADOOP-1952: -- Hi Arun, Yes, these were present in the testcases but never used by the actual streaming command in those test cases. The invalid options passed in were ignored by the previous streaming code. The patch now catches such invalid options, so I updated the dependent test cases as well. > Streaming does not handle invalid -inputformat (typo by users for example) > --- > > Key: HADOOP-1952 > URL: https://issues.apache.org/jira/browse/HADOOP-1952 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.1 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu >Priority: Minor > Fix For: 0.16.0 > > Attachments: CatchInvalidInputFormat.patch, HADOOP-1952-1.patch, > HADOOP-1952-2.patch > > > Hadoop Streaming does not handle invalid inputformat class. For example > -inputformat INVALID class would not be thrown as an error. Instead it > defaults to StreamInputFormat. If an invalid inputformat is specified, it is > good to fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2151) FileSyste.globPaths does not validate the return list of Paths
[ https://issues.apache.org/jira/browse/HADOOP-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2151: - Attachment: HADOOP-2151-3.patch Another update with testcase. > FileSyste.globPaths does not validate the return list of Paths > -- > > Key: HADOOP-2151 > URL: https://issues.apache.org/jira/browse/HADOOP-2151 > Project: Hadoop > Issue Type: Bug > Components: dfs >Affects Versions: 0.14.3, 0.15.0 >Reporter: lohit vijayarenu > Fix For: 0.16.0 > > Attachments: HADOOP-2151-2.patch, HADOOP-2151-3.patch, > HADOOP-2151.patch > > > FileSystem.globPaths does not validate the return list of Paths. > Here is an example. > Consider a directory structure like > /user/foo/DIR1/FILE1 > /user/foo/DIR2 > now if we pass an input path like "/user/foo/*/FILE1" to > FileSystem.globPaths() > It returns 2 entries as shown below > /user/foo/DIR1/FILE1 > /user/foo/DIR2/FILE1 > Should globPaths validate this and return only valid Paths? This behavior was > caught in FileSystem.validateInput() where an IOException is thrown while > processing such a directory structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2151) FileSyste.globPaths does not validate the return list of Paths
[ https://issues.apache.org/jira/browse/HADOOP-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2151: - Attachment: HADOOP-2151-2.patch Thanks Raghu. Here is an updated patch. I also got rid of the ArrayList; we allocate a new array only when we filter out parents. > FileSyste.globPaths does not validate the return list of Paths > -- > > Key: HADOOP-2151 > URL: https://issues.apache.org/jira/browse/HADOOP-2151 > Project: Hadoop > Issue Type: Bug > Components: dfs >Affects Versions: 0.14.3, 0.15.0 >Reporter: lohit vijayarenu > Fix For: 0.16.0 > > Attachments: HADOOP-2151-2.patch, HADOOP-2151.patch > > > FileSystem.globPaths does not validate the return list of Paths. > Here is an example. > Consider a directory structure like > /user/foo/DIR1/FILE1 > /user/foo/DIR2 > now if we pass an input path like "/user/foo/*/FILE1" to > FileSystem.globPaths() > It returns 2 entries as shown below > /user/foo/DIR1/FILE1 > /user/foo/DIR2/FILE1 > Should globPaths validate this and return only valid Paths? This behavior was > caught in FileSystem.validateInput() where an IOException is thrown while > processing such a directory structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2151) FileSyste.globPaths does not validate the return list of Paths
[ https://issues.apache.org/jira/browse/HADOOP-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2151: - Attachment: HADOOP-2151.patch Attached is a patch which addresses this problem. globPaths checks whether the path exists by calling exists(), but only for those paths which were expanded via listPaths in the previous iteration. This is done by passing a new flag to the recursive function indicating whether the previous component was a glob or not. > FileSyste.globPaths does not validate the return list of Paths > -- > > Key: HADOOP-2151 > URL: https://issues.apache.org/jira/browse/HADOOP-2151 > Project: Hadoop > Issue Type: Bug > Components: dfs >Affects Versions: 0.14.3, 0.15.0 >Reporter: lohit vijayarenu > Fix For: 0.16.0 > > Attachments: HADOOP-2151.patch > > > FileSystem.globPaths does not validate the return list of Paths. > Here is an example. > Consider a directory structure like > /user/foo/DIR1/FILE1 > /user/foo/DIR2 > now if we pass an input path like "/user/foo/*/FILE1" to > FileSystem.globPaths() > It returns 2 entries as shown below > /user/foo/DIR1/FILE1 > /user/foo/DIR2/FILE1 > Should globPaths validate this and return only valid Paths? This behavior was > caught in FileSystem.validateInput() where an IOException is thrown while > processing such a directory structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
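The validation idea in the comment above can be sketched as follows. This is a hypothetical simplification, not the actual HADOOP-2151 patch: `GlobValidationSketch`, `validate`, and the `existing` set are illustrative stand-ins for the real FileSystem.exists() check applied to candidates produced by expanding a glob component.

```java
import java.util.*;

// Hypothetical sketch (not the actual HADOOP-2151 patch): once a glob
// component such as "*" has been expanded, candidate paths built from the
// remaining literal components may not actually exist, so each candidate
// must be checked with exists() before it is returned.
class GlobValidationSketch {
    // 'existing' stands in for the filesystem's exists() check.
    static List<String> validate(List<String> candidates, Set<String> existing) {
        List<String> valid = new ArrayList<>();
        for (String p : candidates) {
            if (existing.contains(p)) { // keep only paths that really exist
                valid.add(p);
            }
        }
        return valid;
    }
}
```

With the directory layout from the issue description, only /user/foo/DIR1/FILE1 exists, so expanding "/user/foo/*/FILE1" and then validating drops the spurious /user/foo/DIR2/FILE1 candidate.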
[jira] Updated: (HADOOP-2151) FileSyste.globPaths does not validate the return list of Paths
[ https://issues.apache.org/jira/browse/HADOOP-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2151: - Fix Version/s: 0.16.0 > FileSyste.globPaths does not validate the return list of Paths > -- > > Key: HADOOP-2151 > URL: https://issues.apache.org/jira/browse/HADOOP-2151 > Project: Hadoop > Issue Type: Bug > Components: dfs >Affects Versions: 0.14.3, 0.15.0 >Reporter: lohit vijayarenu > Fix For: 0.16.0 > > > FileSystem.globPaths does not validate the return list of Paths. > Here is an example. > Consider a directory structure like > /user/foo/DIR1/FILE1 > /user/foo/DIR2 > now if we pass an input path like "/user/foo/*/FILE1" to > FileSystem.globPaths() > It returns 2 entries as shown below > /user/foo/DIR1/FILE1 > /user/foo/DIR2/FILE1 > Should globPaths validate this and return only valid Paths? This behavior was > caught in FileSystem.validateInput() where an IOException is thrown while > processing such a directory structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2151) FileSyste.globPaths does not validate the return list of Paths
FileSyste.globPaths does not validate the return list of Paths -- Key: HADOOP-2151 URL: https://issues.apache.org/jira/browse/HADOOP-2151 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.15.0, 0.14.3 Reporter: lohit vijayarenu FileSystem.globPaths does not validate the return list of Paths. Here is an example. Consider a directory structure like /user/foo/DIR1/FILE1 /user/foo/DIR2 now if we pass an input path like "/user/foo/*/FILE1" to FileSystem.globPaths() It returns 2 entries as shown below /user/foo/DIR1/FILE1 /user/foo/DIR2/FILE1 Should globPaths validate this and return only valid Paths? This behavior was caught in FileSystem.validateInput() where an IOException is thrown while processing such a directory structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1281) Speculative map tasks aren't getting killed although the TIP completed
[ https://issues.apache.org/jira/browse/HADOOP-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539748 ] lohit vijayarenu commented on HADOOP-1281: -- We hit this bug today. Below is the log for 2 attempts for the same task Task Attempts Status Progress Start Time Finish Time Errors Task Logs Counters task_200711022153_0001_m_001548_0 SUCCEEDED 100.00% 2-Nov-2007 22:00:59 2-Nov-2007 22:05:50 (4mins, 51sec) task_200711022153_0001_m_001548_1 KILLED 84.44% 2-Nov-2007 22:02:17 2-Nov-2007 22:26:02 (23mins, 45sec) If you look at the time each of the attempts took, after the first attempt finished in ~4mins, the second attempt should have been killed. But it went ahead and kept running for ~23min. When we took a look at the logs, we saw that the attempt was issued a kill signal only after the whole job had completed. The JobTracker did not send a kill signal to this task attempt earlier (or maybe nothing was logged). > Speculative map tasks aren't getting killed although the TIP completed > -- > > Key: HADOOP-1281 > URL: https://issues.apache.org/jira/browse/HADOOP-1281 > Project: Hadoop > Issue Type: Bug > Components: mapred >Reporter: Arun C Murthy >Assignee: Arun C Murthy > > The speculative map tasks run to completion although the TIP succeeded since > the other task completed elsewhere. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2071: - Attachment: HADOOP-2071-5.patch Looks like an unrelated contrib test failed. But there were 2 findbugs warnings, which I have fixed; attaching a new patch. > StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in > hadoop 0.14 > - > > Key: HADOOP-2071 > URL: https://issues.apache.org/jira/browse/HADOOP-2071 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.3 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu > Fix For: 0.16.0 > > Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch, > HADOOP-2071-3.patch, HADOOP-2071-4.patch, HADOOP-2071-5.patch > > > In hadoop 0.14, using -inputreader StreamXmlRecordReader for streaming jobs > throw > java.io.IOException: Mark/reset exception in hadoop 0.14 > This looks to be related to > (https://issues.apache.org/jira/browse/HADOOP-2067). > > Caused by: java.io.IOException: Mark/reset not supported > at > org.apache.hadoop.dfs.DFSClient$DFSInputStream.reset(DFSClient.java:1353) > at java.io.FilterInputStream.reset(FilterInputStream.java:200) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.fastReadUntilMatch(StreamX > mlRecordReader.java:289) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.readUntilMatchBegin(Stream > XmlRecordReader.java:118) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.seekNextRecordBoundary(Str > eamXmlRecordReader.java:111) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.init(StreamXmlRecordReader > .java:73) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.(StreamXmlRecordReader.jav > a:63) > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2120) dfs -getMerge does not do what it says it does
[ https://issues.apache.org/jira/browse/HADOOP-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539170 ] lohit vijayarenu commented on HADOOP-2120: -- Visualizing this as a map-reduce job which actually merges/sorts into a single file, shouldn't it be available as a separate package (like distcp, maybe)? This feature of merging files would be very useful for users who would like to have only one output file. For now they would have to stick to a single reducer rather than submit a job with multiple reducers (even though that gives better machine utilization). Wouldn't a generic merge utility which understands the format and merges be useful? Something motivated by https://issues.apache.org/jira/browse/HADOOP-2113 > dfs -getMerge does not do what it says it does > -- > > Key: HADOOP-2120 > URL: https://issues.apache.org/jira/browse/HADOOP-2120 > Project: Hadoop > Issue Type: Bug > Components: mapred >Affects Versions: 0.14.3 > Environment: All >Reporter: Milind Bhandarkar > Fix For: 0.16.0 > > > dfs -getMerge, which calls FileUtil.CopyMerge, contains this javadoc: > {code} > Get all the files in the directories that match the source file pattern >* and merge and sort them to only one file on local fs >* srcf is kept. > {code} > However, it only concatenates the set of input files, rather than merging > them in sorted order. > Ideally, the copyMerge should be equivalent to a map-reduce job with > IdentityMapper and IdentityReducer with numReducers = 1. However, not having > to run this as a map-reduce job has some advantages, since it increases > cluster utilization during reduce phase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2071: - Status: Patch Available (was: Open) Making this PA > StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in > hadoop 0.14 > - > > Key: HADOOP-2071 > URL: https://issues.apache.org/jira/browse/HADOOP-2071 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.3 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu > Fix For: 0.16.0 > > Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch, > HADOOP-2071-3.patch, HADOOP-2071-4.patch > > > In hadoop 0.14, using -inputreader StreamXmlRecordReader for streaming jobs > throw > java.io.IOException: Mark/reset exception in hadoop 0.14 > This looks to be related to > (https://issues.apache.org/jira/browse/HADOOP-2067). > > Caused by: java.io.IOException: Mark/reset not supported > at > org.apache.hadoop.dfs.DFSClient$DFSInputStream.reset(DFSClient.java:1353) > at java.io.FilterInputStream.reset(FilterInputStream.java:200) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.fastReadUntilMatch(StreamX > mlRecordReader.java:289) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.readUntilMatchBegin(Stream > XmlRecordReader.java:118) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.seekNextRecordBoundary(Str > eamXmlRecordReader.java:111) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.init(StreamXmlRecordReader > .java:73) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.(StreamXmlRecordReader.jav > a:63) > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2071: - Attachment: HADOOP-2071-4.patch Getting rid of an extra blank line in the patch. > StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in > hadoop 0.14 > - > > Key: HADOOP-2071 > URL: https://issues.apache.org/jira/browse/HADOOP-2071 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.3 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu > Fix For: 0.16.0 > > Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch, > HADOOP-2071-3.patch, HADOOP-2071-4.patch > > > In hadoop 0.14, using -inputreader StreamXmlRecordReader for streaming jobs > throw > java.io.IOException: Mark/reset exception in hadoop 0.14 > This looks to be related to > (https://issues.apache.org/jira/browse/HADOOP-2067). > > Caused by: java.io.IOException: Mark/reset not supported > at > org.apache.hadoop.dfs.DFSClient$DFSInputStream.reset(DFSClient.java:1353) > at java.io.FilterInputStream.reset(FilterInputStream.java:200) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.fastReadUntilMatch(StreamX > mlRecordReader.java:289) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.readUntilMatchBegin(Stream > XmlRecordReader.java:118) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.seekNextRecordBoundary(Str > eamXmlRecordReader.java:111) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.init(StreamXmlRecordReader > .java:73) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.(StreamXmlRecordReader.jav > a:63) > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1952) Streaming does not handle invalid -inputformat (typo by users for example)
[ https://issues.apache.org/jira/browse/HADOOP-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1952: - Status: Patch Available (was: Open) Thanks Arun. Making this PA > Streaming does not handle invalid -inputformat (typo by users for example) > --- > > Key: HADOOP-1952 > URL: https://issues.apache.org/jira/browse/HADOOP-1952 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.1 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu >Priority: Minor > Fix For: 0.16.0 > > Attachments: CatchInvalidInputFormat.patch, HADOOP-1952-1.patch, > HADOOP-1952-2.patch > > > Hadoop Streaming does not handle invalid inputformat class. For example > -inputformat INVALID class would not be thrown as an error. Instead it > defaults to StreamInputFormat. If an invalid inputformat is specified, it is > good to fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1952) Streaming does not handle invalid -inputformat (typo by users for example)
[ https://issues.apache.org/jira/browse/HADOOP-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1952: - Attachment: HADOOP-1952-2.patch On the other hand, thinking about it, should we log at info level inside goodClassOrNull? If the mapper is 'cat', which is a valid executable, we should not log saying the cat class was not found, no? I am reverting the logging inside goodClassOrNull and handling the failures in StreamJob where needed. Thoughts? (Attached is a patch which reverts only the logging changes) > Streaming does not handle invalid -inputformat (typo by users for example) > --- > > Key: HADOOP-1952 > URL: https://issues.apache.org/jira/browse/HADOOP-1952 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.1 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu >Priority: Minor > Fix For: 0.16.0 > > Attachments: CatchInvalidInputFormat.patch, HADOOP-1952-1.patch, > HADOOP-1952-2.patch > > > Hadoop Streaming does not handle invalid inputformat class. For example > -inputformat INVALID class would not be thrown as an error. Instead it > defaults to StreamInputFormat. If an invalid inputformat is specified, it is > good to fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2071: - Attachment: HADOOP-2071-3.patch BufferedInputStream does not provide a way to get the current position in the stream, and updating the encapsulated FSDataInputStream again would be like seeking back. So I have the position stored in pos_ and update it accordingly. Attaching a new patch with this change and a testcase. Could anyone please take a look? Thanks! > StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in > hadoop 0.14 > - > > Key: HADOOP-2071 > URL: https://issues.apache.org/jira/browse/HADOOP-2071 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.3 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu > Fix For: 0.16.0 > > Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch, > HADOOP-2071-3.patch > > > In hadoop 0.14, using -inputreader StreamXmlRecordReader for streaming jobs > throw > java.io.IOException: Mark/reset exception in hadoop 0.14 > This looks to be related to > (https://issues.apache.org/jira/browse/HADOOP-2067). > > Caused by: java.io.IOException: Mark/reset not supported > at > org.apache.hadoop.dfs.DFSClient$DFSInputStream.reset(DFSClient.java:1353) > at java.io.FilterInputStream.reset(FilterInputStream.java:200) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.fastReadUntilMatch(StreamX > mlRecordReader.java:289) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.readUntilMatchBegin(Stream > XmlRecordReader.java:118) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.seekNextRecordBoundary(Str > eamXmlRecordReader.java:111) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.init(StreamXmlRecordReader > .java:73) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.(StreamXmlRecordReader.jav > a:63) > -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
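The position-tracking approach described in the HADOOP-2071 comment above can be sketched without the Hadoop classes. This is a hypothetical illustration using only standard java.io, not the actual patch: `PositionTrackingStream` and `getPos` are illustrative names, with the byte counter playing the role of pos_ so the reader never needs mark()/reset().

```java
import java.io.*;

// Hypothetical illustration of the pos_ idea: BufferedInputStream offers no
// getPos(), so the wrapper counts every byte it hands out instead of relying
// on mark()/reset(), which the underlying DFS stream does not support.
class PositionTrackingStream extends FilterInputStream {
    private long pos; // bytes consumed so far, analogous to pos_

    PositionTrackingStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) pos++; // only count bytes actually delivered
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) pos += n;
        return n;
    }

    long getPos() {
        return pos;
    }
}
```

After reading four bytes through the wrapper, getPos() reports 4 without ever touching mark() or reset() on the wrapped stream.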
[jira] Updated: (HADOOP-2089) Multiple caheArchive does not work in Hadoop streaming
[ https://issues.apache.org/jira/browse/HADOOP-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2089: - Status: Patch Available (was: Open) Making it patch available > Multiple caheArchive does not work in Hadoop streaming > -- > > Key: HADOOP-2089 > URL: https://issues.apache.org/jira/browse/HADOOP-2089 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.3 > Environment: All >Reporter: Milind Bhandarkar >Assignee: lohit vijayarenu >Priority: Critical > Fix For: 0.16.0 > > Attachments: HADOOP-2089-1.patch, HADOOP-2089-2.patch, > HADOOP-2089-3.patch > > > Multiple -cacheArchive options in Hadoop streaming does not work. Here is the > stack trace: > Exception in thread "main" java.lang.RuntimeException: > at org.apache.hadoop.streaming.StreamJob.fail(StreamJob.java:528) > at > org.apache.hadoop.streaming.StreamJob.exitUsage(StreamJob.java:469) > at > org.apache.hadoop.streaming.StreamJob.parseArgv(StreamJob.java:203) > at org.apache.hadoop.streaming.StreamJob.go(StreamJob.java:105) > at > org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:33) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:155) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2089) Multiple caheArchive does not work in Hadoop streaming
[ https://issues.apache.org/jira/browse/HADOOP-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2089: - Attachment: HADOOP-2089-3.patch Thanks Devaraj. I fixed this and attaching an updated patch. > Multiple caheArchive does not work in Hadoop streaming > -- > > Key: HADOOP-2089 > URL: https://issues.apache.org/jira/browse/HADOOP-2089 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.3 > Environment: All >Reporter: Milind Bhandarkar >Assignee: lohit vijayarenu >Priority: Critical > Fix For: 0.16.0 > > Attachments: HADOOP-2089-1.patch, HADOOP-2089-2.patch, > HADOOP-2089-3.patch > > > Multiple -cacheArchive options in Hadoop streaming does not work. Here is the > stack trace: > Exception in thread "main" java.lang.RuntimeException: > at org.apache.hadoop.streaming.StreamJob.fail(StreamJob.java:528) > at > org.apache.hadoop.streaming.StreamJob.exitUsage(StreamJob.java:469) > at > org.apache.hadoop.streaming.StreamJob.parseArgv(StreamJob.java:203) > at org.apache.hadoop.streaming.StreamJob.go(StreamJob.java:105) > at > org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:33) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:155) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1952) Streaming does not handle invalid -inputformat (typo by users for example)
[ https://issues.apache.org/jira/browse/HADOOP-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1952: - Attachment: HADOOP-1952-1.patch I am attaching an updated patch, which logs at info level inside goodClassOrNull method and fails for invalid class in partitioner/combiner/output format/inputformat. Fixed streaming test cases which were using invalid combiner which again was ignored previously. > Streaming does not handle invalid -inputformat (typo by users for example) > --- > > Key: HADOOP-1952 > URL: https://issues.apache.org/jira/browse/HADOOP-1952 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.1 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu >Priority: Minor > Fix For: 0.16.0 > > Attachments: CatchInvalidInputFormat.patch, HADOOP-1952-1.patch > > > Hadoop Streaming does not handle invalid inputformat class. For example > -inputformat INVALID class would not be thrown as an error. Instead it > defaults to StreamInputFormat. If an invalid inputformat is specified, it is > good to fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1952) Streaming does not handle invalid -inputformat (typo by users for example)
[ https://issues.apache.org/jira/browse/HADOOP-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537838 ] lohit vijayarenu commented on HADOOP-1952: -- Ok, I should have mentioned that StreamUtil.goodClassOrNull is actually used to check whether the map command passed is a class or an executable. So -mapper could take either a class or just an executable, and if it is not a valid class, null (as the method name indicates) is returned and StreamJob treats it as a map executable. How about logging at info level inside goodClassOrNull and, instead of throwing an exception, failing at the appropriate places where we do not allow null? If so, I will modify that and submit a patch. > Streaming does not handle invalid -inputformat (typo by users for example) > --- > > Key: HADOOP-1952 > URL: https://issues.apache.org/jira/browse/HADOOP-1952 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.1 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu >Priority: Minor > Fix For: 0.16.0 > > Attachments: CatchInvalidInputFormat.patch > > > Hadoop Streaming does not handle invalid inputformat class. For example > -inputformat INVALID class would not be thrown as an error. Instead it > defaults to StreamInputFormat. If an invalid inputformat is specified, it is > good to fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
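The contract described in the comment above can be sketched in a few lines. This is a hypothetical reconstruction of the behavior, not the actual StreamUtil source: the class name `ClassOrExecutableSketch` is illustrative, resolution is attempted with Class.forName, and null signals "treat the argument as an executable".

```java
// Hypothetical reconstruction of the goodClassOrNull contract described
// above (not the actual StreamUtil code): try to resolve the argument as a
// class; on failure return null so the caller can treat it as an executable
// command such as 'cat'.
class ClassOrExecutableSketch {
    static Class<?> goodClassOrNull(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            return null; // not a class; caller treats it as an executable
        }
    }
}
```

Under this contract, -mapper org.example.MyMapper resolves to a class while -mapper cat yields null and is run as a command, which is exactly why silently returning null for a mistyped -inputformat class was easy to miss.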
[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2071: - Status: Open (was: Patch Available) Thanks Raghu. I will look into this case and resubmit new one. Canceling the patch. > StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in > hadoop 0.14 > - > > Key: HADOOP-2071 > URL: https://issues.apache.org/jira/browse/HADOOP-2071 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.3 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu > Fix For: 0.15.0 > > Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch > > > In hadoop 0.14, using -inputreader StreamXmlRecordReader for streaming jobs > throw > java.io.IOException: Mark/reset exception in hadoop 0.14 > This looks to be related to > (https://issues.apache.org/jira/browse/HADOOP-2067). > > Caused by: java.io.IOException: Mark/reset not supported > at > org.apache.hadoop.dfs.DFSClient$DFSInputStream.reset(DFSClient.java:1353) > at java.io.FilterInputStream.reset(FilterInputStream.java:200) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.fastReadUntilMatch(StreamX > mlRecordReader.java:289) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.readUntilMatchBegin(Stream > XmlRecordReader.java:118) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.seekNextRecordBoundary(Str > eamXmlRecordReader.java:111) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.init(StreamXmlRecordReader > .java:73) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.(StreamXmlRecordReader.jav > a:63) > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2101) JobTracker Startup failed with java.net.BindException
JobTracker Startup failed with java.net.BindException - Key: HADOOP-2101 URL: https://issues.apache.org/jira/browse/HADOOP-2101 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.3 Reporter: lohit vijayarenu We have seen one case where the JobTracker failed with an IOException, but later retries of startup failed with a BindException, going into a loop before finally failing. Here is the stacktrace. 2007-10-23 05:51:19,374 WARN org.apache.hadoop.mapred.JobTracker: Error starting tracker: java.io.IOException: Problem starting http server at org.apache.hadoop.mapred.StatusHttpServer.start(StatusHttpServer.java:202) at org.apache.hadoop.mapred.JobTracker.(JobTracker.java:659) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:108) at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:1788) Caused by: org.mortbay.util.MultiException[java.net.BindException: Address already in use] at org.mortbay.http.HttpServer.doStart(HttpServer.java:731) at org.mortbay.util.Container.start(Container.java:72) at org.apache.hadoop.mapred.StatusHttpServer.start(StatusHttpServer.java:179) ... 3 more 2007-10-23 05:51:20,421 WARN org.apache.hadoop.mapred.JobTracker: Error starting tracker: java.net.BindException: Address already in use at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at org.apache.hadoop.ipc.Server$Listener.(Server.java:185) at org.apache.hadoop.ipc.Server.(Server.java:627) at org.apache.hadoop.ipc.RPC$Server.(RPC.java:324) at org.apache.hadoop.ipc.RPC.getServer(RPC.java:294) at org.apache.hadoop.mapred.JobTracker.(JobTracker.java:647) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:108) at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:1788) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1952) Streaming does not handle invalid -inputformat (typo by users for example)
[ https://issues.apache.org/jira/browse/HADOOP-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537658 ] lohit vijayarenu commented on HADOOP-1952: -- Thanks for looking into this, Arun. The use case here was that when a user specifies a partitioner/combiner class and we discover the class is not available, the earlier code used to just ignore it. So I added the message to let them know that their specified class does not exist and that we are defaulting. Should I move it to DEBUG and resubmit a patch? > Streaming does not handle invalid -inputformat (typo by users for example) > --- > > Key: HADOOP-1952 > URL: https://issues.apache.org/jira/browse/HADOOP-1952 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.1 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu >Priority: Minor > Attachments: CatchInvalidInputFormat.patch > > > Hadoop Streaming does not handle invalid inputformat class. For example > -inputformat INVALID class would not be thrown as an error. Instead it > defaults to StreamInputFormat. If an invalid inputformat is specified, it is > good to fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2089) Multiple caheArchive does not work in Hadoop streaming
[ https://issues.apache.org/jira/browse/HADOOP-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2089: - Attachment: HADOOP-2089-2.patch Updating the patch with a test case. > Multiple caheArchive does not work in Hadoop streaming > -- > > Key: HADOOP-2089 > URL: https://issues.apache.org/jira/browse/HADOOP-2089 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.3 > Environment: All >Reporter: Milind Bhandarkar >Assignee: lohit vijayarenu >Priority: Critical > Attachments: HADOOP-2089-1.patch, HADOOP-2089-2.patch > > > Multiple -cacheArchive options in Hadoop streaming does not work. Here is the > stack trace: > Exception in thread "main" java.lang.RuntimeException: > at org.apache.hadoop.streaming.StreamJob.fail(StreamJob.java:528) > at > org.apache.hadoop.streaming.StreamJob.exitUsage(StreamJob.java:469) > at > org.apache.hadoop.streaming.StreamJob.parseArgv(StreamJob.java:203) > at org.apache.hadoop.streaming.StreamJob.go(StreamJob.java:105) > at > org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:33) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:155) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2089) Multiple caheArchive does not work in Hadoop streaming
[ https://issues.apache.org/jira/browse/HADOOP-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2089: - Attachment: HADOOP-2089-1.patch Small Fix to properly parse command line option > Multiple caheArchive does not work in Hadoop streaming > -- > > Key: HADOOP-2089 > URL: https://issues.apache.org/jira/browse/HADOOP-2089 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.3 > Environment: All >Reporter: Milind Bhandarkar >Assignee: lohit vijayarenu >Priority: Critical > Attachments: HADOOP-2089-1.patch > > > Multiple -cacheArchive options in Hadoop streaming does not work. Here is the > stack trace: > Exception in thread "main" java.lang.RuntimeException: > at org.apache.hadoop.streaming.StreamJob.fail(StreamJob.java:528) > at > org.apache.hadoop.streaming.StreamJob.exitUsage(StreamJob.java:469) > at > org.apache.hadoop.streaming.StreamJob.parseArgv(StreamJob.java:203) > at org.apache.hadoop.streaming.StreamJob.go(StreamJob.java:105) > at > org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:33) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:155) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2066) filenames with ':' colon throws java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/HADOOP-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536716 ] lohit vijayarenu commented on HADOOP-2066: -- Hi Nicholas, I just tried this patch against trunk. Am I doing this right?
[lohit@ hadoop-trunk]$ hadoop dfs -put ./abcd:efgh.tar.gz "abcd\:efgh.tar.gz"
put: Pathname /user/lohit/abcd\:efgh.tar.gz from abcd\:efgh.tar.gz is not a valid DFS filename.
[lohit@ hadoop-trunk]$ hadoop dfs -put ./abcd:efgh.tar.gz "abcd%3aefgh.tar.gz"
[lohit@ hadoop-trunk]$ hadoop dfs -ls
Found 3 items
/user/lohit/abcd 5 2007-10-22 15:38
/user/lohit/abcd%3aefgh.tar.gz 5 2007-10-22 15:39
/user/lohit/test.x 61646 2007-10-22 15:38
[EMAIL PROTECTED] hadoop-trunk]$
Thanks! > filenames with ':' colon throws java.lang.IllegalArgumentException > -- > > Key: HADOOP-2066 > URL: https://issues.apache.org/jira/browse/HADOOP-2066 > Project: Hadoop > Issue Type: Bug > Components: dfs >Reporter: lohit vijayarenu > Attachments: 2066_20071019.patch, HADOOP-2066.patch > > > File names containing colon ":" throws java.lang.IllegalArgumentException > while LINUX file system supports it. 
> $ hadoop dfs -put ./testfile-2007-09-24-03:00:00.gz filenametest > Exception in thread "main" java.lang.IllegalArgumentException: > java.net.URISyntaxException: Relative path in absolute > URI: testfile-2007-09-24-03:00:00.gz > at org.apache.hadoop.fs.Path.initialize(Path.java:140) > at org.apache.hadoop.fs.Path.(Path.java:126) > at org.apache.hadoop.fs.Path.(Path.java:50) > at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:273) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:117) > at > org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:776) > at > org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:757) > at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:116) > at org.apache.hadoop.fs.FsShell.run(FsShell.java:1229) > at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:1342) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > testfile-2007-09-24-03:00:00.gz > at java.net.URI.checkPath(URI.java:1787) > at java.net.URI.(URI.java:735) > at org.apache.hadoop.fs.Path.initialize(Path.java:137) > ... 10 more > Path(String pathString) when given a filename which contains ':' treats it as > URI and selects anything before ':' as > scheme, which in this case is clearly not a valid scheme. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1952) Streaming does not handle invalid -inputformat (typo by users for example)
[ https://issues.apache.org/jira/browse/HADOOP-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1952: - Status: Patch Available (was: Open) Making this patch available. > Streaming does not handle invalid -inputformat (typo by users for example) > --- > > Key: HADOOP-1952 > URL: https://issues.apache.org/jira/browse/HADOOP-1952 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.1 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu >Priority: Minor > Attachments: CatchInvalidInputFormat.patch > > > Hadoop Streaming does not handle invalid inputformat class. For example > -inputformat INVALID class would not be thrown as an error. Instead it > defaults to StreamInputFormat. If an invalid inputformat is specified, it is > good to fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1952) Streaming does not handle invalid -inputformat (typo by users for example)
[ https://issues.apache.org/jira/browse/HADOOP-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1952: - Assignee: lohit vijayarenu > Streaming does not handle invalid -inputformat (typo by users for example) > --- > > Key: HADOOP-1952 > URL: https://issues.apache.org/jira/browse/HADOOP-1952 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.1 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu >Priority: Minor > Attachments: CatchInvalidInputFormat.patch > > > Hadoop Streaming does not handle invalid inputformat class. For example > -inputformat INVALID class would not be thrown as an error. Instead it > defaults to StreamInputFormat. If an invalid inputformat is specified, it is > good to fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2071: - Attachment: HADOOP-2071-2.patch With inputs from Raghu and Milind, here is an updated patch. This wraps the FSDataInputStream in a BufferedInputStream and eliminates seek(). The patch also includes a simple test case for StreamXmlRecordReader. > StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in > hadoop 0.14 > - > > Key: HADOOP-2071 > URL: https://issues.apache.org/jira/browse/HADOOP-2071 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.3 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu > Attachments: HADOOP-2071-1.patch, HADOOP-2071-2.patch > > > In hadoop 0.14, using -inputreader StreamXmlRecordReader for streaming jobs > throw > java.io.IOException: Mark/reset exception in hadoop 0.14 > This looks to be related to > (https://issues.apache.org/jira/browse/HADOOP-2067). > > Caused by: java.io.IOException: Mark/reset not supported > at > org.apache.hadoop.dfs.DFSClient$DFSInputStream.reset(DFSClient.java:1353) > at java.io.FilterInputStream.reset(FilterInputStream.java:200) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.fastReadUntilMatch(StreamXmlRecordReader.java:289) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.readUntilMatchBegin(StreamXmlRecordReader.java:118) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.seekNextRecordBoundary(StreamXmlRecordReader.java:111) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.init(StreamXmlRecordReader.java:73) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.(StreamXmlRecordReader.java:63) > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
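The contract behind this fix can be seen with plain java.io streams: a stream whose markSupported() is false throws on reset(), while wrapping it in a java.io.BufferedInputStream adds mark/reset support. A minimal sketch (plain InputStream stand-ins, not DFSClient code):

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkResetDemo {
    // A bare InputStream: markSupported() is false by default, reset() throws.
    static InputStream rawStream() {
        return new InputStream() {
            @Override public int read() { return -1; }
        };
    }

    // reset() on a stream without mark support fails with
    // IOException("mark/reset not supported"), like DFSInputStream here.
    static boolean rawResetFails() {
        try {
            rawStream().reset();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    // BufferedInputStream buffers the data it reads, so it can honor
    // mark()/reset() regardless of the underlying stream.
    static boolean bufferedSupportsMark() {
        return new BufferedInputStream(rawStream()).markSupported();
    }

    public static void main(String[] args) {
        System.out.println("raw reset fails: " + rawResetFails());
        System.out.println("buffered supports mark: " + bufferedSupportsMark());
    }
}
```

This mirrors the design choice in the patch: instead of relying on the DFS stream to support mark/reset, buffer it and let the buffer provide that capability.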
[jira] Commented: (HADOOP-2066) filenames with ':' colon throws java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/HADOOP-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535986 ] lohit vijayarenu commented on HADOOP-2066: -- How would be the usage with this patch? I tried these [EMAIL PROTECTED] hadoop dfs -put ./testfile-2007-09-24-03:00:00.gz "testfile-2007-09-24-03%3a00%3a00.gz" put: Pathname /user/lohit/testfile-2007-09-24-03:00:00.gz from testfile-2007-09-24-03:00:00.gz is not a valid DFS filename. [EMAIL PROTECTED] echo "TEST" > testfile [EMAIL PROTECTED] hadoop dfs -put ./testfile "testfile-2007-09-24-03%3a00%3a00.gz" put: Pathname /user/lohit/testfile-2007-09-24-03:00:00.gz from testfile-2007-09-24-03:00:00.gz is not a valid DFS filename. > filenames with ':' colon throws java.lang.IllegalArgumentException > -- > > Key: HADOOP-2066 > URL: https://issues.apache.org/jira/browse/HADOOP-2066 > Project: Hadoop > Issue Type: Bug > Components: dfs >Reporter: lohit vijayarenu > Attachments: HADOOP-2066.patch > > > File names containing colon ":" throws java.lang.IllegalArgumentException > while LINUX file system supports it. 
> $ hadoop dfs -put ./testfile-2007-09-24-03:00:00.gz filenametest > Exception in thread "main" java.lang.IllegalArgumentException: > java.net.URISyntaxException: Relative path in absolute > URI: testfile-2007-09-24-03:00:00.gz > at org.apache.hadoop.fs.Path.initialize(Path.java:140) > at org.apache.hadoop.fs.Path.(Path.java:126) > at org.apache.hadoop.fs.Path.(Path.java:50) > at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:273) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:117) > at > org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:776) > at > org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:757) > at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:116) > at org.apache.hadoop.fs.FsShell.run(FsShell.java:1229) > at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:1342) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > testfile-2007-09-24-03:00:00.gz > at java.net.URI.checkPath(URI.java:1787) > at java.net.URI.(URI.java:735) > at org.apache.hadoop.fs.Path.initialize(Path.java:137) > ... 10 more > Path(String pathString) when given a filename which contains ':' treats it as > URI and selects anything before ':' as > scheme, which in this case is clearly not a valid scheme. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2071: - Attachment: HADOOP-2071-1.patch Attached is a patch which eliminates mark/reset. In one place, seek() was called even after reset(), which made it redundant. Could anyone please review this? Thanks > StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in > hadoop 0.14 > - > > Key: HADOOP-2071 > URL: https://issues.apache.org/jira/browse/HADOOP-2071 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.3 >Reporter: lohit vijayarenu > Attachments: HADOOP-2071-1.patch > > > In hadoop 0.14, using -inputreader StreamXmlRecordReader for streaming jobs > throw > java.io.IOException: Mark/reset exception in hadoop 0.14 > This looks to be related to > (https://issues.apache.org/jira/browse/HADOOP-2067). > > Caused by: java.io.IOException: Mark/reset not supported > at > org.apache.hadoop.dfs.DFSClient$DFSInputStream.reset(DFSClient.java:1353) > at java.io.FilterInputStream.reset(FilterInputStream.java:200) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.fastReadUntilMatch(StreamXmlRecordReader.java:289) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.readUntilMatchBegin(StreamXmlRecordReader.java:118) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.seekNextRecordBoundary(StreamXmlRecordReader.java:111) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.init(StreamXmlRecordReader.java:73) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.(StreamXmlRecordReader.java:63) > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
[ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2071: - Assignee: lohit vijayarenu > StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in > hadoop 0.14 > - > > Key: HADOOP-2071 > URL: https://issues.apache.org/jira/browse/HADOOP-2071 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.3 >Reporter: lohit vijayarenu >Assignee: lohit vijayarenu > Attachments: HADOOP-2071-1.patch > > > In hadoop 0.14, using -inputreader StreamXmlRecordReader for streaming jobs > throw > java.io.IOException: Mark/reset exception in hadoop 0.14 > This looks to be related to > (https://issues.apache.org/jira/browse/HADOOP-2067). > > Caused by: java.io.IOException: Mark/reset not supported > at > org.apache.hadoop.dfs.DFSClient$DFSInputStream.reset(DFSClient.java:1353) > at java.io.FilterInputStream.reset(FilterInputStream.java:200) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.fastReadUntilMatch(StreamX > mlRecordReader.java:289) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.readUntilMatchBegin(Stream > XmlRecordReader.java:118) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.seekNextRecordBoundary(Str > eamXmlRecordReader.java:111) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.init(StreamXmlRecordReader > .java:73) > at > org.apache.hadoop.streaming.StreamXmlRecordReader.(StreamXmlRecordReader.jav > a:63) > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14 - Key: HADOOP-2071 URL: https://issues.apache.org/jira/browse/HADOOP-2071 Project: Hadoop Issue Type: Bug Components: contrib/streaming Affects Versions: 0.14.3 Reporter: lohit vijayarenu In hadoop 0.14, using -inputreader StreamXmlRecordReader for streaming jobs throws java.io.IOException: Mark/reset exception. This looks to be related to HADOOP-2067 (https://issues.apache.org/jira/browse/HADOOP-2067).
Caused by: java.io.IOException: Mark/reset not supported
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.reset(DFSClient.java:1353)
        at java.io.FilterInputStream.reset(FilterInputStream.java:200)
        at org.apache.hadoop.streaming.StreamXmlRecordReader.fastReadUntilMatch(StreamXmlRecordReader.java:289)
        at org.apache.hadoop.streaming.StreamXmlRecordReader.readUntilMatchBegin(StreamXmlRecordReader.java:118)
        at org.apache.hadoop.streaming.StreamXmlRecordReader.seekNextRecordBoundary(StreamXmlRecordReader.java:111)
        at org.apache.hadoop.streaming.StreamXmlRecordReader.init(StreamXmlRecordReader.java:73)
        at org.apache.hadoop.streaming.StreamXmlRecordReader.(StreamXmlRecordReader.java:63)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2066) filenames with ':' colon throws java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/HADOOP-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2066: - Description: File names containing colon ":" throws java.lang.IllegalArgumentException while LINUX file system supports it. $ hadoop dfs -put ./testfile-2007-09-24-03:00:00.gz filenametest Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: testfile-2007-09-24-03:00:00.gz at org.apache.hadoop.fs.Path.initialize(Path.java:140) at org.apache.hadoop.fs.Path.(Path.java:126) at org.apache.hadoop.fs.Path.(Path.java:50) at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:273) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:117) at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:776) at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:757) at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:116) at org.apache.hadoop.fs.FsShell.run(FsShell.java:1229) at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) at org.apache.hadoop.fs.FsShell.main(FsShell.java:1342) Caused by: java.net.URISyntaxException: Relative path in absolute URI: testfile-2007-09-24-03:00:00.gz at java.net.URI.checkPath(URI.java:1787) at java.net.URI.(URI.java:735) at org.apache.hadoop.fs.Path.initialize(Path.java:137) ... 10 more Path(String pathString) when given a filename which contains ':' treats it as URI and selects anything before ':' as scheme, which in this case is clearly not a valid scheme. was: File names containing colon ":" throws java.lang.IllegalArgumentException while LINUX file system supports it. 
[EMAIL PROTECTED] ~]$ hadoop dfs -put ./testfile-2007-09-24-03:00:00.gz filenametest Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: testfile-2007-09-24-03:00:00.gz at org.apache.hadoop.fs.Path.initialize(Path.java:140) at org.apache.hadoop.fs.Path.(Path.java:126) at org.apache.hadoop.fs.Path.(Path.java:50) at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:273) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:117) at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:776) at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:757) at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:116) at org.apache.hadoop.fs.FsShell.run(FsShell.java:1229) at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) at org.apache.hadoop.fs.FsShell.main(FsShell.java:1342) Caused by: java.net.URISyntaxException: Relative path in absolute URI: testfile-2007-09-24-03:00:00.gz at java.net.URI.checkPath(URI.java:1787) at java.net.URI.(URI.java:735) at org.apache.hadoop.fs.Path.initialize(Path.java:137) ... 10 more [EMAIL PROTECTED] ~]$ Path(String pathString) when given a filename which contains ':' treats it as URI and selects anything before ':' as scheme, which in this case is clearly not a valid scheme. > filenames with ':' colon throws java.lang.IllegalArgumentException > -- > > Key: HADOOP-2066 > URL: https://issues.apache.org/jira/browse/HADOOP-2066 > Project: Hadoop > Issue Type: Bug > Components: dfs >Reporter: lohit vijayarenu > Attachments: HADOOP-2066.patch > > > File names containing colon ":" throws java.lang.IllegalArgumentException > while LINUX file system supports it. 
> $ hadoop dfs -put ./testfile-2007-09-24-03:00:00.gz filenametest > Exception in thread "main" java.lang.IllegalArgumentException: > java.net.URISyntaxException: Relative path in absolute > URI: testfile-2007-09-24-03:00:00.gz > at org.apache.hadoop.fs.Path.initialize(Path.java:140) > at org.apache.hadoop.fs.Path.(Path.java:126) > at org.apache.hadoop.fs.Path.(Path.java:50) > at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:273) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:117) > at > org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:776) > at > org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:757) > at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:116) > at org.apache.hadoop.fs.FsShell.run(FsShell.java:1229) > at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:1342) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > testfile-2007-09-24-03:00:00.gz > at java.net.URI.checkPath(URI.java:1787) > at java.net.URI.(URI.java:735) > at org.apache.hadoop.fs.Path.initialize(Path.java:137) > ... 10 more > Path(String pathString) when given a filename which contains ':' treats it as > URI and selects anything before ':' as > scheme, which in this case is clearly not a valid scheme. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2067) multiple close() failing in Hadoop 0.14
[ https://issues.apache.org/jira/browse/HADOOP-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2067: - Attachment: stack_trace_13_and_14.txt > multiple close() failing in Hadoop 0.14 > --- > > Key: HADOOP-2067 > URL: https://issues.apache.org/jira/browse/HADOOP-2067 > Project: Hadoop > Issue Type: Bug > Components: dfs >Reporter: lohit vijayarenu > Attachments: stack_trace_13_and_14.txt > > > It looks like multiple close() calls, while reading files from DFS is failing > in hadoop 0.14. This was somehow not caught in hadoop 0.13. > The use case was to open a file on DFS like shown below > > FSDataInputStream > fSDataInputStream = > fileSystem.open(new Path(propertyFileName)); > Properties subProperties = > new Properties(); > subProperties. > loadFromXML(fSDataInputStream); > fSDataInputStream. > close(); > > This failed with an IOException > > EXCEPTIN RAISED, which is java.io.IOException: Stream closed > java.io.IOException: Stream closed > > The stack trace shows its being closed twice. While this used to work in > hadoop 0.13 which used to hide this. > Attached with this JIRA is a text file which has stack trace for both hadoop > 0.13 and hadoop 0.14. > How should this be handled from a users point of view? > Thanks -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2067) multiple close() failing in Hadoop 0.14
multiple close() failing in Hadoop 0.14 --- Key: HADOOP-2067 URL: https://issues.apache.org/jira/browse/HADOOP-2067 Project: Hadoop Issue Type: Bug Components: dfs Reporter: lohit vijayarenu It looks like multiple close() calls while reading files from DFS fail in hadoop 0.14. This was somehow not caught in hadoop 0.13. The use case was to open a file on DFS as shown below:
FSDataInputStream fSDataInputStream = fileSystem.open(new Path(propertyFileName));
Properties subProperties = new Properties();
subProperties.loadFromXML(fSDataInputStream);
fSDataInputStream.close();
This failed with an IOException:
EXCEPTIN RAISED, which is java.io.IOException: Stream closed
java.io.IOException: Stream closed
The stack trace shows the stream being closed twice; this used to work in hadoop 0.13, which hid the problem. Attached with this JIRA is a text file which has the stack trace for both hadoop 0.13 and hadoop 0.14. How should this be handled from a user's point of view? Thanks -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
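Note that Properties.loadFromXML closes the stream it is given, so the explicit close() in the snippet above is the second one. One user-side workaround is to guard against the repeated close() before it reaches the underlying stream. A hedged sketch (the wrapper class and its name are ours, not a Hadoop API):

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Wrapper that makes close() idempotent: only the first call reaches the
// underlying stream, so a library that closes the stream for you cannot
// trigger a second, failing close().
public class IdempotentCloseStream extends FilterInputStream {
    private boolean closed = false;

    public IdempotentCloseStream(InputStream in) { super(in); }

    @Override
    public synchronized void close() throws IOException {
        if (closed) return;   // swallow repeat close() calls
        closed = true;
        super.close();
    }

    public static void main(String[] args) throws IOException {
        // Underlying stream that mimics the 0.14 behaviour: second close() throws.
        InputStream strict = new ByteArrayInputStream(new byte[0]) {
            private boolean closedOnce = false;
            @Override public void close() throws IOException {
                if (closedOnce) throw new IOException("Stream closed");
                closedOnce = true;
            }
        };
        InputStream safe = new IdempotentCloseStream(strict);
        safe.close();
        safe.close();   // no exception: the wrapper absorbs the repeat
        System.out.println("double close OK");
    }
}
```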
[jira] Created: (HADOOP-2066) filenames with ':' colon throws java.lang.IllegalArgumentException
filenames with ':' colon throws java.lang.IllegalArgumentException -- Key: HADOOP-2066 URL: https://issues.apache.org/jira/browse/HADOOP-2066 Project: Hadoop Issue Type: Bug Components: dfs Reporter: lohit vijayarenu File names containing a colon ":" throw java.lang.IllegalArgumentException, while the LINUX file system supports them.
[EMAIL PROTECTED] ~]$ hadoop dfs -put ./testfile-2007-09-24-03:00:00.gz filenametest
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: testfile-2007-09-24-03:00:00.gz
        at org.apache.hadoop.fs.Path.initialize(Path.java:140)
        at org.apache.hadoop.fs.Path.(Path.java:126)
        at org.apache.hadoop.fs.Path.(Path.java:50)
        at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:273)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:117)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:776)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:757)
        at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:116)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:1229)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1342)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: testfile-2007-09-24-03:00:00.gz
        at java.net.URI.checkPath(URI.java:1787)
        at java.net.URI.(URI.java:735)
        at org.apache.hadoop.fs.Path.initialize(Path.java:137)
        ... 10 more
[EMAIL PROTECTED] ~]$
Path(String pathString), when given a filename which contains ':', treats it as a URI and selects anything before ':' as the scheme, which in this case is clearly not a valid scheme. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
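The root cause can be reproduced with java.net.URI alone: in a string like testfile-2007-09-24-03:00:00.gz, everything before the first ':' is syntactically a valid URI scheme (letters, digits, '-'), so the parser consumes it as one. A minimal sketch (our own demo class, not Hadoop's Path code):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ColonPathDemo {
    // Parse a bare filename the way single-argument URI parsing sees it.
    static String schemeOf(String name) throws URISyntaxException {
        return new URI(name).getScheme();
    }

    public static void main(String[] args) throws URISyntaxException {
        // The text before ':' is consumed as the URI "scheme".
        System.out.println(schemeOf("testfile-2007-09-24-03:00:00.gz"));
        // A name with no colon has no scheme at all (getScheme() is null).
        System.out.println(schemeOf("plainfile.gz"));
    }
}
```

This is why percent-encoding the colon (as %3a in the comments above) sidesteps the problem: the encoded name no longer contains a literal ':' for the parser to treat as a scheme delimiter.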
[jira] Created: (HADOOP-2053) OutOfMemoryError : Java heap space errors in hadoop 0.14
OutOfMemoryError : Java heap space errors in hadoop 0.14 Key: HADOOP-2053 URL: https://issues.apache.org/jira/browse/HADOOP-2053 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.1 Reporter: lohit vijayarenu Fix For: 0.15.0 In recent hadoop 0.14 we are seeing a few jobs where map tasks fail with java.lang.OutOfMemoryError: Java heap space. These are the same jobs which used to work fine with 0.13.
task_200710112103_0001_m_15_1: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.Text.write(Text.java:243)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:340)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1771) streaming hang when IOException in MROutputThread. (NPE)
[ https://issues.apache.org/jira/browse/HADOOP-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1771: - Status: Patch Available (was: Open) > streaming hang when IOException in MROutputThread. (NPE) > > > Key: HADOOP-1771 > URL: https://issues.apache.org/jira/browse/HADOOP-1771 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.1 >Reporter: Koji Noguchi >Assignee: lohit vijayarenu >Priority: Blocker > Fix For: 0.15.0 > > Attachments: h-1771-2.patch, h-1771-3.patch, h-1771-3.patch, > h-1771-3.patch, h-1771-3.patch2, h-1771.patch > > > One streaming task hang and had stderr userlog as follows. > {code} > Exception in thread "Thread-5" java.lang.NullPointerException > at java.lang.Throwable.printStackTrace(Throwable.java:460) > at > org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:352) > {code} > In PipeMapRed.java > {code} > 351 } catch (IOException io) { > 352 io.printStackTrace(log_); > 353 outerrThreadsThrowable = io; > {code} > I guess log_ is Null... Should call logStackTrace. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1771) streaming hang when IOException in MROutputThread. (NPE)
[ https://issues.apache.org/jira/browse/HADOOP-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1771: - Attachment: h-1771-3.patch2 Sorry about that. Fixed and updated the patch. > streaming hang when IOException in MROutputThread. (NPE) > > > Key: HADOOP-1771 > URL: https://issues.apache.org/jira/browse/HADOOP-1771 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.1 >Reporter: Koji Noguchi >Assignee: Michel Tourn >Priority: Blocker > Fix For: 0.15.0 > > Attachments: h-1771-2.patch, h-1771-3.patch, h-1771-3.patch, > h-1771-3.patch, h-1771-3.patch2, h-1771.patch > > > One streaming task hang and had stderr userlog as follows. > {code} > Exception in thread "Thread-5" java.lang.NullPointerException > at java.lang.Throwable.printStackTrace(Throwable.java:460) > at > org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:352) > {code} > In PipeMapRed.java > {code} > 351 } catch (IOException io) { > 352 io.printStackTrace(log_); > 353 outerrThreadsThrowable = io; > {code} > I guess log_ is Null... Should call logStackTrace. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1771) streaming hang when IOException in MROutputThread. (NPE)
[ https://issues.apache.org/jira/browse/HADOOP-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1771: - Affects Version/s: (was: 0.13.1) 0.14.1 Status: Patch Available (was: Open) making patch available. Thanks > streaming hang when IOException in MROutputThread. (NPE) > > > Key: HADOOP-1771 > URL: https://issues.apache.org/jira/browse/HADOOP-1771 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.1 >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Blocker > Fix For: 0.15.0 > > Attachments: h-1771-2.patch, h-1771-3.patch, h-1771-3.patch, > h-1771-3.patch, h-1771.patch > > > One streaming task hang and had stderr userlog as follows. > {code} > Exception in thread "Thread-5" java.lang.NullPointerException > at java.lang.Throwable.printStackTrace(Throwable.java:460) > at > org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:352) > {code} > In PipeMapRed.java > {code} > 351 } catch (IOException io) { > 352 io.printStackTrace(log_); > 353 outerrThreadsThrowable = io; > {code} > I guess log_ is Null... Should call logStackTrace. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1771) streaming hang when IOException in MROutputThread. (NPE)
[ https://issues.apache.org/jira/browse/HADOOP-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1771: - Attachment: h-1771-3.patch With input from Owen, the patch now closes the streams in the main body of run() and logs the IOException as well. Thanks > streaming hang when IOException in MROutputThread. (NPE) > > > Key: HADOOP-1771 > URL: https://issues.apache.org/jira/browse/HADOOP-1771 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.13.1 >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Blocker > Fix For: 0.15.0 > > Attachments: h-1771-2.patch, h-1771-3.patch, h-1771-3.patch, > h-1771-3.patch, h-1771.patch > > > One streaming task hang and had stderr userlog as follows. > {code} > Exception in thread "Thread-5" java.lang.NullPointerException > at java.lang.Throwable.printStackTrace(Throwable.java:460) > at > org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:352) > {code} > In PipeMapRed.java > {code} > 351 } catch (IOException io) { > 352 io.printStackTrace(log_); > 353 outerrThreadsThrowable = io; > {code} > I guess log_ is Null... Should call logStackTrace. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1771) streaming hang when IOException in MROutputThread. (NPE)
[ https://issues.apache.org/jira/browse/HADOOP-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1771: - Attachment: h-1771-3.patch Updated with alignment change. Thanks! > streaming hang when IOException in MROutputThread. (NPE) > > > Key: HADOOP-1771 > URL: https://issues.apache.org/jira/browse/HADOOP-1771 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.13.1 >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Blocker > Fix For: 0.15.0 > > Attachments: h-1771-2.patch, h-1771-3.patch, h-1771-3.patch, > h-1771.patch > > > One streaming task hang and had stderr userlog as follows. > {code} > Exception in thread "Thread-5" java.lang.NullPointerException > at java.lang.Throwable.printStackTrace(Throwable.java:460) > at > org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:352) > {code} > In PipeMapRed.java > {code} > 351 } catch (IOException io) { > 352 io.printStackTrace(log_); > 353 outerrThreadsThrowable = io; > {code} > I guess log_ is Null... Should call logStackTrace. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1771) streaming hang when IOException in MROutputThread. (NPE)
[ https://issues.apache.org/jira/browse/HADOOP-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1771: - Attachment: h-1771-3.patch Thanks to Koji: while debugging a similar problem, we saw a case where MROutputThread was done but the map task was still hung. Upon killing the streaming map executable, we saw there had been a problem with the thread, so it had terminated; the streaming map executable, however, was still trying to write and hung. The cause could be that clientIn_ and clientErr_ are not closed when their thread is done. Attached is an updated patch which makes sure the streams are closed when their threads exit. Could anyone please review/comment. Thanks. > streaming hang when IOException in MROutputThread. (NPE) > > > Key: HADOOP-1771 > URL: https://issues.apache.org/jira/browse/HADOOP-1771 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.13.1 >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Fix For: 0.15.0 > > Attachments: h-1771-2.patch, h-1771-3.patch, h-1771.patch > > > One streaming task hang and had stderr userlog as follows. > {code} > Exception in thread "Thread-5" java.lang.NullPointerException > at java.lang.Throwable.printStackTrace(Throwable.java:460) > at > org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:352) > {code} > In PipeMapRed.java > {code} > 351 } catch (IOException io) { > 352 io.printStackTrace(log_); > 353 outerrThreadsThrowable = io; > {code} > I guess log_ is Null... Should call logStackTrace. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
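The hang described in that comment fits a classic pipe deadlock: if the draining thread exits without closing its end of the pipe, the child executable eventually fills the pipe buffer and blocks in write() forever. A minimal sketch of the fix, assuming the thread structure implied by the report (the field name clientIn_ comes from the comment; the rest is illustrative, not the actual MROutputThread):

```java
import java.io.IOException;
import java.io.InputStream;

class OutputDrainer extends Thread {
    private final InputStream clientIn_;  // child's stdout (name from the report)

    OutputDrainer(InputStream in) { this.clientIn_ = in; }

    @Override
    public void run() {
        try {
            byte[] buf = new byte[4096];
            while (clientIn_.read(buf) != -1) {
                // ... forward the child's output to the collector ...
            }
        } catch (IOException io) {
            // record the failure for the main thread instead of dying silently
        } finally {
            try {
                clientIn_.close();  // always close, so the child sees EOF/EPIPE
            } catch (IOException ignored) { }
        }
    }
}
```

The essential part is the finally block: with it, the child's next write fails fast instead of blocking on a full pipe after the drainer is gone.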
[jira] Updated: (HADOOP-1626) DFSAdmin. Help messages are missing for -finalizeUpgrade and -metasave.
[ https://issues.apache.org/jira/browse/HADOOP-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1626: - Attachment: HADOOP-1626.2.patch Updated patch with a few more changes suggested by Nigel. Thanks! > DFSAdmin. Help messages are missing for -finalizeUpgrade and -metasave. > --- > > Key: HADOOP-1626 > URL: https://issues.apache.org/jira/browse/HADOOP-1626 > Project: Hadoop > Issue Type: Improvement > Components: dfs >Affects Versions: 0.13.0 >Reporter: Konstantin Shvachko >Priority: Blocker > Fix For: 0.15.0 > > Attachments: HADOOP-1626.2.patch, HADOOP-1626.patch > > > DFSAdmin.printHelp() does not print help for the two commands above. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1626) DFSAdmin. Help messages are missing for -finalizeUpgrade and -metasave.
[ https://issues.apache.org/jira/browse/HADOOP-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1626: - Attachment: HADOOP-1626.patch Attached is a patch which includes help messages for -finalizeUpgrade and -metasave. Below are the help messages. [EMAIL PROTECTED] hadoop-trunk]$ hadoop dfsadmin -help finalizeUpgrade -finalizeUpgrade: Finalize upgrade of DFS. Datanodes delete their old version working directories, followed by namenode removing its own and checkpoints in it. This completes the upgrade process. [EMAIL PROTECTED] hadoop-trunk]$ hadoop dfsadmin -help metasave -metasave <filename>: Save namenode's primary data structures to <filename> in the directory specified by the hadoop.log.dir property. <filename> contains one line for each of the following: 1. Datanodes heart beating with Namenode 2. Blocks waiting to be replicated 3. Blocks currently being replicated 4. Blocks waiting to be deleted [EMAIL PROTECTED] hadoop-trunk]$ > DFSAdmin. Help messages are missing for -finalizeUpgrade and -metasave. > --- > > Key: HADOOP-1626 > URL: https://issues.apache.org/jira/browse/HADOOP-1626 > Project: Hadoop > Issue Type: Improvement > Components: dfs >Affects Versions: 0.13.0 >Reporter: Konstantin Shvachko >Priority: Blocker > Fix For: 0.15.0 > > Attachments: HADOOP-1626.patch > > > DFSAdmin.printHelp() does not print help for the two commands above. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1977) hadoop job -kill , -status causes NullPointerException
[ https://issues.apache.org/jira/browse/HADOOP-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531709 ] lohit vijayarenu commented on HADOOP-1977: -- Hi Enis, I tried your patch on 0.14.1 and both -status and -kill work. Thanks! > hadoop job -kill , -status causes NullPointerException > -- > > Key: HADOOP-1977 > URL: https://issues.apache.org/jira/browse/HADOOP-1977 > Project: Hadoop > Issue Type: Bug > Components: mapred >Affects Versions: 0.14.1 >Reporter: lohit vijayarenu >Assignee: Enis Soztutar >Priority: Blocker > Fix For: 0.14.2 > > Attachments: NPEinJobClient_v1.hadoop-0.14.patch, > NPEinJobClient_v1.patch > > > hadoop job -kill/-status seems to cause NullPointerException > As an example, I started a streaming job and tried to kill it. This raises > NullPointerException > [EMAIL PROTECTED] mapred]$ bin/hadoop job -Dmapred.job.tracker=kry1443:56225 > -kill job_200710011856_0001 > 07/10/01 18:57:07 INFO jvm.JvmMetrics: Initializing JVM Metrics with > processName=JobTracker, sessionId= > Exception in thread "main" java.lang.NullPointerException > at > org.apache.hadoop.mapred.LocalJobRunner$Job.access$600(LocalJobRunner.java:51) > at > org.apache.hadoop.mapred.LocalJobRunner.getJobStatus(LocalJobRunner.java:296) > at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:512) > at org.apache.hadoop.mapred.JobClient.run(JobClient.java:791) > at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) > at org.apache.hadoop.mapred.JobClient.main(JobClient.java:827) > [EMAIL PROTECTED] mapred]$ > So does 'hadoop job -status' > [EMAIL PROTECTED] mapred]$hadoop job -Dmapred.job.tracker=kry1443:56225 > -status job_200710011856_0001 > 07/10/01 18:57:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with > processName=JobTracker, sessionId= > Exception in thread "main" java.lang.NullPointerException > at > org.apache.hadoop.mapred.LocalJobRunner$Job.access$600(LocalJobRunner.java:51) > at > 
org.apache.hadoop.mapred.LocalJobRunner.getJobStatus(LocalJobRunner.java:296) > at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:512) > at org.apache.hadoop.mapred.JobClient.run(JobClient.java:782) > at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) > at org.apache.hadoop.mapred.JobClient.main(JobClient.java:827) > [EMAIL PROTECTED] mapred]$ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1977) hadoop job -kill , -status causes NullPointerException
[ https://issues.apache.org/jira/browse/HADOOP-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1977: - Component/s: mapred Summary: hadoop job -kill , -status causes NullPointerException (was: hadoop job -kill , -status ) > hadoop job -kill , -status causes NullPointerException > -- > > Key: HADOOP-1977 > URL: https://issues.apache.org/jira/browse/HADOOP-1977 > Project: Hadoop > Issue Type: Bug > Components: mapred >Affects Versions: 0.14.1 >Reporter: lohit vijayarenu > Fix For: 0.14.2 > > > hadoop job -kill/-status seems to cause NullPointerException > As an example, I started a streaming job and tried to kill it. This raises > NullPointerException > [EMAIL PROTECTED] mapred]$ bin/hadoop job -Dmapred.job.tracker=kry1443:56225 > -kill job_200710011856_0001 > 07/10/01 18:57:07 INFO jvm.JvmMetrics: Initializing JVM Metrics with > processName=JobTracker, sessionId= > Exception in thread "main" java.lang.NullPointerException > at > org.apache.hadoop.mapred.LocalJobRunner$Job.access$600(LocalJobRunner.java:51) > at > org.apache.hadoop.mapred.LocalJobRunner.getJobStatus(LocalJobRunner.java:296) > at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:512) > at org.apache.hadoop.mapred.JobClient.run(JobClient.java:791) > at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) > at org.apache.hadoop.mapred.JobClient.main(JobClient.java:827) > [EMAIL PROTECTED] mapred]$ > So does 'hadoop job -status' > [EMAIL PROTECTED] mapred]$hadoop job -Dmapred.job.tracker=kry1443:56225 > -status job_200710011856_0001 > 07/10/01 18:57:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with > processName=JobTracker, sessionId= > Exception in thread "main" java.lang.NullPointerException > at > org.apache.hadoop.mapred.LocalJobRunner$Job.access$600(LocalJobRunner.java:51) > at > org.apache.hadoop.mapred.LocalJobRunner.getJobStatus(LocalJobRunner.java:296) > at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:512) > at 
org.apache.hadoop.mapred.JobClient.run(JobClient.java:782) > at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) > at org.apache.hadoop.mapred.JobClient.main(JobClient.java:827) > [EMAIL PROTECTED] mapred]$ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1977) hadoop job -kill , -status causes NullPointerException
[ https://issues.apache.org/jira/browse/HADOOP-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1977: - Priority: Blocker (was: Major) > hadoop job -kill , -status causes NullPointerException > -- > > Key: HADOOP-1977 > URL: https://issues.apache.org/jira/browse/HADOOP-1977 > Project: Hadoop > Issue Type: Bug > Components: mapred >Affects Versions: 0.14.1 >Reporter: lohit vijayarenu >Priority: Blocker > Fix For: 0.14.2 > > > hadoop job -kill/-status seems to cause NullPointerException > As an example, I started a streaming job and tried to kill it. This raises > NullPointerException > [EMAIL PROTECTED] mapred]$ bin/hadoop job -Dmapred.job.tracker=kry1443:56225 > -kill job_200710011856_0001 > 07/10/01 18:57:07 INFO jvm.JvmMetrics: Initializing JVM Metrics with > processName=JobTracker, sessionId= > Exception in thread "main" java.lang.NullPointerException > at > org.apache.hadoop.mapred.LocalJobRunner$Job.access$600(LocalJobRunner.java:51) > at > org.apache.hadoop.mapred.LocalJobRunner.getJobStatus(LocalJobRunner.java:296) > at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:512) > at org.apache.hadoop.mapred.JobClient.run(JobClient.java:791) > at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) > at org.apache.hadoop.mapred.JobClient.main(JobClient.java:827) > [EMAIL PROTECTED] mapred]$ > So does 'hadoop job -status' > [EMAIL PROTECTED] mapred]$hadoop job -Dmapred.job.tracker=kry1443:56225 > -status job_200710011856_0001 > 07/10/01 18:57:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with > processName=JobTracker, sessionId= > Exception in thread "main" java.lang.NullPointerException > at > org.apache.hadoop.mapred.LocalJobRunner$Job.access$600(LocalJobRunner.java:51) > at > org.apache.hadoop.mapred.LocalJobRunner.getJobStatus(LocalJobRunner.java:296) > at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:512) > at org.apache.hadoop.mapred.JobClient.run(JobClient.java:782) > at 
org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) > at org.apache.hadoop.mapred.JobClient.main(JobClient.java:827) > [EMAIL PROTECTED] mapred]$ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-1977) hadoop job -kill , -status
hadoop job -kill , -status --- Key: HADOOP-1977 URL: https://issues.apache.org/jira/browse/HADOOP-1977 Project: Hadoop Issue Type: Bug Affects Versions: 0.14.1 Reporter: lohit vijayarenu Fix For: 0.14.2 hadoop job -kill/-status seems to cause NullPointerException As an example, I started a streaming job and tried to kill it. This raises NullPointerException [EMAIL PROTECTED] mapred]$ bin/hadoop job -Dmapred.job.tracker=kry1443:56225 -kill job_200710011856_0001 07/10/01 18:57:07 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= Exception in thread "main" java.lang.NullPointerException at org.apache.hadoop.mapred.LocalJobRunner$Job.access$600(LocalJobRunner.java:51) at org.apache.hadoop.mapred.LocalJobRunner.getJobStatus(LocalJobRunner.java:296) at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:512) at org.apache.hadoop.mapred.JobClient.run(JobClient.java:791) at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) at org.apache.hadoop.mapred.JobClient.main(JobClient.java:827) [EMAIL PROTECTED] mapred]$ So does 'hadoop job -status' [EMAIL PROTECTED] mapred]$hadoop job -Dmapred.job.tracker=kry1443:56225 -status job_200710011856_0001 07/10/01 18:57:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= Exception in thread "main" java.lang.NullPointerException at org.apache.hadoop.mapred.LocalJobRunner$Job.access$600(LocalJobRunner.java:51) at org.apache.hadoop.mapred.LocalJobRunner.getJobStatus(LocalJobRunner.java:296) at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:512) at org.apache.hadoop.mapred.JobClient.run(JobClient.java:782) at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) at org.apache.hadoop.mapred.JobClient.main(JobClient.java:827) [EMAIL PROTECTED] mapred]$ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
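The trace points at LocalJobRunner$Job.access$600 via getJobStatus, which suggests the local runner is handed a job id it never saw (a cluster job id) and dereferences the missing entry. A minimal sketch of the defensive lookup (hypothetical classes for illustration, not Hadoop's actual LocalJobRunner):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a job registry whose status lookup returns null for unknown
// ids instead of dereferencing a missing entry, which is what the NPE
// at LocalJobRunner.java:296 suggests was happening.
class JobRegistry {
    static final class JobStatus {
        final String id;
        JobStatus(String id) { this.id = id; }
    }

    private final Map<String, JobStatus> jobs = new HashMap<>();

    void submit(String id) { jobs.put(id, new JobStatus(id)); }

    // Returns null for ids this runner never saw, e.g. a cluster job id
    // handed to a local runner; callers must null-check before use.
    JobStatus getJobStatus(String id) {
        return jobs.get(id);
    }
}
```

With this contract, JobClient-style callers can print "Could not find job" rather than crash with a NullPointerException.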
[jira] Created: (HADOOP-1967) hadoop dfs -ls, -get, -mv command's source/destination URI are inconsistent
hadoop dfs -ls, -get, -mv command's source/destination URI are inconsistent --- Key: HADOOP-1967 URL: https://issues.apache.org/jira/browse/HADOOP-1967 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.14.1 Reporter: lohit vijayarenu While specifying source/destination paths for the hadoop dfs -ls, -get, -mv, -cp commands, we have some inconsistency related to the 'hdfs://' scheme. In particular, some of the commands accept both formats [1] hdfs:///user/lohit/testfile [2] hdfs://myhost:8020/user/lohit/testfile while other commands accept only paths which have an authority (host:port) [2] hdfs://myhost:8020/user/lohit/testfile Below are examples. hadoop dfs -ls (works for both formats) {quote} [EMAIL PROTECTED] ~]$ hadoop dfs -ls hdfs://kry-nn1:8020/user/lohit/ranges Found 1 items /user/lohit/ranges 24 1970-01-01 00:00 [EMAIL PROTECTED] ~]$ hadoop dfs -ls hdfs:///user/lohit/ranges Found 1 items {quote} hadoop dfs -get (works for only format [2]) {quote} [EMAIL PROTECTED] ~]$ hadoop dfs -get hdfs:///user/lohit/ranges . Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/lohit/ranges, expected: hdfs://kry-nn1:8020 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:204) at org.apache.hadoop.dfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:108) at org.apache.hadoop.dfs.DistributedFileSystem.getPath(DistributedFileSystem.java:104) at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:319) at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:423) at org.apache.hadoop.fs.FsShell.copyToLocal(FsShell.java:177) at org.apache.hadoop.fs.FsShell.copyToLocal(FsShell.java:155) at org.apache.hadoop.fs.FsShell.run(FsShell.java:1233) at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) at org.apache.hadoop.fs.FsShell.main(FsShell.java:1342) [EMAIL PROTECTED] ~]$ hadoop dfs -get hdfs://kry-nn1:8020/user/lohit/ranges . 
[EMAIL PROTECTED] ~]$ ls ./ranges ./ranges [EMAIL PROTECTED] ~]$ {quote} hadoop dfs -mv / -cp command. source path accepts both format [1] and [2], while destination accepts only [2]. {quote} [EMAIL PROTECTED] ~]$ hadoop dfs -cp hdfs://kry-nn1:8020/user/lohit/ranges.test2 hdfs:///user/lohit/ranges.test Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/lohit/ranges.test, expected: hdfs://kry-nn1:8020 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:204) at org.apache.hadoop.dfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:108) at org.apache.hadoop.dfs.DistributedFileSystem.getPath(DistributedFileSystem.java:104) at org.apache.hadoop.dfs.DistributedFileSystem.exists(DistributedFileSystem.java:162) at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:269) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:117) at org.apache.hadoop.fs.FsShell.copy(FsShell.java:691) at org.apache.hadoop.fs.FsShell.copy(FsShell.java:727) at org.apache.hadoop.fs.FsShell.run(FsShell.java:1260) at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) at org.apache.hadoop.fs.FsShell.main(FsShell.java:1342) [EMAIL PROTECTED] ~]$ hadoop dfs -cp hdfs:///user/lohit/ranges.test2 hdfs://kry-nn1:8020/user/lohit/ranges.test [EMAIL PROTECTED] ~]$ {quote} We should have a consistent URI naming convention across all commands. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
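The "Wrong FS" failures above occur only for authority-less URIs, which suggests the path check compares authorities literally instead of letting an absent authority inherit the filesystem's default. A sketch of the more lenient qualification (illustrative only, not the actual FileSystem.checkPath):

```java
import java.net.URI;

class PathCheck {
    // Accept hdfs:///p as shorthand for hdfs://defaultHost:port/p:
    // an absent authority inherits the filesystem's default authority.
    static URI qualify(URI path, URI fsDefault) {
        if (path.getScheme() == null) {
            return fsDefault.resolve(path);  // relative path: use the default FS
        }
        if (!path.getScheme().equals(fsDefault.getScheme())) {
            throw new IllegalArgumentException("Wrong FS: " + path);
        }
        String auth = path.getAuthority();
        if (auth == null || auth.isEmpty()) {
            auth = fsDefault.getAuthority();  // fill in the default host:port
        } else if (!auth.equals(fsDefault.getAuthority())) {
            throw new IllegalArgumentException("Wrong FS: " + path);
        }
        return URI.create(path.getScheme() + "://" + auth + path.getPath());
    }
}
```

Under this rule both hdfs:///user/lohit/ranges and hdfs://kry-nn1:8020/user/lohit/ranges qualify to the same path, while a genuinely foreign authority still fails.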
[jira] Updated: (HADOOP-1952) Streaming does not handle invalid -inputformat (typo by users for example)
[ https://issues.apache.org/jira/browse/HADOOP-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1952: - Attachment: CatchInvalidInputFormat.patch Attached is a simple patch to fix this. The patch also adds a check against class.getSimpleName() while resolving the inputformat class, so users do not have to specify the fully qualified name for a standard class; they can just provide the simple class name. Thanks > Streaming does not handle invalid -inputformat (typo by users for example) > --- > > Key: HADOOP-1952 > URL: https://issues.apache.org/jira/browse/HADOOP-1952 > Project: Hadoop > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.14.1 >Reporter: lohit vijayarenu >Priority: Minor > Attachments: CatchInvalidInputFormat.patch > > > Hadoop Streaming does not handle invalid inputformat class. For example > -inputformat INVALID class would not be thrown as an error. Instead it > defaults to StreamInputFormat. If an invalid inputformat is specified, it is > good to fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-1952) Streaming does not handle invalid -inputformat (typo by users for example)
Streaming does not handle invalid -inputformat (typo by users for example) --- Key: HADOOP-1952 URL: https://issues.apache.org/jira/browse/HADOOP-1952 Project: Hadoop Issue Type: Bug Components: contrib/streaming Affects Versions: 0.14.1 Reporter: lohit vijayarenu Priority: Minor Hadoop Streaming does not handle an invalid inputformat class. For example, -inputformat INVALID is not reported as an error; instead it defaults to StreamInputFormat. If an invalid inputformat is specified, it is better to fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
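One way to get the fail-fast behavior this report asks for is to resolve the class name explicitly and throw when nothing matches, rather than silently defaulting to StreamInputFormat. A sketch under that assumption (the fallback package here is illustrative, not the patch's exact lookup order):

```java
// Sketch of fail-fast -inputformat resolution: try the name as given,
// then as a simple name in an assumed default package, and raise an
// error instead of silently falling back to a default input format.
class InputFormatResolver {
    private static final String DEFAULT_PKG = "org.apache.hadoop.mapred.";  // assumption

    static Class<?> resolve(String name) {
        try {
            return Class.forName(name);  // fully qualified name, e.g. a.b.MyFormat
        } catch (ClassNotFoundException e) {
            try {
                return Class.forName(DEFAULT_PKG + name);  // simple name shortcut
            } catch (ClassNotFoundException e2) {
                // a typo like "-inputformat INVALID" now fails loudly
                throw new IllegalArgumentException("Unknown -inputformat: " + name);
            }
        }
    }
}
```

The simple-name branch also gives the convenience the patch comment describes: standard formats can be named without their full package path.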
[jira] Updated: (HADOOP-1210) Log counters in job history
[ https://issues.apache.org/jira/browse/HADOOP-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1210: - Affects Version/s: 0.14.0 > Log counters in job history > --- > > Key: HADOOP-1210 > URL: https://issues.apache.org/jira/browse/HADOOP-1210 > Project: Hadoop > Issue Type: Improvement > Components: mapred >Affects Versions: 0.14.0 >Reporter: Albert Chern >Priority: Minor > Attachments: counters_output, h-1210-job-and-task-status.patch, > h-1210-job-and-task-status.patch, h-1210-jobstatus.patch > > > It would be useful if the value of the global counters were logged to the job > history, perhaps even individually for each task after completion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (HADOOP-1781) Need more complete API of JobClient class
[ https://issues.apache.org/jira/browse/HADOOP-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522652 ] lohit edited comment on HADOOP-1781 at 8/24/07 2:30 PM: --- (HADOOP-1210) Similar issue, where for now we are trying to dump details of tasks to stdout via hadoop job -status [-taskdetails] option was (Author: lohit): Similar issue, where for now we are trying to dump details of tasks to stdout via hadoop job -status [-taskdetails] option > Need more complete API of JobClient class > - > > Key: HADOOP-1781 > URL: https://issues.apache.org/jira/browse/HADOOP-1781 > Project: Hadoop > Issue Type: Improvement > Components: mapred >Reporter: Runping Qi > > We need a programmatic way to find out the information about a map/reduce > cluster and the jobs on the cluster. > The current API is not complete. > In particular, the following API functions are needed: > 1. jobs() currently, there is an API function JobsToComplete, which returns > running/waiting jobs only. jobs() should return the complete list. > 2. TaskReport[] getMap/ReduceTaskReports(String jobid) > 3. getStartTime() > 4. getJobStatus(String jobid); > 5. getJobProfile(String jobid); -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1210) Log counters in job history
[ https://issues.apache.org/jira/browse/HADOOP-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1210: - Attachment: h-1210-job-and-task-status.patch Thanks Doug. Milind suggested the same, and here is an updated patch. The option has been renamed from -counters to -taskdetails; if provided, it dumps start/end/wall time along with counters, this time in a format which can be parsed easily. > Log counters in job history > --- > > Key: HADOOP-1210 > URL: https://issues.apache.org/jira/browse/HADOOP-1210 > Project: Hadoop > Issue Type: Improvement > Components: mapred >Reporter: Albert Chern >Priority: Minor > Attachments: counters_output, h-1210-job-and-task-status.patch, > h-1210-job-and-task-status.patch, h-1210-jobstatus.patch > > > It would be useful if the value of the global counters were logged to the job > history, perhaps even individually for each task after completion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1210) Log counters in job history
[ https://issues.apache.org/jira/browse/HADOOP-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1210: - Attachment: h-1210-job-and-task-status.patch Updated patch included. > Log counters in job history > --- > > Key: HADOOP-1210 > URL: https://issues.apache.org/jira/browse/HADOOP-1210 > Project: Hadoop > Issue Type: Improvement > Components: mapred >Reporter: Albert Chern >Priority: Minor > Attachments: counters_output, h-1210-job-and-task-status.patch, > h-1210-jobstatus.patch > > > It would be useful if the value of the global counters were logged to the job > history, perhaps even individually for each task after completion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1210) Log counters in job history
[ https://issues.apache.org/jira/browse/HADOOP-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-1210: - Attachment: counters_output Thanks to Owen and Milind, made a few more changes and added a new option: hadoop job -status [-counters]. If the -counters option is specified, the patch dumps the global counters and the counters of each map/reduce task to STDOUT. I am attaching a sample output; I have tried to organize it so that it can be parsed later by another program/script. > Log counters in job history > --- > > Key: HADOOP-1210 > URL: https://issues.apache.org/jira/browse/HADOOP-1210 > Project: Hadoop > Issue Type: Improvement > Components: mapred >Reporter: Albert Chern >Priority: Minor > Attachments: counters_output, h-1210-job-and-task-status.patch, > h-1210-jobstatus.patch > > > It would be useful if the value of the global counters were logged to the job > history, perhaps even individually for each task after completion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
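A counter dump that "could be parsed later by another program/script" is easiest with one record per line and a fixed field separator. A sketch of such a formatter (the tab-separated field layout is an assumption for illustration, not the format of the attached counters_output):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CounterDump {
    // Emit "taskId<TAB>group<TAB>counter<TAB>value", one counter per line,
    // so a script can split on tabs without worrying about spaces in names.
    static String format(String taskId, String group, Map<String, Long> counters) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            sb.append(taskId).append('\t')
              .append(group).append('\t')
              .append(e.getKey()).append('\t')
              .append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

A per-task record stream like this is trivially consumed by awk or a small script, which is the stated goal of the -counters / -taskdetails output.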