[jira] [Commented] (MAPREDUCE-3193) NextGen Mapreduce framework is not able to read the job input recursively.Input is read only for one folder level deep

2011-10-19 Thread Devaraj K (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130382#comment-13130382 ]

Devaraj K commented on MAPREDUCE-3193:
--

bq. This isn't something we should change lightly, it's probably going to break 
user apps.

If the input path contains a nested directory, that directory is treated as a 
file: a task is started for it and fails with the error below. Failing the 
whole job just because the input path contains a nested directory might not be 
correct.

{code:xml}
Caused by: java.io.FileNotFoundException: File does not exist: /r1/r2
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:736)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:699)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:671)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:315)
    at org.apache.hadoop.hdfs.protocolR23Compatible.ClientNamenodeProtocolServerSideTranslatorR23.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorR23.java:130)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:632)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1517)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1513)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1511)

    at org.apache.hadoop.ipc.Client.call(Client.java:1085)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244)
    at $Proxy8.getBlockLocations(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:130)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:81)
    at $Proxy8.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.protocolR23Compatible.ClientNamenodeProtocolTranslatorR23.getBlockLocations(ClientNamenodeProtocolTranslatorR23.java:150)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:566)
    ... 14 more
{code}

 NextGen Mapreduce framework is not able to read the job input 
 recursively.Input is read only for one folder level deep
 --

 Key: MAPREDUCE-3193
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3193
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramgopal N
Assignee: Devaraj K
 Attachments: MAPREDUCE-3193.patch


 java.io.FileNotFoundException is thrown if the input file is more than one 
 folder level deep, and the job fails.
 Example: the input file is /r1/r2/input.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3193) NextGen Mapreduce framework is not able to read the job input recursively.Input is read only for one folder level deep

2011-10-19 Thread Arun C Murthy (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130384#comment-13130384 ]

Arun C Murthy commented on MAPREDUCE-3193:
--

bq. If the input path contains one nested dir, it is considering as file and 
trying to execute the task and it fails with the below error. Failing the job 
itself when the inputpath contains nested dir might not be correct.

Yes, I realize that.

FileInputFormat has had this behaviour for a long while, and changing it now 
(we definitely shouldn't do this for hadoop-0.23.0) will probably affect a lot 
of apps. Hence, at the very least, recursive reading needs to be off by default.
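
As a rough illustration of "off by default", here is a minimal sketch that uses java.nio.file on a local directory tree instead of the Hadoop FileSystem API. The class name, the property name example.input.dir.recursive, and the use of java.util.Properties in place of a job Configuration are all assumptions made for this example; none of them come from the attached patch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Sketch only: a local-filesystem analogy, not Hadoop's FileInputFormat. */
final class RecursiveInputListing {

    /** Hypothetical property name, chosen for this example only. */
    static final String RECURSIVE_KEY = "example.input.dir.recursive";

    /** Lists the entries under an input path; descends into subdirectories only if enabled. */
    static List<Path> listInputs(Path input, Properties conf) throws IOException {
        // The flag defaults to false, so existing jobs keep the one-level listing.
        boolean recursive = Boolean.parseBoolean(conf.getProperty(RECURSIVE_KEY, "false"));
        List<Path> result = new ArrayList<>();
        if (Files.isDirectory(input)) {
            addEntries(input, recursive, result);
        } else {
            result.add(input);
        }
        return result;
    }

    private static void addEntries(Path dir, boolean recursive, List<Path> result)
            throws IOException {
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                if (recursive && Files.isDirectory(child)) {
                    addEntries(child, true, result); // opt-in: descend into the subdirectory
                } else {
                    result.add(child); // legacy: entry is kept as-is, even if it is a directory
                }
            }
        }
    }
}
```

With an empty Properties object the listing for /r1 would still contain the directory /r1/r2, which matches the long-standing behaviour. Only a job that sets the flag to true would see /r1/r2/input.txt instead.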






[jira] [Commented] (MAPREDUCE-3193) NextGen Mapreduce framework is not able to read the job input recursively.Input is read only for one folder level deep

2011-10-18 Thread Devaraj K (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129630#comment-13129630 ]

Devaraj K commented on MAPREDUCE-3193:
--

Here is the problem. FileInputFormat.listStatus expands the input path by only 
one level and treats everything it finds there as a file, including 
subdirectories. It then creates splits for those directories, and the tasks 
fail.

{code:title=FileInputFormat.java|borderStyle=solid}
for (int i = 0; i < dirs.length; ++i) {
  Path p = dirs[i];
  FileSystem fs = p.getFileSystem(job.getConfiguration());
  FileStatus[] matches = fs.globStatus(p, inputFilter);
  if (matches == null) {
    errors.add(new IOException("Input path does not exist: " + p));
  } else if (matches.length == 0) {
    errors.add(new IOException("Input Pattern " + p + " matches 0 files"));
  } else {
    for (FileStatus globStat : matches) {
      if (globStat.isDirectory()) {
        // Only one level is expanded: a nested directory is added here like a file.
        for (FileStatus stat : fs.listStatus(globStat.getPath(), inputFilter)) {
          result.add(stat);
        }
      } else {
        result.add(globStat);
      }
    }
  }
}
{code}
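
The effect of the one-level expansion can be reproduced without a cluster. The sketch below is a local-filesystem analogy written with java.nio.file, not Hadoop code; the class and method names are made up for this example. It builds the layout from the issue description (r1/r2/input.txt under a temporary directory) and prints what a one-level listing of r1 returns.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch only: mimics the shape of the quoted loop on the local filesystem. */
final class OneLevelListing {

    /** Expands each input directory by exactly one level; plain files are passed through. */
    static List<Path> listOneLevel(List<Path> inputs) throws IOException {
        List<Path> result = new ArrayList<>();
        for (Path p : inputs) {
            if (Files.isDirectory(p)) {
                try (DirectoryStream<Path> children = Files.newDirectoryStream(p)) {
                    for (Path child : children) {
                        result.add(child); // a child directory is added as if it were a file
                    }
                }
            } else {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Layout from the issue description: <tmp>/r1.../r2/input.txt
        Path r1 = Files.createTempDirectory("r1");
        Path r2 = Files.createDirectory(r1.resolve("r2"));
        Files.write(r2.resolve("input.txt"), "hello world".getBytes());

        for (Path p : listOneLevel(Collections.singletonList(r1))) {
            System.out.println(p + " isDirectory=" + Files.isDirectory(p));
        }
    }
}
```

The only entry returned for r1 is the directory r2. A consumer that then tries to open that entry as a file fails, which is the local counterpart of the "File does not exist: /r1/r2" error reported for this issue.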







[jira] [Commented] (MAPREDUCE-3193) NextGen Mapreduce framework is not able to read the job input recursively.Input is read only for one folder level deep

2011-10-18 Thread Arun C Murthy (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130372#comment-13130372 ]

Arun C Murthy commented on MAPREDUCE-3193:
--

This isn't something we should change lightly, it's probably going to break 
user apps.

At least, we need a config to turn this on, and it should be off by default.






[jira] [Commented] (MAPREDUCE-3193) NextGen Mapreduce framework is not able to read the job input recursively.Input is read only for one folder level deep

2011-10-17 Thread Mahadev konar (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129001#comment-13129001 ]

Mahadev konar commented on MAPREDUCE-3193:
--

Ramgopal,
 What input format are you using?






[jira] [Commented] (MAPREDUCE-3193) NextGen Mapreduce framework is not able to read the job input recursively.Input is read only for one folder level deep

2011-10-17 Thread Ramgopal N (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129481#comment-13129481 ]

Ramgopal N commented on MAPREDUCE-3193:
---

I ran the WordCount job from examples.jar. It uses FileInputFormat.

