[jira] [Commented] (MAPREDUCE-3952) In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split.

2012-03-03 Thread Zhenxiao Luo (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221784#comment-13221784
 ] 

Zhenxiao Luo commented on MAPREDUCE-3952:
-

@Bhallamudi

Yes. This is what I observed when running the Hive test cases on both MR1 and 
MR2. This is an incompatible change. Can we fix it so that MR2 has the same 
behavior as MR1?

@Ahmed

Yes. This is exactly what this JIRA ticket is for. Thanks so much.

 In MR2, when Total input paths to process == 1, 
 CombinefileInputFormat.getSplits() returns 0 split.
 ---

 Key: MAPREDUCE-3952
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3952
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Zhenxiao Luo

 Hive gets unexpected results when using MR2 (with MR1, it always gets the 
 expected results).
 In MR2, when Total input paths to process == 1, 
 CombineFileInputFormat.getSplits() returns 0 splits.
 The calling code in Hive, in Hadoop23Shims.java:
 InputSplit[] splits = super.getSplits(job, numSplits);
 This yields splits.length == 0.
 In MR1, everything works. The calling code in Hive, in Hadoop20Shims.java:
 CombineFileSplit[] splits = (CombineFileSplit[]) super.getSplits(job, 
 numSplits);
 This yields splits.length == 1.
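
To make the reported difference concrete, here is a minimal toy model in plain Java. It does not use or reproduce any Hadoop or Hive code: the class and method names are hypothetical, and the two policies are only an assumption about what the execution logs in this thread suggest (the MR1 run logs a split ending in emptyFile:0+0, the MR2 run logs zero splits).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are Hadoop or Hive classes.
class EmptyFileSplitDemo {

    /** MR1-like outcome seen in the logs: every file, even an empty one, yields a split. */
    static List<String> splitsKeepingEmptyFiles(Map<String, Long> fileSizes) {
        List<String> splits = new ArrayList<>();
        for (Map.Entry<String, Long> file : fileSizes.entrySet()) {
            // Same "path:start+length" shape as the split printed in the MR1 log.
            splits.add(file.getKey() + ":0+" + file.getValue());
        }
        return splits;
    }

    /** MR2-like outcome seen in the logs: zero-length files contribute no split. */
    static List<String> splitsDroppingEmptyFiles(Map<String, Long> fileSizes) {
        List<String> splits = new ArrayList<>();
        for (Map.Entry<String, Long> file : fileSizes.entrySet()) {
            if (file.getValue() > 0) {
                splits.add(file.getKey() + ":0+" + file.getValue());
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // One input path whose only file is empty, as in the reported case.
        Map<String, Long> input = new LinkedHashMap<>();
        input.put("emptyFile", 0L);
        System.out.println("MR1-like number of splits: " + splitsKeepingEmptyFiles(input).size());
        System.out.println("MR2-like number of splits: " + splitsDroppingEmptyFiles(input).size());
    }
}
```

A caller that assumes at least one split per input path would see the second policy as a behavior change, which is the incompatibility this ticket describes.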

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3952) In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split.

2012-03-02 Thread Zhenxiao Luo (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221369#comment-13221369
 ] 

Zhenxiao Luo commented on MAPREDUCE-3952:
-

@Bhallamudi

Yes. From the execution log, the input file appears to be an empty file:


2012-02-28 15:56:37,219 INFO  exec.ExecDriver 
(ExecDriver.java:addInputPath(829)) - Changed input file to 
file:/tmp/cloudera/hive_2012-02-28_15-56-37_188_1216173472421796708/-mr-1/1
2012-02-28 15:56:37,226 INFO  util.NativeCodeLoader 
(NativeCodeLoader.java:clinit(50)) - Loaded the native-hadoop library
2012-02-28 15:56:37,610 INFO  jvm.JvmMetrics (JvmMetrics.java:init(76)) - 
Initializing JVM Metrics with processName=JobTracker, sessionId=
2012-02-28 15:56:37,626 INFO  exec.ExecDriver 
(ExecDriver.java:createTmpDirs(234)) - Making Temp Directory: 
file:/tmp/cloudera/hive_2012-02-28_15-56-26_431_554636048819260524/-mr-10003
2012-02-28 15:56:37,657 INFO  jvm.JvmMetrics (JvmMetrics.java:init(71)) - 
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already 
initialized
2012-02-28 15:56:37,684 WARN  mapreduce.JobSubmitter 
(JobSubmitter.java:copyAndConfigureFiles(139)) - Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.
2012-02-28 15:56:37,960 WARN  snappy.LoadSnappy (LoadSnappy.java:clinit(36)) 
- Snappy native library is available
2012-02-28 15:56:37,961 INFO  snappy.LoadSnappy (LoadSnappy.java:clinit(44)) 
- Snappy native library loaded
2012-02-28 15:56:37,969 INFO  io.CombineHiveInputFormat 
(CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating 
pool for 
file:/tmp/cloudera/hive_2012-02-28_15-56-37_188_1216173472421796708/-mr-1/1;
 using filter path 
file:/tmp/cloudera/hive_2012-02-28_15-56-37_188_1216173472421796708/-mr-1/1
2012-02-28 15:56:37,970 WARN  conf.Configuration 
(Configuration.java:handleDeprecation(326)) - mapred.min.split.size is 
deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2012-02-28 15:56:37,970 WARN  conf.Configuration 
(Configuration.java:handleDeprecation(326)) - mapred.min.split.size.per.node is 
deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
2012-02-28 15:56:37,971 WARN  conf.Configuration 
(Configuration.java:handleDeprecation(326)) - mapred.min.split.size.per.rack is 
deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
2012-02-28 15:56:37,971 WARN  conf.Configuration 
(Configuration.java:handleDeprecation(326)) - mapred.max.split.size is 
deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
2012-02-28 15:56:37,977 INFO  input.FileInputFormat 
(FileInputFormat.java:listStatus(245)) - Total input paths to process : 1
2012-02-28 15:56:37,982 INFO  io.CombineHiveInputFormat 
(CombineHiveInputFormat.java:getSplits(388)) - Arrays.asList iss
2012-02-28 15:56:37,982 INFO  io.CombineHiveInputFormat 
(CombineHiveInputFormat.java:getSplits(410)) - iss size: 0
2012-02-28 15:56:37,983 INFO  io.CombineHiveInputFormat 
(CombineHiveInputFormat.java:getSplits(417)) - number of splits 0

And, in MR1, the log looks like:

2012-02-28 14:09:54,554 INFO  exec.ExecDriver 
(ExecDriver.java:addInputPath(829)) - Changed input file to 
file:/tmp/cloudera/hive_2012-02-28_14-09-54_515_1377575814725676804/-mr-1/1
2012-02-28 14:09:54,855 INFO  jvm.JvmMetrics (JvmMetrics.java:init(71)) - 
Initializing JVM Metrics with processName=JobTracker, sessionId=
2012-02-28 14:09:54,871 INFO  exec.ExecDriver 
(ExecDriver.java:createTmpDirs(234)) - Making Temp Directory: 
file:/tmp/cloudera/hive_2012-02-28_14-09-44_700_3241431154033268523/-mr-10003
2012-02-28 14:09:54,881 WARN  mapred.JobClient 
(JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser 
for parsing the arguments. Applications should implement Tool for the same.
2012-02-28 14:09:55,037 INFO  io.CombineHiveInputFormat 
(CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating 
pool for 
file:/tmp/cloudera/hive_2012-02-28_14-09-54_515_1377575814725676804/-mr-1/1;
 using filter path 
file:/tmp/cloudera/hive_2012-02-28_14-09-54_515_1377575814725676804/-mr-1/1
2012-02-28 14:09:55,042 INFO  mapred.FileInputFormat 
(FileInputFormat.java:listStatus(192)) - Total input paths to process : 1
2012-02-28 14:09:55,056 INFO  io.CombineHiveInputFormat 
(CombineHiveInputFormat.java:getSplits(406)) - iss size: 1
2012-02-28 14:09:55,057 INFO  io.CombineHiveInputFormat 
(CombineHiveInputFormat.java:getSplits(409)) - adding inputSplitShim into 
result: 
Paths:/tmp/cloudera/hive_2012-02-28_14-09-54_515_1377575814725676804/-mr-1/1/emptyFile:0+0
 Locations:/default-rack:; InputFormatClass: 
org.apache.hadoop.mapred.TextInputFormat

2012-02-28 14:09:55,057 INFO  io.CombineHiveInputFormat 
(CombineHiveInputFormat.java:getSplits(413)) - number of splits 1

So, in MR1, submitting a 

[jira] [Commented] (MAPREDUCE-3952) In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split.

2012-03-02 Thread Zhenxiao Luo (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221391#comment-13221391
 ] 

Zhenxiao Luo commented on MAPREDUCE-3952:
-

@Ahmed

I think so.

Is it the case that, in MR1, an empty file gets one split, and then the 
record reader emits no records?

And in MR2, an empty file gets zero splits?
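
If that reading is right, the two cases would differ in the number of splits (and so map tasks) but not in the number of records. A minimal sketch of that idea in plain Java follows; it is a toy model, not Hadoop code, and the reader and its names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: these are not Hadoop classes.
class EmptySplitRecordDemo {

    /** Hypothetical line reader: returns the records found in one split's content. */
    static List<String> readRecords(String splitContent) {
        List<String> records = new ArrayList<>();
        if (splitContent.isEmpty()) {
            return records; // a 0+0 split: the reader emits nothing
        }
        for (String line : splitContent.split("\n")) {
            records.add(line);
        }
        return records;
    }

    /** Sums the records over all splits; each split stands for one map task. */
    static int totalRecords(List<String> splitContents) {
        int total = 0;
        for (String content : splitContents) {
            total += readRecords(content).size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> mr1Like = Collections.singletonList(""); // one split over an empty file
        List<String> mr2Like = Collections.emptyList();       // no splits at all
        System.out.println("MR1-like: splits=" + mr1Like.size() + " records=" + totalRecords(mr1Like));
        System.out.println("MR2-like: splits=" + mr2Like.size() + " records=" + totalRecords(mr2Like));
    }
}
```

Both cases read zero records, so any difference in query results would have to come from code that depends on at least one task running, not from the data itself.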


