[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918490#action_12918490 ]

Sreekanth Ramakrishnan commented on HIVE-1633:
--

I was taking a look at reproducing the issue. The core reason for the exception is the following:

* The input format is passed a set of input paths.
* This set of paths contains two kinds of files: table data files, and scratch/tmp files that Hive creates in HDFS.
* CombineHiveInputFormat tries to compute splits on these temp/scratch files, which causes getPartitionDescFromPathRecursively to fail, and in turn the query fails.

I hope this helps. I am still looking at the code and trying to figure out where these paths are actually added to the input paths, so that I can backtrack from there. Any help on this would be great.

> CombineHiveInputFormat fails with "cannot find dir for emptyFile"
> -
>
> Key: HIVE-1633
> URL: https://issues.apache.org/jira/browse/HIVE-1633
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Clients
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913230#action_12913230 ]

He Yongqiang commented on HIVE-1633:

Amareshwari, by adding a testcase in TestHiveFileFormatUtils, you will be able to find out the underlying problem, and then can you post a patch for it?
[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912795#action_12912795 ]

He Yongqiang commented on HIVE-1633:

For a given path, CombineHiveInputFormat does a recursive lookup in partToPartitionInfo. If no match is found for the path itself, it looks up the parent dir ("hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1") in partToPartitionInfo. In your case, it seems the parent dir exists in partToPartitionInfo.
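To make the lookup described in this comment concrete, here is a minimal sketch of a "try the path, then each parent" search over a map keyed by directory. It is not Hive's actual getPartitionDescFromPathRecursively: the class name, the helper methods, the use of plain strings instead of Hadoop Path objects, and the hosts nn1/nn2 and the scratch path are all assumptions made for illustration. It also shows why a mismatch in the scheme/authority prefix would make every parent lookup miss.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the Hive implementation.
class ParentLookupSketch {

    // Try the path itself, then each parent directory, until a key matches.
    static String findRecursively(Map<String, String> partToPartitionInfo, String path) {
        String current = stripTrailingSlash(path);
        while (current != null) {
            String desc = partToPartitionInfo.get(current);
            if (desc != null) {
                return desc;
            }
            current = parentOf(current);
        }
        return null; // no ancestor of the path is a key in the map
    }

    // "dir/" and "dir" should be treated as the same directory.
    static String stripTrailingSlash(String path) {
        if (path.length() > 1 && path.endsWith("/")) {
            return path.substring(0, path.length() - 1);
        }
        return path;
    }

    // Returns the parent directory, or null once only scheme://authority is left.
    static String parentOf(String path) {
        int schemeEnd = path.indexOf("://");
        int start = schemeEnd >= 0 ? schemeEnd + 3 : 0;
        int lastSlash = path.lastIndexOf('/');
        if (lastSlash < start) {
            return null;
        }
        return path.substring(0, lastSlash);
    }

    public static void main(String[] args) {
        Map<String, String> partToPartitionInfo = new HashMap<>();
        // Made-up host and scratch directory, standing in for the elided ones.
        partToPartitionInfo.put("hdfs://nn1/tmp/scratch/-mr-10002/1", "desc-for-1");

        // The file is not a key, but its parent directory is, so the lookup succeeds.
        System.out.println(findRecursively(partToPartitionInfo,
                "hdfs://nn1/tmp/scratch/-mr-10002/1/emptyFile"));

        // Same directory under a different authority: no ancestor matches.
        System.out.println(findRecursively(partToPartitionInfo,
                "hdfs://nn2/tmp/scratch/-mr-10002/1/emptyFile"));
    }
}
```

Under these assumptions, a path whose parent is in the map can only fail the lookup if the two strings differ somewhere other than the final component, for example in the host part or in how the key is normalized.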
[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910435#action_12910435 ]

Amareshwari Sriramadasu commented on HIVE-1633:
---

Sorry if I misunderstood your comment. I looked for hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/ in the partToPartitionInfo shown in the exception. Only hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/ appears. hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile does not appear in partToPartitionInfo.
[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910431#action_12910431 ]

He Yongqiang commented on HIVE-1633:

So the 'xxx' part is not the same in "hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/" and "hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile"?
[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910430#action_12910430 ]

Amareshwari Sriramadasu commented on HIVE-1633:
---

It appears only once, as "hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/". There is no "hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile".
[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910255#action_12910255 ]

He Yongqiang commented on HIVE-1633:

Can you search for hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1 (replacing xxx with the actual file/host names)? It should appear once in partToPartitionInfo and once more in "hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile".
[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910002#action_12910002 ]

Amareshwari Sriramadasu commented on HIVE-1633:
---

bq. I replaced the actual file names of xxx.

I meant "I replaced the actual file/host names with xxx".
[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910001#action_12910001 ]

Amareshwari Sriramadasu commented on HIVE-1633:
---

I replaced the actual file names of xxx, because the actual file/host names are internal to our organization.

But the problem is that CombineHiveInputFormat is looking for a PartitionDesc for "hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile". This dir is not part of the table input data. I think this dir is getting added by FileSinkOperator.
[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909711#action_12909711 ]

He Yongqiang commented on HIVE-1633:

@Amareshwari

In your example:

hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile in partToPartitionInfo:
[xxx..., xxx..., xxx..., ...
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1,
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/2]

If I put these into TestHiveFileFormatUtils, it returns the correct value. Maybe there is some mismatch in the 'xxx' part?
[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909666#action_12909666 ]

Amareshwari Sriramadasu commented on HIVE-1633:
---

Sorry for the delay.

The table has three partitions and 100 columns. It is stored as RCFile with compressed data. The query we ran was "select count(\*) from " with CombineHiveInputFormat as the input format. We were trying to test MAPREDUCE-1597 by setting hive.hadoop.supports.splittable.combineinputformat to true. Queries ran fine with Text files.
[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908716#action_12908716 ]

He Yongqiang commented on HIVE-1633:

Amareshwari, can you give more details about your example? From your example, I cannot reproduce the problem.
[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"
[ https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908638#action_12908638 ]

Amareshwari Sriramadasu commented on HIVE-1633:
---

Here is the full exception trace:

{noformat}
java.io.IOException: cannot find dir = hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile in partToPartitionInfo: [xxx..., xxx..., xxx..., ...
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1,
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/2]
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:100)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)
    at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:792)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1021)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:792)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:766)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
    at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:900)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:770)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:647)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:199)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:353)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{noformat}