[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-10-06 Thread Sreekanth Ramakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918490#action_12918490
 ] 

Sreekanth Ramakrishnan commented on HIVE-1633:
--

I was taking a look at reproducing the issue. The core reason for the 
exception is the following:

* The input format is passed a set of input paths.
* This set of paths contains two kinds of files: table data files and 
scratch/tmp files which are created by Hive in HDFS.
* CombineHiveInputFormat tries to compute splits for these temp/scratch files, 
which causes getPartitionDescFromPathRecursively to fail, and so the query 
fails.

I hope this helps. I am still looking at the code and trying to figure out 
where paths are actually added to the input paths, so that I can backtrack 
from there. Any help on this would be great.
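
As an aid for that kind of backtracking, here is a minimal Java sketch (not 
Hive code) that separates a job's input paths into those under a scratch 
directory and everything else. The class name, the method name, and the 
scratchDir parameter are hypothetical; how Hive actually records its scratch 
directory is not shown in this thread.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ScratchPathSketch {

    /**
     * Partitions inputPaths: key true holds paths equal to or under scratchDir,
     * key false holds everything else (assumed here to be table data).
     */
    static Map<Boolean, List<String>> splitByScratchDir(List<String> inputPaths,
                                                        String scratchDir) {
        // Drop one trailing slash so "/tmp/scratch/" and "/tmp/scratch" behave the same.
        String base = scratchDir.endsWith("/")
                ? scratchDir.substring(0, scratchDir.length() - 1)
                : scratchDir;
        // Match on a path boundary so "/tmp/hive_x" does not capture "/tmp/hive_xyz".
        return inputPaths.stream().collect(Collectors.partitioningBy(
                p -> p.equals(base) || p.startsWith(base + "/")));
    }
}
```

Printing the "true" bucket just before split computation would show whether 
scratch files such as emptyFile are really among the input paths.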



> CombineHiveInputFormat fails with "cannot find dir for emptyFile"
> -
>
> Key: HIVE-1633
> URL: https://issues.apache.org/jira/browse/HIVE-1633
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Clients
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-21 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913230#action_12913230
 ] 

He Yongqiang commented on HIVE-1633:


Amareshwari, if you add a test case to TestHiveFileFormatUtils, you should be 
able to find the underlying problem. Can you then post a patch for it?




[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-20 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912795#action_12912795
 ] 

He Yongqiang commented on HIVE-1633:


For a given path, CombineHiveInputFormat does a recursive lookup in 
partToPartitionInfo. If no match is found, it looks up the parent dir 
("hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1") 
in partToPartitionInfo. In your case, it seems the parent dir exists in 
partToPartitionInfo.
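
The lookup described above can be sketched roughly as follows. This is a 
simplified illustration in Java, not the actual HiveFileFormatUtils code: it 
uses plain strings instead of Hadoop Path objects, and the class and method 
names are made up for the example.

```java
import java.util.Map;

public class RecursiveLookupSketch {

    /**
     * Tries path itself, then each parent directory in turn, and returns the
     * first value found in partToInfo, or null if no ancestor is a key.
     */
    static <V> V lookupRecursively(Map<String, V> partToInfo, String path) {
        String current = stripTrailingSlashes(path);
        while (current != null && !current.isEmpty()) {
            V hit = partToInfo.get(current);
            if (hit != null) {
                return hit;
            }
            current = parentOf(current); // move one directory level up
        }
        return null;
    }

    static String stripTrailingSlashes(String p) {
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }

    /** Returns everything before the last '/', or null when nothing is left. */
    static String parentOf(String p) {
        int idx = p.lastIndexOf('/');
        return idx <= 0 ? null : p.substring(0, idx);
    }
}
```

With a key ending in "/-mr-10002/1" in the map, a lookup for the same prefix 
plus "/1/emptyFile" succeeds on the second iteration. A failure therefore 
suggests that the stored key and the queried path differ somewhere else in 
the string.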




[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-16 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910435#action_12910435
 ] 

Amareshwari Sriramadasu commented on HIVE-1633:
---

Sorry if I misunderstood your comment. I looked for 
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/ in 
the partToPartitionInfo shown in the exception. Only 
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/ 
appears; 
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile
does not appear in partToPartitionInfo.




[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-16 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910431#action_12910431
 ] 

He Yongqiang commented on HIVE-1633:


So the 'xxx' part is not the same in 
"hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/" 
and 
"hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile"?




[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-16 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910430#action_12910430
 ] 

Amareshwari Sriramadasu commented on HIVE-1633:
---

It appears only once, as 
"hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/". 
There is no 
"hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile".




[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-16 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910255#action_12910255
 ] 

He Yongqiang commented on HIVE-1633:


Can you search for 
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1 
(replacing xxx with the actual file/host names)?

It should appear once in partToPartitionInfo and once more as part of 
"hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile".





[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-15 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910002#action_12910002
 ] 

Amareshwari Sriramadasu commented on HIVE-1633:
---

bq. I replaced the actual file names of xxx.
I meant "I replaced the actual file/host names with xxx".




[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-15 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910001#action_12910001
 ] 

Amareshwari Sriramadasu commented on HIVE-1633:
---

I replaced the actual file names of xxx, because actual file/host names are 
internal to our organization. But the problem is that CombineHiveInputFormat is 
looking for a PartitionDesc for 
"hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile". This 
dir is not part of the table input data. I think this dir is getting added by 
FileSinkOperator.




[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-15 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909711#action_12909711
 ] 

He Yongqiang commented on HIVE-1633:


@Amareshwari

in your example:
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile
in partToPartitionInfo:
[xxx..., xxx..., xxx..., ...
 hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1,
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/2]

If I put these into TestHiveFileFormatUtils, it returns the correct value. 
Maybe there is some mismatch in the 'xxx' part?
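
One way to check for such a mismatch is sketched below in Java. It is only an 
illustration under assumptions: paths are parsed with java.net.URI rather than 
Hadoop's Path, the class and method names are hypothetical, and the host names 
used with it would be whatever 'xxx' stands for in the real job.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

public class AuthorityMismatchSketch {

    /**
     * Returns the keys whose path component equals that of dir but whose
     * scheme or authority (host:port) differs, i.e. likely 'xxx' mismatches.
     */
    static List<String> samePathDifferentAuthority(String dir, Collection<String> keys) {
        URI target = URI.create(dir);
        List<String> suspects = new ArrayList<>();
        for (String key : keys) {
            URI candidate = URI.create(key);
            boolean samePath =
                    trimSlash(candidate.getPath()).equals(trimSlash(target.getPath()));
            boolean sameAuthority =
                    Objects.equals(candidate.getScheme(), target.getScheme())
                    && Objects.equals(candidate.getAuthority(), target.getAuthority());
            if (samePath && !sameAuthority) {
                suspects.add(key);
            }
        }
        return suspects;
    }

    /** Removes one trailing slash; a null path is treated as empty. */
    static String trimSlash(String path) {
        if (path == null) {
            return "";
        }
        return path.length() > 1 && path.endsWith("/")
                ? path.substring(0, path.length() - 1)
                : path;
    }
}
```

Calling it with the parent dir of the failing path and the key set of 
partToPartitionInfo lists any key that would have matched if only the host or 
port were written the same way.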




[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-15 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909666#action_12909666
 ] 

Amareshwari Sriramadasu commented on HIVE-1633:
---

Sorry for the delay. 
The table has three partitions and 100 columns. It is stored as RCFile with 
compressed data.
The query we ran was "select count(*) from " with 
CombineHiveInputFormat as the input format. We were trying to test 
MAPREDUCE-1597 by setting hive.hadoop.supports.splittable.combineinputformat to 
true. Queries ran fine with Text files.




[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908716#action_12908716
 ] 

He Yongqiang commented on HIVE-1633:


Amareshwari, can you give more details about your example? From your example, 
I cannot reproduce the problem.




[jira] Commented: (HIVE-1633) CombineHiveInputFormat fails with "cannot find dir for emptyFile"

2010-09-12 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908638#action_12908638
 ] 

Amareshwari Sriramadasu commented on HIVE-1633:
---

Here is the full exception trace:
{noformat}
java.io.IOException: cannot find dir =
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1/emptyFile
in partToPartitionInfo:
[xxx..., xxx..., xxx..., ...
 hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/1,
hdfs://xxx/.../hive_2010-09-07_12-15-00_299_4877141498303008976/-mr-10002/2]
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:100)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:792)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1021)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:792)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:766)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:900)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:770)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:647)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:199)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:353)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{noformat}

