[ 
https://issues.apache.org/jira/browse/HIVE-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634774#comment-14634774
 ] 

Harsh J commented on HIVE-11325:
--------------------------------

I missed the srcDir declaration, which would explain the loop (we're walking 
down the directory tree). I'm checking why it doesn't stop at the family 
directory.

> Infinite loop in HiveHFileOutputFormat
> --------------------------------------
>
>                 Key: HIVE-11325
>                 URL: https://issues.apache.org/jira/browse/HIVE-11325
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler
>    Affects Versions: 1.0.0
>            Reporter: Harsh J
>
> No idea why {{hbase_handler_bulk.q}} does not catch this if it's being run 
> regularly in Hive builds, but here's the gist of the issue:
> The condition at 
> https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java#L152-L164
>  indicates that we will loop indefinitely until we find a file whose last 
> path component (the name) is equal to the column family name.
> In execution, however, the iteration enters an actual infinite loop because 
> the file we end up considering as the srcDir is actually the region file, 
> whose name will never match the family name.
> This is an example of the IPC call that the listing loop of a task at 100% 
> progress gets stuck repeating:
> {code}
> 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> 1: Call -> cdh54.vm/172.16.29.132:8020: getListing {src: 
> "/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_000000_0/family/97112ac1c09548ae87bd85af072d2e8c"
>  startAfter: "" needLocation: false}
> 2015-07-21 10:32:20,662 DEBUG [IPC Parameter Sending Thread #1] 
> org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to 
> cdh54.vm/172.16.29.132:8020 from hive sending #510346
> 2015-07-21 10:32:20,662 DEBUG [IPC Client (1551465414) connection to 
> cdh54.vm/172.16.29.132:8020 from hive] org.apache.hadoop.ipc.Client: IPC 
> Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive got 
> value #510346
> 2015-07-21 10:32:20,662 DEBUG [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> Call: getListing took 0ms
> 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> 1: Response <- cdh54.vm/172.16.29.132:8020: getListing {dirList { 
> partialListing { fileType: IS_FILE path: "" length: 863 permission { perm: 
> 4600 } owner: "hive" group: "hive" modification_time: 1437454718130 
> access_time: 1437454717973 block_replication: 1 blocksize: 134217728 fileId: 
> 33960 childrenNum: 0 storagePolicy: 0 } remainingEntries: 0 }}
> {code}
> The path we are getting out of the listing results is 
> {{/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_000000_0/family/97112ac1c09548ae87bd85af072d2e8c}},
>  but instead of checking the path's parent {{family}} we're looping 
> infinitely over its hashed filename {{97112ac1c09548ae87bd85af072d2e8c}} 
> because it does not match {{family}}.
> The task therefore stays in the infinite loop until the MR framework kills 
> it for exceeding the idle task timeout (and since the subsequent task 
> attempts then fail outright, the job fails).
> While doing a {{getPath().getParent()}} will resolve that, is that infinite 
> loop even necessary? Especially given the fact that we throw exceptions if 
> there are no entries or there is more than one entry.
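
For illustration, here is a minimal sketch of a descent that cannot spin 
forever. It is not the Hive code: it uses java.nio.file in place of the Hadoop 
FileSystem API so that it runs standalone, and the class and method names 
(FamilyDirSearch, findFamilyDir) are hypothetical. It keeps the two existing 
checks (no entries, more than one entry) and adds one assumption of its own: 
if the single entry is a regular file rather than a directory, it fails 
instead of listing that file again.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the family-directory search described above.
final class FamilyDirSearch {

  // Walks down from root through single-entry directories until it reaches
  // a directory whose name equals family, and returns that directory.
  static Path findFamilyDir(Path root, String family) throws IOException {
    Path srcDir = root;
    while (true) {
      List<Path> entries = new ArrayList<>();
      try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir)) {
        for (Path entry : stream) {
          entries.add(entry);
        }
      }
      if (entries.isEmpty()) {
        throw new IOException("No family directories found in " + srcDir);
      }
      if (entries.size() != 1) {
        throw new IOException("Multiple family directories found in " + srcDir);
      }
      Path child = entries.get(0);
      // Assumed extra guard: a regular file (such as a region file) ends the
      // search with an error, so the loop is bounded by the directory depth.
      if (!Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
        throw new IOException("Reached file " + child
            + " without finding family directory " + family);
      }
      srcDir = child;
      if (srcDir.getFileName().toString().equals(family)) {
        return srcDir;
      }
    }
  }
}
```

With a layout such as attempt_dir/family/region_file, calling 
findFamilyDir(attempt_dir's parent, "family") returns the family directory, 
while a family name that never matches produces an IOException rather than a 
hung task.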



