[ 
https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099716#comment-14099716
 ] 

Zhichun Wu commented on HIVE-4997:
----------------------------------

@ [~dintskirveli] :

Your approach attaches each InputInfo to its InputSplit in 
HCatDelegatingInputFormat#getSplits, and then generates the InputJobInfo in 
HCatDelegatingInputFormat#createRecordReader from the attached InputInfo. This 
means every map task has to query the Hive metastore service to generate its 
InputJobInfo, so I think it may put a heavy load on the metastore service when 
the number of map tasks is large. Also, on a secure Hadoop cluster, each map 
task has to acquire a delegation token in order to access the metastore 
service. The current patch hasn't taken this into consideration.

Instead, I think we can generate an InputJobInfo each time a table is added, 
then serialize the resulting Array<InputJobInfo> and attach it to the job conf. 
getSplits and createRecordReader can then fetch each InputJobInfo from the job 
conf, which avoids querying the metastore service in the map phase. I've 
changed the usage for adding multiple input tables as below:
{code}
 HCatMultipleInputs.init(job);
 HCatMultipleInputs.addInput(test_table1, "default", null, SequenceMapper.class);
 HCatMultipleInputs.addInput(test_table2, null, "part='1'", TextMapper1.class);
 HCatMultipleInputs.addInput(test_table2, null, "part='2'", TextMapper2.class);
 HCatMultipleInputs.build();
{code}
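
To make the "serialize once, read back in the tasks" idea concrete, here is a minimal, self-contained sketch. It is not code from the patch: `TableInput` is a hypothetical stand-in for InputJobInfo, the key `sketch.multi.input.infos` is invented for illustration, and a plain `Map<String, String>` stands in for the Hadoop job conf so the example runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiInputConfSketch {

    // Hypothetical stand-in for InputJobInfo: only the fields needed here.
    public static final class TableInput implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String database;
        public final String table;
        public final String filter; // may be null (no partition filter)

        public TableInput(String database, String table, String filter) {
            this.database = database;
            this.table = table;
            this.filter = filter;
        }
    }

    // Hypothetical configuration key, not one used by HCatalog.
    static final String CONF_KEY = "sketch.multi.input.infos";

    // Client side: serialize the whole list once and store it as a string.
    public static void store(Map<String, String> conf, List<TableInput> inputs)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(inputs));
        }
        conf.put(CONF_KEY, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    // Task side: rebuild the list from the conf alone, with no metastore call.
    @SuppressWarnings("unchecked")
    public static List<TableInput> load(Map<String, String> conf)
            throws IOException, ClassNotFoundException {
        String encoded = conf.get(CONF_KEY);
        if (encoded == null) {
            throw new IllegalStateException("no inputs stored under " + CONF_KEY);
        }
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<TableInput>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        List<TableInput> inputs = new ArrayList<>();
        inputs.add(new TableInput("default", "test_table1", null));
        inputs.add(new TableInput("default", "test_table2", "part='1'"));
        store(conf, inputs);

        // What getSplits or createRecordReader would do in this sketch.
        for (TableInput input : load(conf)) {
            System.out.println(input.database + "." + input.table + " filter=" + input.filter);
        }
    }
}
```

In a real job the string would go into the job's Configuration rather than a map, and the stored object would need to carry whatever the record reader requires (schema, partition locations, and so on); the sketch only shows that the tasks can recover everything from the conf.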

I've uploaded HIVE-4997.4.patch, which is based on HIVE-4997.3.patch. It works 
on our secure Hadoop 2.2.0 cluster. I'm uploading it only to demonstrate the 
idea; I haven't put much thought into the code quality or the design of this 
new feature.

 

> HCatalog doesn't allow multiple input tables
> --------------------------------------------
>
>                 Key: HIVE-4997
>                 URL: https://issues.apache.org/jira/browse/HIVE-4997
>             Project: Hive
>          Issue Type: Improvement
>          Components: HCatalog
>    Affects Versions: 0.13.0
>            Reporter: Daniel Intskirveli
>             Fix For: 0.14.0
>
>         Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch
>
>
> HCatInputFormat does not allow reading from multiple hive tables in the same 
> MapReduce job. 



