[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099716#comment-14099716 ]
Zhichun Wu commented on HIVE-4997:
----------------------------------

[~dintskirveli]: Your approach attaches each InputInfo to its InputSplit in HCatDelegatingInputFormat#getSplits, and then generates the InputJobInfo in HCatDelegatingInputFormat#createRecordReader from the attached InputInfo. Every map task therefore has to query the Hive metastore service when it generates its InputJobInfo, so I think this may put noticeable load on the metastore service when the number of map tasks is huge. Also, on a secure Hadoop cluster, each map task has to acquire a delegation token in order to access the metastore service. The current patch hasn't taken this into consideration.

Instead, I think we can generate an InputJobInfo each time a table is added, then serialize the resulting Array<InputJobInfo> and attach it to the job conf. getSplits and createRecordReader can then fetch each InputJobInfo from the job conf, which avoids querying the metastore service in the map phase.

I've changed the usage for adding multiple input tables as follows:

{code}
HCatMultipleInputs.init(job);
HCatMultipleInputs.addInput(test_table1, "default", null, SequenceMapper.class);
HCatMultipleInputs.addInput(test_table2, null, "part='1'", TextMapper1.class);
HCatMultipleInputs.addInput(test_table2, null, "part='2'", TextMapper2.class);
HCatMultipleInputs.build();
{code}

I've uploaded HIVE-4997.4.patch, which is based on HIVE-4997.3.patch. It works on our secure Hadoop 2.2.0 cluster. I'm uploading it only to demonstrate the idea; I haven't put much thought into the code quality or the design of this new feature.
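To make the idea concrete, here is a minimal, self-contained sketch of the "serialize once on the client, read back from the job conf" pattern. It is not code from the patch: TableInput is a hypothetical stand-in for InputJobInfo, CONF_KEY is an invented key name, and a plain Map stands in for the Hadoop job Configuration so that the example runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class InputInfoConfSketch {

    // Hypothetical stand-in for InputJobInfo: just enough fields to identify an input.
    static final class TableInput implements Serializable {
        private static final long serialVersionUID = 1L;
        final String database;
        final String table;
        final String filter; // partition filter, may be null

        TableInput(String database, String table, String filter) {
            this.database = database;
            this.table = table;
            this.filter = filter;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TableInput)) {
                return false;
            }
            TableInput other = (TableInput) o;
            return Objects.equals(database, other.database)
                    && Objects.equals(table, other.table)
                    && Objects.equals(filter, other.filter);
        }

        @Override
        public int hashCode() {
            return Objects.hash(database, table, filter);
        }
    }

    // Invented key name, used only for this illustration.
    static final String CONF_KEY = "sketch.multiple.inputs";

    // Client side: encode the whole list as one string so it can live in a string-valued conf.
    static String serialize(List<TableInput> inputs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(inputs));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Task side: decode the list from the conf value; no metastore call is involved.
    @SuppressWarnings("unchecked")
    static List<TableInput> deserialize(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<TableInput>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // stands in for the job conf

        // What the client would do as tables are added and the job is built.
        List<TableInput> inputs = new ArrayList<>();
        inputs.add(new TableInput("default", "test_table1", null));
        inputs.add(new TableInput("default", "test_table2", "part='1'"));
        conf.put(CONF_KEY, serialize(inputs));

        // What getSplits / createRecordReader would do instead of querying the metastore.
        List<TableInput> restored = deserialize(conf.get(CONF_KEY));
        System.out.println(restored.size() + " inputs restored, first table: "
                + restored.get(0).table);
    }
}
```

The point of the design is that all metastore lookups happen once, on the client, where credentials are already available. The tasks only decode a string that ships with the job, so their number does not affect the metastore and they need no delegation token for it. One trade-off to keep in mind is that the serialized list increases the size of the job conf.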

> HCatalog doesn't allow multiple input tables
> --------------------------------------------
>
>                 Key: HIVE-4997
>                 URL: https://issues.apache.org/jira/browse/HIVE-4997
>             Project: Hive
>          Issue Type: Improvement
>          Components: HCatalog
>   Affects Versions: 0.13.0
>            Reporter: Daniel Intskirveli
>             Fix For: 0.14.0
>
>         Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch
>
>
> HCatInputFormat does not allow reading from multiple hive tables in the same
> MapReduce job.

--
This message was sent by Atlassian JIRA
(v6.2#6252)