[jira] [Commented] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099716#comment-14099716 ] Zhichun Wu commented on HIVE-4997: -- @[~dintskirveli]: Your approach attaches each InputInfo to its InputSplit in HCatDelegatingInputFormat#getSplits and then generates the InputJobInfo in HCatDelegatingInputFormat#createRecordReader from the attached InputInfo. This means every map task has to query the Hive metastore service to build its InputJobInfo, which I think may put a heavy load on the metastore when the number of map tasks is large. Also, on a secure Hadoop cluster, each map task has to acquire a delegation token in order to access the metastore service, and the current patch doesn't take this into account. Instead, I think we can generate each InputJobInfo at the time the table is added, serialize the resulting array and attach it to the job conf, and then fetch each InputJobInfo from the job conf in getSplits and createRecordReader. This avoids querying the metastore service in the map phase. I've changed the usage for adding multiple input tables as follows:
{code}
HCatMultipleInputs.init(job);
HCatMultipleInputs.addInput(test_table1, "default", null, SequenceMapper.class);
HCatMultipleInputs.addInput(test_table2, null, "part='1'", TextMapper1.class);
HCatMultipleInputs.addInput(test_table2, null, "part='2'", TextMapper2.class);
HCatMultipleInputs.build();
{code}
I've uploaded HIVE-4997.4.patch, which is based on HIVE-4997.3.patch. It works on our secure Hadoop 2.2.0 cluster. It just works; I'm uploading it to demonstrate the idea, and I haven't put much thought into the code quality or the design of this new feature.
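A minimal Java sketch of the approach proposed in this comment: build one descriptor per input table on the client, serialize the whole list into the job configuration once, and decode it on the task side, so a split only has to carry an index into that list. This is not the HIVE-4997.4.patch code. `InputSpec` is a hypothetical stand-in for InputJobInfo, `java.util.Properties` stands in for Hadoop's `Configuration`, and the key `hcat.multi.inputs` is made up for illustration.

```java
import java.io.*;
import java.util.*;

/**
 * Sketch (hypothetical names): serialize per-table input descriptors into the
 * job configuration at submission time and read them back in the tasks, so the
 * metastore is not contacted from every map task.
 * Properties stands in for Hadoop's Configuration; InputSpec stands in for InputJobInfo.
 */
public class MultiInputConfSketch {

    /** Hypothetical per-table descriptor; the real InputJobInfo carries much more. */
    static final class InputSpec implements Serializable {
        private static final long serialVersionUID = 1L;
        final String db;          // null is treated as "default"
        final String table;
        final String filter;      // partition filter such as "part='1'", may be null
        final String mapperClass; // mapper that should process this table's splits

        InputSpec(String db, String table, String filter, String mapperClass) {
            this.db = (db == null) ? "default" : db;
            this.table = table;
            this.filter = filter;
            this.mapperClass = mapperClass;
        }
    }

    static final String KEY = "hcat.multi.inputs"; // hypothetical config key

    /** Client side: serialize all specs once and store them as a Base64 string. */
    static void store(Properties conf, List<InputSpec> specs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(specs));
        }
        conf.setProperty(KEY, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    /** Task side: decode the list from the configuration; no metastore round trip. */
    @SuppressWarnings("unchecked")
    static List<InputSpec> load(Properties conf) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(conf.getProperty(KEY));
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<InputSpec>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        store(conf, List.of(
            new InputSpec("default", "test_table1", null, "SequenceMapper"),
            new InputSpec(null, "test_table2", "part='1'", "TextMapper1"),
            new InputSpec(null, "test_table2", "part='2'", "TextMapper2")));

        // A split only needs to carry an index into this list; the record reader
        // (or a delegating mapper) looks the spec up by that index.
        List<InputSpec> specs = load(conf);
        int splitInputIndex = 2; // hypothetical index recorded in the split
        InputSpec spec = specs.get(splitInputIndex);
        // prints: default.test_table2 [part='2'] -> TextMapper2
        System.out.println(spec.db + "." + spec.table + " [" + spec.filter + "] -> " + spec.mapperClass);
    }
}
```

With this layout the metastore is consulted once per added table at submission time, and map tasks only decode a string. One trade-off worth checking is configuration size, since the serialized list grows with the number of inputs and with any partition metadata each entry carries.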
> HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.0 >Reporter: Daniel Intskirveli > Fix For: 0.14.0 > > Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099701#comment-14099701 ] Hive QA commented on HIVE-4997: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662294/HIVE-4997.4.patch {color:green}SUCCESS:{color} +1 5818 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/362/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/362/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-362/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12662294 > HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.0 >Reporter: Daniel Intskirveli > Fix For: 0.14.0 > > Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094026#comment-14094026 ] Zhichun Wu commented on HIVE-4997: -- HIVE-4997.3.patch uses
{code}
JobContext ctx = new JobContext(conf, jobContext.getJobID());
{code}
at line 57 in HCatDelegatingInputFormat, which is not compatible with Hadoop 2, where org.apache.hadoop.mapreduce.JobContext is an interface rather than a concrete class. Changing it as below should work:
{code}
JobContext ctx = ShimLoader.getHadoopShims().getHCatShim().createJobContext(conf, jobContext.getJobID());
{code}
> HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.0 >Reporter: Daniel Intskirveli > Fix For: 0.14.0 > > Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message was sent by Atlassian JIRA (v6.2#6252)
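To illustrate why going through the shim avoids the Hadoop 2 incompatibility, here is a small model of the shim-selection pattern: callers depend on one factory interface, and a loader picks a Hadoop-specific implementation at runtime. All names here are hypothetical and the factories return strings instead of real context objects; Hive's actual ShimLoader differs in detail.

```java
/**
 * Hypothetical model of the shim pattern: code compiled once calls a factory
 * interface, and a loader picks the Hadoop-specific implementation at runtime.
 * Strings stand in for real JobContext objects.
 */
public class ShimSketch {

    /** Mirrors the shape of createJobContext(conf, jobId); conf is unused in this model. */
    interface ContextFactory {
        String createJobContext(String conf, String jobId);
    }

    /** Hadoop 1: JobContext is a concrete class, so `new JobContext(...)` works. */
    static final class Hadoop1Factory implements ContextFactory {
        public String createJobContext(String conf, String jobId) {
            return "JobContext(" + jobId + ")";
        }
    }

    /** Hadoop 2: JobContext is an interface; JobContextImpl is the concrete class. */
    static final class Hadoop2Factory implements ContextFactory {
        public String createJobContext(String conf, String jobId) {
            return "JobContextImpl(" + jobId + ")";
        }
    }

    /**
     * Picks a factory from the Hadoop version string. Simplified: only the
     * major version is checked, and the pre-1.0 development lines are ignored.
     */
    static ContextFactory forHadoopVersion(String version) {
        int major = Integer.parseInt(version.split("\\.")[0]);
        return major >= 2 ? new Hadoop2Factory() : new Hadoop1Factory();
    }

    public static void main(String[] args) {
        // The caller never names a concrete context class itself.
        System.out.println(forHadoopVersion("1.2.1").createJobContext("conf", "job_1")); // JobContext(job_1)
        System.out.println(forHadoopVersion("2.2.0").createJobContext("conf", "job_1")); // JobContextImpl(job_1)
    }
}
```

The point of the pattern is that HCatDelegatingInputFormat would compile against the factory only, so the same jar can run on either Hadoop line.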
[jira] [Commented] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994198#comment-13994198 ] Hive QA commented on HIVE-4997: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12610346/HIVE-4997.3.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/162/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/162/console Messages: {noformat} This message was trimmed, see log for full details [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/contrib/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/contrib/target/tmp/conf [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/contrib/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-contrib --- [INFO] Compiling 2 source files to /data/hive-ptest/working/apache-svn-trunk-source/contrib/target/test-classes [WARNING] Note: /data/hive-ptest/working/apache-svn-trunk-source/contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestRegexSerDe.java uses or overrides a deprecated API. [WARNING] Note: Recompile with -Xlint:deprecation for details. [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-contrib --- [INFO] Tests are skipped. 
[jira] [Commented] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806136#comment-13806136 ] Hive QA commented on HIVE-4997: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12610346/HIVE-4997.3.patch {color:green}SUCCESS:{color} +1 4456 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1248/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1248/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. > HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.0 >Reporter: Daniel Intskirveli > Fix For: 0.13.0 > > Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805753#comment-13805753 ] Sushanth Sowmyan commented on HIVE-4997: [~dintskirveli], as I mentioned in my previous comment, could you please attach a comment or design doc outlining the need, goal, implementation and potential issues (if any)? Since this is an interface change for HCat as a whole, we'd like to discuss whether this is the right thing to do. > HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.0 >Reporter: Daniel Intskirveli > Fix For: 0.13.0 > > Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784275#comment-13784275 ] Sushanth Sowmyan commented on HIVE-4997: Rashad, I'm afraid it's a little late for feature improvements to ship with 0.12, which is locked down for bug fixes only. Two things need to happen before this patch can be included in 0.13:
a) The patch needs to be regenerated to change all references to the org.apache.hcatalog package to org.apache.hive.hcatalog. (The former package is deprecated, will be maintained for only one more release before removal, and has been considered frozen as of 0.11.)
b) If possible, since this is a fairly big feature (it adds functionality to the user-facing API), attach a design document to this patch outlining the goal, implementation and potential issues (if any), and have it reviewed by another committer to see whether it's okay. I'd suggest [~alangates] or [~toffer], since they've looked at MultiOutputFormat earlier and can make sure the design is consistent. After that, they can review the patch and commit it.
> HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.12.0 >Reporter: Daniel Intskirveli >Priority: Minor > Fix For: 0.12.0 > > Attachments: HIVE-4997.1.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message was sent by Atlassian JIRA (v6.1#6144)
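Point (a) is mostly a mechanical rename. A minimal sketch, assuming the patch or source is processed as plain text (sed or an IDE refactoring would do the same job), rewrites the package prefix with a word-bounded regex. `PackageRenameSketch` and `migrate` are hypothetical names, and the sketch does not move files out of org/apache/hcatalog/ directories, which a regenerated patch would also need.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper for requirement (a): rewrite references to the deprecated
 * org.apache.hcatalog package so they point at org.apache.hive.hcatalog.
 * Only text is rewritten; source files are not moved between directories.
 */
public class PackageRenameSketch {
    // \b stops matches inside a longer identifier such as "xorg.apache.hcatalog"
    // or "org.apache.hcatalogx".
    private static final Pattern OLD_PKG = Pattern.compile("\\borg\\.apache\\.hcatalog\\b");

    static String migrate(String source) {
        // "org.apache.hive.hcatalog" never contains "org.apache.hcatalog",
        // so already-migrated references are left untouched.
        return OLD_PKG.matcher(source).replaceAll(Matcher.quoteReplacement("org.apache.hive.hcatalog"));
    }

    public static void main(String[] args) {
        String before = "import org.apache.hcatalog.mapreduce.HCatInputFormat;\n"
                      + "import org.apache.hive.hcatalog.data.HCatRecord;\n";
        System.out.print(migrate(before));
    }
}
```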
[jira] [Commented] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784073#comment-13784073 ] Rashad Tatum commented on HIVE-4997: From what I see, Daniel added 4 files using the patch. Is there anything blocking this from being included in Hive 0.12.0? Is there anything I can do to help? > HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.12.0 >Reporter: Daniel Intskirveli >Priority: Minor > Fix For: 0.12.0 > > Attachments: HIVE-4997.1.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730111#comment-13730111 ] Hive QA commented on HIVE-4997: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12596206/HIVE-4997.1.patch {color:green}SUCCESS:{color} +1 2760 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/312/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/312/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. > HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.12.0 >Reporter: Daniel Intskirveli >Priority: Minor > Fix For: 0.12.0 > > Attachments: HIVE-4997.1.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira