[jira] [Updated] (HIVE-4790) MapredLocalTask task does not make virtual columns
[ https://issues.apache.org/jira/browse/HIVE-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-4790:
--
Attachment: HIVE-4790.D11511.1.patch
navis requested code review of HIVE-4790 [jira] MapredLocalTask task does not make virtual columns.
Reviewers: JIRA
DPAL-4790 MapredLocalTask task does not make virtual columns
From mailing list, http://www.mail-archive.com/user@hive.apache.org/msg08264.html
SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
fails with this error:
SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
Automatically selecting local only mode for query
Total MapReduce jobs = 1
setting HADOOP_USER_NAME pmarron
13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
Execution log at: /tmp/pmarron/.log
2013-06-25 10:52:56 Starting to launch local task to process map join; maximum memory = 932118528
java.lang.RuntimeException: cannot find field block__offset__inside__file from [0:rownumber, 1:offset]
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366)
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
    at org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
    at org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
    at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:222)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:394)
    at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:277)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:676)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Execution failed with exit status: 2
TEST PLAN
  EMPTY
REVISION DETAIL
  https://reviews.facebook.net/D11511
AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsWork.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java
  ql/src/test/queries/clientpositive/join_vc.q
  ql/src/test/results/clientpositive/join_vc.q.out
MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/
WHY DID I GET THIS EMAIL?
https://reviews.facebook.net/herald/transcript/27237/ To: JIRA, navis MapredLocalTask task does not make virtual columns -- Key: HIVE-4790 URL: https://issues.apache.org/jira/browse/HIVE-4790 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4790.D11511.1.patch From mailing list, http://www.mail-archive.com/user@hive.apache.org/msg08264.html {noformat} SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number; fails with this error: SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number; Automatically selecting local only mode for query Total MapReduce jobs = 1 setting HADOOP_USER_NAMEpmarron 13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no
[jira] [Commented] (HIVE-4290) Build profiles: Partial builds for quicker dev
[ https://issues.apache.org/jira/browse/HIVE-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693807#comment-13693807 ]
Gunther Hagleitner commented on HIVE-4290:
--
Test came back clean for me. I think .2 is ready.
Build profiles: Partial builds for quicker dev
--
Key: HIVE-4290
URL: https://issues.apache.org/jira/browse/HIVE-4290
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Attachments: HIVE-4290.2.patch, HIVE-4290.D11481.1.patch, HIVE-4290.patch
Building is definitely taking longer with hcat, hs2 etc in the build. When you're working on one area of the system though, it would be easier to have an option to only build that. Not for pre-commit or build machines, but for dev this should help.
ant clean package build OR
ant -Dbuild.profile=full clean package test -- build everything
ant -Dbuild.profile=core clean package test -- build just enough to run the tests in ql
ant -Dbuild.profile=hcat clean package test -- build only hcatalog
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 12100: Patch to fix HIVE-4789
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12100/#review22403 --- Ship it! Ship It! - Ben Spivey On June 26, 2013, 5:56 a.m., Sean Busbey wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12100/ --- (Updated June 26, 2013, 5:56 a.m.) Review request for hive, Ashutosh Chauhan, Jakob Homan, and Mark Wagner. Repository: hive Description --- HIVE-3953 fixed using partitioned avro tables for anything that used the MapOperator, but those that rely on FetchOperator still fail with the same error. e.g. SELECT * FROM partitioned_avro LIMIT 5; SELECT * FROM partitioned_avro WHERE partition_col=value; Diffs - trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1496728 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 1496728 trunk/ql/src/test/queries/clientpositive/avro_partitioned.q 1496728 trunk/ql/src/test/results/clientpositive/avro_partitioned.q.out 1496728 Diff: https://reviews.apache.org/r/12100/diff/ Testing --- reran avro partition unit tests and partition_wise_fileformat*.q Thanks, Sean Busbey
[jira] [Updated] (HIVE-2269) Hive --auxpath option can't handle multiple colon separated values
[ https://issues.apache.org/jira/browse/HIVE-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Spiegel updated HIVE-2269: --- Affects Version/s: 0.10.0 Hive --auxpath option can't handle multiple colon separated values -- Key: HIVE-2269 URL: https://issues.apache.org/jira/browse/HIVE-2269 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.7.0, 0.7.1, 0.10.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Attachments: HIVE-2269-auxpath.1.patch.txt
[jira] [Created] (HIVE-4791) improve test coverage of package org.apache.hadoop.hive.ql.udf.xml
Ivan A. Veselovsky created HIVE-4791: Summary: improve test coverage of package org.apache.hadoop.hive.ql.udf.xml Key: HIVE-4791 URL: https://issues.apache.org/jira/browse/HIVE-4791 Project: Hive Issue Type: Test Reporter: Ivan A. Veselovsky Assignee: Ivan A. Veselovsky improve test coverage of package org.apache.hadoop.hive.ql.udf.xml to 80%.
[jira] [Commented] (HIVE-4791) improve test coverage of package org.apache.hadoop.hive.ql.udf.xml
[ https://issues.apache.org/jira/browse/HIVE-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694142#comment-13694142 ] Edward Capriolo commented on HIVE-4791: --- How are you counting test coverage? The automated tools like cobertura do not 'understand' our *.q test format. Thus we have more coverage than these tools indicate. Maybe we can think of a clever way to compile and run the q tests so we can see the true coverage. improve test coverage of package org.apache.hadoop.hive.ql.udf.xml -- Key: HIVE-4791 URL: https://issues.apache.org/jira/browse/HIVE-4791 Project: Hive Issue Type: Test Reporter: Ivan A. Veselovsky Assignee: Ivan A. Veselovsky improve test coverage of package org.apache.hadoop.hive.ql.udf.xml to 80%.
[jira] [Commented] (HIVE-4719) EmbeddedLockManager should be shared to all clients
[ https://issues.apache.org/jira/browse/HIVE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694240#comment-13694240 ] Phabricator commented on HIVE-4719: --- brock has commented on the revision HIVE-4719 [jira] EmbeddedLockManager should be shared to all clients. Navis, The patch looks good to me. I think the issue with how the factory/manager is created can be taken forward in a follow-on JIRA, since it's not related to the patch itself! Cheers! Brock INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/Driver.java:143 Ahh interesting! I don't think Hive should continue on if it cannot create a lock manager but is configured to use concurrency. However, I think we can handle this in a follow-on JIRA. REVISION DETAIL https://reviews.facebook.net/D11229 To: JIRA, navis Cc: brock EmbeddedLockManager should be shared to all clients --- Key: HIVE-4719 URL: https://issues.apache.org/jira/browse/HIVE-4719 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-4719.D11229.1.patch, HIVE-4719.D11229.2.patch Currently, EmbeddedLockManager is created per Driver instance, so locking has no meaning.
[jira] [Updated] (HIVE-4724) ORC readers should have a better error detection for non-ORC files
[ https://issues.apache.org/jira/browse/HIVE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4724: -- Attachment: HIVE-4724.D11529.1.patch omalley requested code review of HIVE-4724 [jira] ORC readers should have a better error detection for non-ORC files. Reviewers: JIRA Add better checks for non-ORC files in the ORC reader to fail with a better error message. Also add a version check to warn users if they are reading files from a more advanced version of Hadoop. Added a check so that unknown encodings for a column will fail quickly with a good error message. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D11529 AFFECTED FILES .gitignore .idea/.name .idea/ant.xml .idea/codeStyleSettings.xml .idea/compiler.xml .idea/copyright/Apache.xml .idea/copyright/profiles_settings.xml .idea/encodings.xml .idea/libraries/default.xml .idea/libraries/hadoop0_20S_shim.xml .idea/libraries/hadoop0_20_shim.xml .idea/libraries/hadoop0_23.xml .idea/misc.xml .idea/modules.xml .idea/scopes/scope_settings.xml .idea/uiDesigner.xml .idea/vcs.xml .idea/workspace.xml ant/Ant.iml builtins/Builtins.iml cli/src/Cli.iml common/src/Common.iml contrib/src/Contrib.iml hbase-handler/src/Hbase-handler.iml hwi/src/Hwi.iml jdbc/src/Jdbc.iml metastore/src/Metastore.iml metastore/src/gen/thrift/Thrift.iml metastore/src/test/Metastore-test.iml pdk/src/Pdk.iml pdk/test-plugin/Test-plugin.iml ql/src/Ql.iml ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java ql/src/gen/thrift/Thrift1.iml ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto ql/src/test/Ql-test.iml serde/src/Serde.iml serde/src/gen/protobuf/Protobuf.iml serde/src/gen/thrift/Thrift2.iml 
serde/src/test/Serde-test.iml service/src/Service.iml service/src/gen/thrift/Thrift3.iml shims/src/0.20/Shims-0.20.iml shims/src/0.20S/Shims-0.20S.iml shims/src/0.23/Shims-0.23.iml shims/src/Shims.iml shims/src/common-secure/Shims-secure.iml shims/src/test/Shims-test.iml MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/27255/ To: JIRA, omalley ORC readers should have a better error detection for non-ORC files -- Key: HIVE-4724 URL: https://issues.apache.org/jira/browse/HIVE-4724 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-4724.D11529.1.patch A customer loaded a text file into a table that is stored as ORC. The error message was very unfriendly.
Re: Review Request 11326: HIVE-4588: Support session level hooks for HiveServer2
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11326/ --- (Updated June 27, 2013, 12:39 a.m.) Review request for hive. Changes --- Additional comments for the new classes/interfaces Bugs: HIVE-4588 https://issues.apache.org/jira/browse/HIVE-4588 Repository: hive-git Description --- Support session level hooks for HiveServer2 - New config parameter to define the hook - New hook context interface to pass the session user and config to the hook implementation - Session manager executes the configured hooks when a new session starts Diffs (updated) - beeline/src/java/org/apache/hive/beeline/Commands.java 3799cc1 beeline/src/test/org/apache/hive/beeline/src/test/TestBeeLineWithArgs.java 030f6b0 build-common.xml d642b51 cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java d9b7031 common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cc775d9 conf/hive-default.xml.template 5de5965 data/conf/hive-site.xml 4e6ff16 data/files/person c902284 hbase-handler/src/test/templates/TestHBaseCliDriver.vm c59e882 hbase-handler/src/test/templates/TestHBaseNegativeCliDriver.vm aaab85b hcatalog/bin/hcat 455f108 hcatalog/core/src/test/java/org/apache/hcatalog/cli/TestSemanticAnalysis.java d7a2b68 hcatalog/src/docs/src/documentation/content/xdocs/readerwriter.xml e36090e hcatalog/src/test/e2e/hcatalog/build.xml 8cf7407 hcatalog/src/test/e2e/hcatalog/drivers/TestDriverHiveCmdLine.pm 6154475 hcatalog/src/test/e2e/hcatalog/resource/default.res 01bfaee hcatalog/src/test/e2e/hcatalog/resource/windows.res 01bfaee hcatalog/src/test/e2e/hcatalog/tests/hcat.conf fa7893b hcatalog/src/test/e2e/hcatalog/tests/hive_cmdline.conf 91c0786 hcatalog/src/test/e2e/hcatalog/tests/hive_nightly.conf d026872 hcatalog/src/test/e2e/hcatalog/tools/test/floatpostprocessor.pl ec5de96 hcatalog/src/test/e2e/templeton/README.txt dac6ffc hcatalog/src/test/e2e/templeton/build.xml 4bce25b hcatalog/src/test/e2e/templeton/resource/default.res 01bfaee 
hcatalog/src/test/e2e/templeton/resource/windows.res 01bfaee jdbc/src/java/org/apache/hadoop/hive/jdbc/HivePreparedStatement.java 2859859 jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java 4c1ab3b jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 0e90fec jdbc/src/java/org/apache/hive/jdbc/HiveDatabaseMetaData.java 4cb1422 jdbc/src/java/org/apache/hive/jdbc/HiveDriver.java 2576914 jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java a7c432d jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java b142e8c jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java b108c7a metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 88151a1 ql/build.xml a34a079 ql/src/java/org/apache/hadoop/hive/ql/Context.java 5340e99 ql/src/java/org/apache/hadoop/hive/ql/Driver.java a5a867a ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java c796770 ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 6935738 ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java 854cd52 ql/src/java/org/apache/hadoop/hive/ql/exec/CopyTask.java 38d97e3 ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 295daab ql/src/java/org/apache/hadoop/hive/ql/exec/DependencyCollectionTask.java 9189cfc ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 11772e6 ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 5a00c2d ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeEvaluator.java 5cd9bde ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java b4da80c ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 6e9e0a8 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java b4b2c90 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java 988b389 ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java 6bbcb26 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java ac8e167 ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 90d93f6 ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java 092be6e 
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java c737d7a ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 599f63c ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 17387a9 ql/src/java/org/apache/hadoop/hive/ql/exec/TaskRunner.java fcf9adc ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 68ec54a ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeTask.java 364fc19 ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java cbee423 ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java a1abf90 ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateTask.java a9cd8ac
[jira] [Updated] (HIVE-4588) Support session level hooks for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-4588: -- Attachment: HIVE-4588-2.patch Updated the patch with additional comments for new classes/interfaces Support session level hooks for HiveServer2 --- Key: HIVE-4588 URL: https://issues.apache.org/jira/browse/HIVE-4588 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-4588-1.patch, HIVE-4588-2.patch Support session level hooks for HiveServer2. The configured hooks will get executed at the beginning of each new session. This is useful for auditing connections, possibly tuning the session level properties etc.
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694416#comment-13694416 ] Irwin commented on HIVE-3552: - I have tested cubes and rollups, but they failed. My table is t1, formatted as follows: The error message is: I have tried hive-0.10.0 and hive-0.11.0, and the error is the same. Why can't I use Enhanced Aggregation, Cube, Grouping and Rollup? Can anyone help? Thanks! HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys - Key: HIVE-3552 URL: https://issues.apache.org/jira/browse/HIVE-3552 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.11.0 Attachments: hive.3552.10.patch, hive.3552.11.patch, hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, hive.3552.9.patch This is a follow up for HIVE-3433. Had an offline discussion with Sambavi - she pointed out a scenario where the implementation in HIVE-3433 will not scale. Assume that the user is performing a cube on many columns, say '8' columns. So, each row would generate 256 rows for the hash table, which may kill the current group by implementation. A better implementation would be to add an additional mr job - in the first mr job, perform the group by assuming there was no cube. Add another mr job, where you would perform the cube. The assumption is that the group by would have decreased the output data significantly, and the rows would appear in the order of grouping keys, which has a higher probability of hitting the hash table.
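The row blow-up described in HIVE-3552 can be illustrated with a small Python sketch. It is not Hive code: the function names and the COUNT(*)-style aggregate are assumptions made for the illustration. It only shows that a cube over 8 columns has 2^8 = 256 grouping sets, and that pre-aggregating with a plain group by before expanding reduces the number of expanded rows when the input contains repeated keys.

```python
from collections import Counter
from itertools import combinations


def cube_grouping_sets(columns):
    """Every subset of `columns`; a CUBE aggregates over each of them."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets


def mask(key, columns, grouping_set):
    """Null out the columns that are not part of this grouping set."""
    return tuple(v if c in grouping_set else None for c, v in zip(columns, key))


def single_stage_cube(rows, columns):
    """Expand every input row into one row per grouping set, then count."""
    sets = cube_grouping_sets(columns)
    counts, expanded = Counter(), 0
    for row in rows:
        key = tuple(row[c] for c in columns)
        for grouping_set in sets:
            counts[mask(key, columns, grouping_set)] += 1
            expanded += 1
    return counts, expanded


def two_stage_cube(rows, columns):
    """Stage 1: plain GROUP BY on all columns. Stage 2: cube the groups."""
    sets = cube_grouping_sets(columns)
    grouped = Counter(tuple(row[c] for c in columns) for row in rows)
    counts, expanded = Counter(), 0
    for key, n in grouped.items():
        for grouping_set in sets:
            counts[mask(key, columns, grouping_set)] += n
            expanded += 1
    return counts, expanded


columns = [f"c{i}" for i in range(8)]
rows = [{c: i for i, c in enumerate(columns)}] * 1000  # 1000 identical rows

single_counts, single_expanded = single_stage_cube(rows, columns)
two_counts, two_expanded = two_stage_cube(rows, columns)
print(len(cube_grouping_sets(columns)), single_expanded, two_expanded)
```

Both variants produce the same aggregates; the difference is only in how many expanded rows reach the second aggregation, which is the effect the proposal relies on.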
[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-2206: -- Attachment: HIVE-2206.D11097.11.patch yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query correlation discovery and optimization. I tested all unit tests before the commit of HIVE-4496. all unit tests pass Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D11097 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D11097?vs=35055id=35181#toc AFFECTED FILES build-common.xml data/files/leftsemijoin_mr_t1.txt data/files/leftsemijoin_mr_t2.txt ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java ql/src/test/queries/clientpositive/leftsemijoin_mr.q ql/src/test/results/clientpositive/leftsemijoin_mr.q.out To: JIRA, yhuai Cc: brock add a new optimizer for query correlation discovery and optimization Key: HIVE-2206 URL: https://issues.apache.org/jira/browse/HIVE-2206 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.12.0 Reporter: He Yongqiang Assignee: Yin Huai Attachments: HIVE-2206.10-r1384442.patch.txt, HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, HIVE-2206.D11097.1.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, 
HIVE-2206.D11097.9.patch, testQueries.2.q, YSmartPatchForHive.patch
This issue proposes a new logical optimizer called Correlation Optimizer, which is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and slides of YSmart are linked at the bottom.
Since Hive translates queries in a sentence by sentence fashion, for every operation which may need to shuffle the data (e.g. join and aggregation operations), Hive will generate a MapReduce job for that operation. However, those operations may involve the correlations explained below and thus can be executed in a single MR job.
# Input Correlation: Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint;
# Transit Correlation: Multiple MR jobs have transit correlation (TC) if they have not only input correlation, but also the same partition key;
# Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its child nodes if it has the same partition key as that child node.
The current implementation of the correlation optimizer only detects correlations among MR jobs for reduce-side join operators and reduce-side aggregation operators (not map only aggregation). A query will be optimized if it satisfies the following conditions.
# There exists an MR job for a reduce-side join operator or reduce-side aggregation operator which has JFC with all of its parent MR jobs (TCs will also be exploited if JFC exists);
# All input tables of those correlated MR jobs are original input tables (not intermediate tables generated by sub-queries); and
# No self join is involved in those correlated MR jobs.
Correlation optimizer is implemented as a logical optimizer. The main reasons are that it only needs to manipulate the query plan tree and it can leverage the existing component for generating MR jobs. The current implementation can serve as a framework for correlation related optimizations. I think that it is better than adding individual optimizers.
There are several pieces of work that can be done in future to improve this optimizer. Here are three examples.
# Support queries that only involve TC;
# Support queries in which input tables of correlated MR jobs involve intermediate tables; and
# Optimize queries involving self join.
References: Paper and presentation of YSmart.
Paper: http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc
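The three correlation definitions in the HIVE-2206 description can be written down as predicates. The following is a minimal Python sketch and not the CorrelationOptimizer implementation: the `MRJob` record, its fields and the example jobs are all hypothetical, and the job-flow check assumes the caller passes a job together with one of its actual child jobs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MRJob:
    """Hypothetical stand-in for one MapReduce job in a query plan."""
    name: str
    inputs: frozenset      # names of the input relations
    partition_key: tuple   # columns the shuffle partitions on


def has_input_correlation(a, b):
    """IC: the input relation sets of the two jobs are not disjoint."""
    return not a.inputs.isdisjoint(b.inputs)


def has_transit_correlation(a, b):
    """TC: input correlation plus the same partition key."""
    return has_input_correlation(a, b) and a.partition_key == b.partition_key


def has_job_flow_correlation(job, child):
    """JFC: the job has the same partition key as the given child job.

    The caller is assumed to pass a real child of `job`; the plan
    structure itself is not modelled in this sketch.
    """
    return job.partition_key == child.partition_key


# Two aggregations over the same table, feeding a join on the same key.
agg1 = MRJob("agg1", frozenset({"t1"}), ("key",))
agg2 = MRJob("agg2", frozenset({"t1"}), ("key",))
join = MRJob("join", frozenset({"agg1_out", "agg2_out"}), ("key",))
other = MRJob("other", frozenset({"t2"}), ("id",))

print(has_input_correlation(agg1, agg2))    # shared input t1
print(has_transit_correlation(agg1, agg2))  # shared input and same key
print(has_job_flow_correlation(join, agg1)) # same key as its child
print(has_input_correlation(agg1, other))   # disjoint inputs
```

In this toy plan the join has JFC with both of its children and the children share a TC, which is the shape of query the description says the optimizer targets.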
[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-2206: -- Attachment: HIVE-2206.D11097.12.patch yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query correlation discovery and optimization. My last diff was for 4718... Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D11097 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D11097?vs=35181id=35193#toc AFFECTED FILES common/src/java/org/apache/hadoop/hive/conf/HiveConf.java conf/hive-default.xml.template ql/if/queryplan.thrift ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java ql/src/test/queries/clientpositive/correlationoptimizer1.q ql/src/test/queries/clientpositive/correlationoptimizer10.q ql/src/test/queries/clientpositive/correlationoptimizer2.q ql/src/test/queries/clientpositive/correlationoptimizer3.q ql/src/test/queries/clientpositive/correlationoptimizer4.q ql/src/test/queries/clientpositive/correlationoptimizer5.q ql/src/test/queries/clientpositive/correlationoptimizer6.q ql/src/test/queries/clientpositive/correlationoptimizer7.q ql/src/test/queries/clientpositive/correlationoptimizer8.q ql/src/test/queries/clientpositive/correlationoptimizer9.q ql/src/test/results/clientpositive/correlationoptimizer1.q.out ql/src/test/results/clientpositive/correlationoptimizer10.q.out ql/src/test/results/clientpositive/correlationoptimizer2.q.out ql/src/test/results/clientpositive/correlationoptimizer3.q.out ql/src/test/results/clientpositive/correlationoptimizer4.q.out ql/src/test/results/clientpositive/correlationoptimizer5.q.out ql/src/test/results/clientpositive/correlationoptimizer6.q.out ql/src/test/results/clientpositive/correlationoptimizer7.q.out ql/src/test/results/clientpositive/correlationoptimizer8.q.out ql/src/test/results/clientpositive/correlationoptimizer9.q.out ql/src/test/results/compiler/plan/groupby2.q.xml ql/src/test/results/compiler/plan/groupby3.q.xml To: JIRA, yhuai Cc: brock add a new optimizer for query correlation discovery and optimization Key: HIVE-2206 URL: https://issues.apache.org/jira/browse/HIVE-2206 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.12.0 Reporter: He Yongqiang Assignee: Yin 
Huai Attachments: HIVE-2206.10-r1384442.patch.txt, HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt,
Does Hive 0.11 have Query Flattening optimizations?
Hello,
Does Hive support query flattening? For example, a query like this:
SELECT alias.a0, alias.a1
FROM
(SELECT COUNT(b) AS a0, c AS a1
FROM test
GROUP BY c) alias
WHERE alias.a0 2;
would be flattened into:
SELECT COUNT(b), c
FROM test
GROUP BY c
HAVING COUNT(b) 2;
Does Hive (0.11) have this kind of optimization, or would it even be useful, considering all queries are ultimately converted into MapReduce jobs? At Informatica Corp we rely on Hive a lot and hence are interested in supporting such optimizations.
Thanks in anticipation.
Regards,
Mihir Kulkarni
Software Engineer | Data Engine
Informatica Corporation
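That the nested and the flattened form return the same rows can be checked with a short Python sketch. Two assumptions are made here: the comparison operator, which is missing from the message as archived, is taken to be `>`, and the queries run on SQLite through Python's standard `sqlite3` module rather than on Hive. The sample rows are invented. The sketch says nothing about whether Hive 0.11 performs this rewrite.

```python
import sqlite3

# Nested form: filter applied outside the aggregating subquery.
NESTED = """
SELECT alias.a0, alias.a1
FROM (SELECT COUNT(b) AS a0, c AS a1 FROM test GROUP BY c) alias
WHERE alias.a0 > 2
"""

# Flattened form: the same filter expressed as a HAVING clause.
FLAT = """
SELECT COUNT(b), c
FROM test
GROUP BY c
HAVING COUNT(b) > 2
"""


def run_both():
    """Run both queries on an in-memory table and return sorted results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (b INTEGER, c TEXT)")
    rows = [
        (1, "x"), (2, "x"), (3, "x"),             # COUNT(b) = 3
        (4, "y"),                                 # COUNT(b) = 1
        (5, "z"), (6, "z"), (7, "z"), (None, "z"),  # NULL b is not counted
    ]
    conn.executemany("INSERT INTO test VALUES (?, ?)", rows)
    nested = sorted(conn.execute(NESTED).fetchall())
    flat = sorted(conn.execute(FLAT).fetchall())
    conn.close()
    return nested, flat


nested, flat = run_both()
print(nested == flat, flat)
```

The rewrite is valid because the outer filter only references the aggregate computed per group, so it can be moved into HAVING without changing which groups survive.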
[jira] [Updated] (HIVE-4781) LEFT SEMI JOIN generates wrong results when the number of rows belonging to a single key of the right table exceed hive.join.emit.interval
[ https://issues.apache.org/jira/browse/HIVE-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated HIVE-4781:
---
Description:
Suppose that we have the query shown below
{code:sql}
SELECT key FROM t1 LEFT SEMI JOIN t2 ON (t1.key=t2.key);
{code}
When the number of rows of t2 is larger than hive.join.emit.interval, JoinOperator will emit rows from t1, which will result in redundant output. Let's say t1 is
{code}
1
{code}
and t2 is
{code}
1
1
1
1
{code}
When hive.join.emit.interval=1, the output of the above query will be
{code}
1
1
1
1
{code}
The correct result should be
{code}
1
{code}
This problem cannot be found in the unit tests: because there is a GBY operator inserted before JoinOperator and we have only 1 mapper, the output of the map phase only has distinct keys. Please apply the patch 'wrong_semi_join.txt' attached below and use
{code}
ant test -Dtestcase=TestMinimrCliDriver -Dqfile=left_semi_join.q -Dtest.silent=false
{code}
to replay the problem. The wrong result can be found in
{code}
hive_root_dir/build/ql/test/logs/clientpositive
{code}
was:
Suppose that we have a query shown below
{code:sql}
SELECT key FROM t1 LEFT SEMI JOIN t2 ON (t1.key=t2.key);
{code}
When the number of rows of t2 is larger than hive.join.emit.interval, JoinOperator will emit rows from t1, which will result in redundant output. Let's say t1 is
{code}
key
1
{code}
and t2 is
{code}
key
1
1
1
1
{code}
When hive.join.emit.interval=1, the output of above query will be
{code}
1
1
1
1
{code}
The correct result should be
{code}
1
{code}
This problem cannot be found in unit test. Because there is a GBY operator inserted before JoinOperator and we have only 1 mapper, the output of map phase only has distinct keys. Please apply the patch 'wrong_semi_join.txt' attached below and use
{code}
ant test -Dtestcase=TestMinimrCliDriver -Dqfile=left_semi_join.q -Dtest.silent=false
{code}
to replay the problem. The wrong result can be found in
{code}
hive_root_dir/build/ql/test/logs/clientpositive
{code}
LEFT SEMI JOIN generates wrong results when the number of rows belonging to a single key of the right table exceed hive.join.emit.interval
--
Key: HIVE-4781
URL: https://issues.apache.org/jira/browse/HIVE-4781
Project: Hive
Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
Attachments: wrong_semi_join.txt
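The expected and the reported outputs for HIVE-4781 can be contrasted in a short Python sketch. The first function gives the reference semantics of a left semi join. The second is only a toy model of the reported symptom and is not Hive's JoinOperator code: it assumes, for illustration, that the left row is emitted each time `emit_interval` right-side rows for its key have been buffered, plus once more for any remainder.

```python
def left_semi_join(left_keys, right_keys):
    """Reference semantics: keep each left row once if its key is on the right."""
    right = set(right_keys)
    return [key for key in left_keys if key in right]


def naive_interval_emit(left_keys, right_keys, emit_interval):
    """Toy model of the symptom (an assumption, not Hive's implementation).

    The left row is emitted every time `emit_interval` matching right rows
    have been buffered, and once more if a partial buffer is left over.
    """
    output = []
    for key in left_keys:
        buffered = 0
        for right_key in right_keys:
            if right_key != key:
                continue
            buffered += 1
            if buffered == emit_interval:
                output.append(key)   # premature flush repeats the left row
                buffered = 0
        if buffered > 0:
            output.append(key)       # final flush for the remaining rows
    return output


t1 = [1]
t2 = [1, 1, 1, 1]

print(left_semi_join(t1, t2))
print(naive_interval_emit(t1, t2, emit_interval=1))
print(naive_interval_emit(t1, t2, emit_interval=1000))
```

With the interval at 1 the toy model repeats the left row once per right row, matching the output in the report; with an interval larger than the number of matching right rows it agrees with the reference semantics.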