[jira] [Updated] (HIVE-4790) MapredLocalTask task does not make virtual columns

2013-06-26 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4790:
--

Attachment: HIVE-4790.D11511.1.patch

navis requested code review of HIVE-4790 [jira] MapredLocalTask task does not 
make virtual columns.

Reviewers: JIRA

DPAL-4790 MapredLocalTask task does not make virtual columns

From mailing list, 
http://www.mail-archive.com/user@hive.apache.org/msg08264.html

SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON 
b.rownumber = a.number;
fails with this error:

 SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = 
a.number;
Automatically selecting local only mode for query
Total MapReduce jobs = 1
setting HADOOP_USER_NAMEpmarron
13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property 
hive.metastore.local no longer has any effect. Make sure to provide a valid 
value for hive.metastore.uris if you are connecting to a remote metastore.
Execution log at: /tmp/pmarron/.log
2013-06-25 10:52:56 Starting to launch local task to process map join;  
maximum memory = 932118528
java.lang.RuntimeException: cannot find field block__offset__inside__file from 
[0:rownumber, 1:offset]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366)
at 
org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
at 
org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
at 
org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
at 
org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:222)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at 
org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:394)
at 
org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:277)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:676)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Execution failed with exit status: 2

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D11511

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsWork.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java
  ql/src/test/queries/clientpositive/join_vc.q
  ql/src/test/results/clientpositive/join_vc.q.out

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/27237/

To: JIRA, navis


 MapredLocalTask task does not make virtual columns
 --

 Key: HIVE-4790
 URL: https://issues.apache.org/jira/browse/HIVE-4790
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4790.D11511.1.patch



[jira] [Commented] (HIVE-4290) Build profiles: Partial builds for quicker dev

2013-06-26 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693807#comment-13693807
 ] 

Gunther Hagleitner commented on HIVE-4290:
--

Test came back clean for me. I think .2 is ready.

 Build profiles: Partial builds for quicker dev
 --

 Key: HIVE-4290
 URL: https://issues.apache.org/jira/browse/HIVE-4290
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-4290.2.patch, HIVE-4290.D11481.1.patch, 
 HIVE-4290.patch


 Building is definitely taking longer with hcat, hs2 etc in the build. When 
 you're working on one area of the system though, it would be easier to have 
 an option to only build that. Not for pre-commit or build machines, but for 
 dev this should help.
 ant clean package build OR
 ant -Dbuild.profile=full clean package test -- build everything
 ant -Dbuild.profile=core clean package test -- build just enough to run the 
 tests in ql
 ant -Dbuild.profile=hcat clean package test -- build only hcatalog

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request 12100: Patch to fix HIVE-4789

2013-06-26 Thread Ben Spivey

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12100/#review22403
---

Ship it!



- Ben Spivey


On June 26, 2013, 5:56 a.m., Sean Busbey wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/12100/
 ---
 
 (Updated June 26, 2013, 5:56 a.m.)
 
 
 Review request for hive, Ashutosh Chauhan, Jakob Homan, and Mark Wagner.
 
 
 Repository: hive
 
 
 Description
 ---
 
 HIVE-3953 fixed using partitioned avro tables for anything that used the 
 MapOperator, but those that rely on FetchOperator still fail with the same 
 error.
 e.g.
   SELECT * FROM partitioned_avro LIMIT 5;
   SELECT * FROM partitioned_avro WHERE partition_col=value;
 
 
 Diffs
 -
 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1496728 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 1496728 
   trunk/ql/src/test/queries/clientpositive/avro_partitioned.q 1496728 
   trunk/ql/src/test/results/clientpositive/avro_partitioned.q.out 1496728 
 
 Diff: https://reviews.apache.org/r/12100/diff/
 
 
 Testing
 ---
 
 reran avro partition unit tests and partition_wise_fileformat*.q
 
 
 Thanks,
 
 Sean Busbey
 




[jira] [Updated] (HIVE-2269) Hive --auxpath option can't handle multiple colon separated values

2013-06-26 Thread Josh Spiegel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Spiegel updated HIVE-2269:
---

Affects Version/s: 0.10.0

 Hive --auxpath option can't handle multiple colon separated values
 --

 Key: HIVE-2269
 URL: https://issues.apache.org/jira/browse/HIVE-2269
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.7.0, 0.7.1, 0.10.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-2269-auxpath.1.patch.txt






[jira] [Created] (HIVE-4791) improve test coverage of package org.apache.hadoop.hive.ql.udf.xml

2013-06-26 Thread Ivan A. Veselovsky (JIRA)
Ivan A. Veselovsky created HIVE-4791:


 Summary: improve test coverage of package 
org.apache.hadoop.hive.ql.udf.xml
 Key: HIVE-4791
 URL: https://issues.apache.org/jira/browse/HIVE-4791
 Project: Hive
  Issue Type: Test
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky


improve test coverage of package org.apache.hadoop.hive.ql.udf.xml to 80%.



[jira] [Commented] (HIVE-4791) improve test coverage of package org.apache.hadoop.hive.ql.udf.xml

2013-06-26 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694142#comment-13694142
 ] 

Edward Capriolo commented on HIVE-4791:
---

How are you counting test coverage? Automated tools like cobertura do not 
'understand' our *.q test format. Thus we have more coverage than these tools 
indicate. Maybe we can think of a clever way to compile and run the q tests so 
we can see the true coverage. 

 improve test coverage of package org.apache.hadoop.hive.ql.udf.xml
 --

 Key: HIVE-4791
 URL: https://issues.apache.org/jira/browse/HIVE-4791
 Project: Hive
  Issue Type: Test
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky

 improve test coverage of package org.apache.hadoop.hive.ql.udf.xml to 80%.



[jira] [Commented] (HIVE-4719) EmbeddedLockManager should be shared to all clients

2013-06-26 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694240#comment-13694240
 ] 

Phabricator commented on HIVE-4719:
---

brock has commented on the revision HIVE-4719 [jira] EmbeddedLockManager 
should be shared to all clients.

  Navis,

  The patch looks good to me. I think the issue with creation of the 
factory/manager can be handled in a follow-up JIRA since it's not related to 
the patch itself!

  Cheers!
  Brock

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java:143 Ahh interesting!  I 
don't think Hive should continue on if it cannot create a lock manager but is 
configured to use concurrency. However, I think we can handle this on a follow 
on JIRA.

REVISION DETAIL
  https://reviews.facebook.net/D11229

To: JIRA, navis
Cc: brock


 EmbeddedLockManager should be shared to all clients
 ---

 Key: HIVE-4719
 URL: https://issues.apache.org/jira/browse/HIVE-4719
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-4719.D11229.1.patch, HIVE-4719.D11229.2.patch


 Currently, EmbeddedLockManager is created per Driver instance, so locking has 
 no meaning.
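
The problem described above can be illustrated with a small sketch. It is not Hive code: `ToyLockManager` and `ToyDriver` are hypothetical stand-ins, and the only assumption is that a lock manager keeps its lock table in process memory. With one manager per driver, two clients can both "acquire" the same lock; with a shared manager, the second attempt is refused.

```python
import threading


class ToyLockManager:
    """In-process lock table; a stand-in for an embedded lock manager."""

    def __init__(self):
        self._guard = threading.Lock()
        self._held = set()

    def try_lock(self, name):
        """Return True if `name` was free and is now held, else False."""
        with self._guard:
            if name in self._held:
                return False
            self._held.add(name)
            return True

    def unlock(self, name):
        """Release `name` if it is held; do nothing otherwise."""
        with self._guard:
            self._held.discard(name)


class ToyDriver:
    """Hypothetical client: uses a shared manager or creates a private one."""

    def __init__(self, lock_manager=None):
        if lock_manager is None:
            lock_manager = ToyLockManager()
        self.lock_manager = lock_manager


# One manager per driver: both clients "acquire" the same table lock.
d1, d2 = ToyDriver(), ToyDriver()
per_driver = (d1.lock_manager.try_lock("default.t1"),
              d2.lock_manager.try_lock("default.t1"))

# One manager shared by all drivers: the second attempt is refused.
shared = ToyLockManager()
d3, d4 = ToyDriver(shared), ToyDriver(shared)
shared_result = (d3.lock_manager.try_lock("default.t1"),
                 d4.lock_manager.try_lock("default.t1"))

print(per_driver)     # (True, True)
print(shared_result)  # (True, False)
```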



[jira] [Updated] (HIVE-4724) ORC readers should have a better error detection for non-ORC files

2013-06-26 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4724:
--

Attachment: HIVE-4724.D11529.1.patch

omalley requested code review of HIVE-4724 [jira] ORC readers should have a 
better error detection for non-ORC files.

Reviewers: JIRA

Add better checks for non-ORC files in the ORC reader to fail with a better 
error message. Also add a version check to warn users if they are reading 
files from a more advanced version of Hadoop. Added a check so that unknown 
encodings for a column will fail quickly with a good error message.
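
The three checks described above could look roughly like the following sketch. It is not the patch: the function names, `SUPPORTED_VERSION`, and `KNOWN_ENCODINGS` are made up for illustration, and the sketch simply assumes that a valid file starts with the marker bytes `ORC`.

```python
import warnings

MAGIC = b"ORC"                               # assumed file marker for this sketch
SUPPORTED_VERSION = (0, 11)                  # hypothetical reader version
KNOWN_ENCODINGS = {"DIRECT", "DICTIONARY"}   # hypothetical encoding names


class NotAnOrcFileError(ValueError):
    """Raised when the input does not look like an ORC file."""


def check_header(data, path="<memory>"):
    """Fail with a readable message if the marker bytes are missing."""
    if not data.startswith(MAGIC):
        found = data[:len(MAGIC)]
        raise NotAnOrcFileError(
            f"{path} is not an ORC file: expected {MAGIC!r}, found {found!r}")


def check_version(file_version):
    """Warn, but do not fail, when the file is newer than the reader."""
    if file_version > SUPPORTED_VERSION:
        warnings.warn(f"file version {file_version} is newer than "
                      f"supported version {SUPPORTED_VERSION}")
        return False
    return True


def check_encoding(column, encoding):
    """Fail quickly on an encoding this reader does not know."""
    if encoding not in KNOWN_ENCODINGS:
        raise ValueError(f"unknown encoding {encoding!r} for column {column}")


try:
    check_header(b"1,alice,2013-06-26\n", "part-00000.txt")
    text_file_rejected = False
except NotAnOrcFileError as err:
    text_file_rejected = True
    print(err)
```

Checking the marker before any further parsing is what turns an obscure decoding failure into a message that names the file and the actual problem.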

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D11529

AFFECTED FILES
  .gitignore
  .idea/.name
  .idea/ant.xml
  .idea/codeStyleSettings.xml
  .idea/compiler.xml
  .idea/copyright/Apache.xml
  .idea/copyright/profiles_settings.xml
  .idea/encodings.xml
  .idea/libraries/default.xml
  .idea/libraries/hadoop0_20S_shim.xml
  .idea/libraries/hadoop0_20_shim.xml
  .idea/libraries/hadoop0_23.xml
  .idea/misc.xml
  .idea/modules.xml
  .idea/scopes/scope_settings.xml
  .idea/uiDesigner.xml
  .idea/vcs.xml
  .idea/workspace.xml
  ant/Ant.iml
  builtins/Builtins.iml
  cli/src/Cli.iml
  common/src/Common.iml
  contrib/src/Contrib.iml
  hbase-handler/src/Hbase-handler.iml
  hwi/src/Hwi.iml
  jdbc/src/Jdbc.iml
  metastore/src/Metastore.iml
  metastore/src/gen/thrift/Thrift.iml
  metastore/src/test/Metastore-test.iml
  pdk/src/Pdk.iml
  pdk/test-plugin/Test-plugin.iml
  ql/src/Ql.iml
  ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
  ql/src/gen/thrift/Thrift1.iml
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
  ql/src/test/Ql-test.iml
  serde/src/Serde.iml
  serde/src/gen/protobuf/Protobuf.iml
  serde/src/gen/thrift/Thrift2.iml
  serde/src/test/Serde-test.iml
  service/src/Service.iml
  service/src/gen/thrift/Thrift3.iml
  shims/src/0.20/Shims-0.20.iml
  shims/src/0.20S/Shims-0.20S.iml
  shims/src/0.23/Shims-0.23.iml
  shims/src/Shims.iml
  shims/src/common-secure/Shims-secure.iml
  shims/src/test/Shims-test.iml


To: JIRA, omalley


 ORC readers should have a better error detection for non-ORC files
 --

 Key: HIVE-4724
 URL: https://issues.apache.org/jira/browse/HIVE-4724
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-4724.D11529.1.patch


 A customer loaded a text file into a table that is stored as ORC. The error 
 message was very unfriendly.



Re: Review Request 11326: HIVE-4588: Support session level hooks for HiveServer2

2013-06-26 Thread Prasad Mujumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11326/
---

(Updated June 27, 2013, 12:39 a.m.)


Review request for hive.


Changes
---

Additional comments for the new classes/interfaces


Bugs: HIVE-4588
https://issues.apache.org/jira/browse/HIVE-4588


Repository: hive-git


Description
---

Support session level hooks for HiveServer2
  - New config parameter to define the hook
  - New hook context interface to pass the session user and config to the hook 
implementation
  - Session manager executes the configured hooks when a new session starts
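
The flow in the list above can be sketched in a few lines. This is only an illustration of the idea, not the HiveServer2 interfaces from the patch: `SessionHookContext`, `SessionManager`, the hook functions, and the property name are all hypothetical.

```python
class SessionHookContext:
    """Carries the session user and configuration to each hook."""

    def __init__(self, user, conf):
        self.user = user
        self.conf = conf


class SessionManager:
    """Runs every configured hook when a new session starts."""

    def __init__(self, hooks):
        self.hooks = list(hooks)

    def open_session(self, user, conf):
        # Copy the configuration so hooks only change this session.
        ctx = SessionHookContext(user, dict(conf))
        for hook in self.hooks:
            hook(ctx)
        return ctx


audit_log = []


def audit_hook(ctx):
    """Example hook: record who connected."""
    audit_log.append(f"session opened by {ctx.user}")


def tuning_hook(ctx):
    """Example hook: set a session-level property (made-up name)."""
    ctx.conf.setdefault("example.session.property", "tuned")


base_conf = {"example.other": "1"}
manager = SessionManager([audit_hook, tuning_hook])
session = manager.open_session("alice", base_conf)

print(audit_log)                                 # ['session opened by alice']
print(session.conf["example.session.property"])  # tuned
```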


Diffs (updated)
-

  beeline/src/java/org/apache/hive/beeline/Commands.java 3799cc1 
  beeline/src/test/org/apache/hive/beeline/src/test/TestBeeLineWithArgs.java 030f6b0 
  build-common.xml d642b51 
  cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java d9b7031 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cc775d9 
  conf/hive-default.xml.template 5de5965 
  data/conf/hive-site.xml 4e6ff16 
  data/files/person c902284 
  hbase-handler/src/test/templates/TestHBaseCliDriver.vm c59e882 
  hbase-handler/src/test/templates/TestHBaseNegativeCliDriver.vm aaab85b 
  hcatalog/bin/hcat 455f108 
  hcatalog/core/src/test/java/org/apache/hcatalog/cli/TestSemanticAnalysis.java d7a2b68 
  hcatalog/src/docs/src/documentation/content/xdocs/readerwriter.xml e36090e 
  hcatalog/src/test/e2e/hcatalog/build.xml 8cf7407 
  hcatalog/src/test/e2e/hcatalog/drivers/TestDriverHiveCmdLine.pm 6154475 
  hcatalog/src/test/e2e/hcatalog/resource/default.res 01bfaee 
  hcatalog/src/test/e2e/hcatalog/resource/windows.res 01bfaee 
  hcatalog/src/test/e2e/hcatalog/tests/hcat.conf fa7893b 
  hcatalog/src/test/e2e/hcatalog/tests/hive_cmdline.conf 91c0786 
  hcatalog/src/test/e2e/hcatalog/tests/hive_nightly.conf d026872 
  hcatalog/src/test/e2e/hcatalog/tools/test/floatpostprocessor.pl ec5de96 
  hcatalog/src/test/e2e/templeton/README.txt dac6ffc 
  hcatalog/src/test/e2e/templeton/build.xml 4bce25b 
  hcatalog/src/test/e2e/templeton/resource/default.res 01bfaee 
  hcatalog/src/test/e2e/templeton/resource/windows.res 01bfaee 
  jdbc/src/java/org/apache/hadoop/hive/jdbc/HivePreparedStatement.java 2859859 
  jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java 4c1ab3b 
  jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 0e90fec 
  jdbc/src/java/org/apache/hive/jdbc/HiveDatabaseMetaData.java 4cb1422 
  jdbc/src/java/org/apache/hive/jdbc/HiveDriver.java 2576914 
  jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java a7c432d 
  jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java b142e8c 
  jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java b108c7a 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 88151a1 
  ql/build.xml a34a079 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 5340e99 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java a5a867a 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java c796770 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 6935738 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java 854cd52 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CopyTask.java 38d97e3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 295daab 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DependencyCollectionTask.java 9189cfc 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 11772e6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 5a00c2d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeEvaluator.java 5cd9bde 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java b4da80c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 6e9e0a8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java b4b2c90 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java 988b389 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java 6bbcb26 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java ac8e167 
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 90d93f6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java 092be6e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java c737d7a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 599f63c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 17387a9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TaskRunner.java fcf9adc 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 68ec54a 
  ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeTask.java 364fc19 
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java cbee423 
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java a1abf90 
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateTask.java a9cd8ac 
  

[jira] [Updated] (HIVE-4588) Support session level hooks for HiveServer2

2013-06-26 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4588:
--

Attachment: HIVE-4588-2.patch

Updated the patch with additional comments for new classes/interfaces

 Support session level hooks for HiveServer2
 ---

 Key: HIVE-4588
 URL: https://issues.apache.org/jira/browse/HIVE-4588
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-4588-1.patch, HIVE-4588-2.patch


 Support session level hooks for HiveServer2. The configured hooks will get 
 executed at the beginning of each new session.
 This is useful for auditing connections, possibly tuning the session level 
 properties etc.



[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-06-26 Thread Irwin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694416#comment-13694416
 ] 

Irwin commented on HIVE-3552:
-

I tested cubes and rollups, but they failed.
My table is t1, formatted as follows:

The error message is:

I have tried both hive-0.10.0 and hive-0.11.0, and the error is the same.
Why can't I use Enhanced Aggregation, Cube, Grouping and Rollup?
Can anyone help? Thanks!

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.11.0

 Attachments: hive.3552.10.patch, hive.3552.11.patch, 
 hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
 hive.3552.8.patch, hive.3552.9.patch


 This is a follow-up to HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Each row would then generate 256 (2^8)
 rows for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first 
 MR job, perform the group by as if there were no cube. Then add another MR 
 job where you would perform the cube. The assumption is that the group by 
 would have decreased the output data significantly, and the rows would appear 
 in the order of the grouping keys, which has a higher probability of hitting 
 the hash table.
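
The arithmetic and the two-job idea above can be checked with a small sketch. It models a COUNT(*) cube in plain Python and says nothing about Hive's operators; all function names are hypothetical, and `None` stands in for a column that a grouping set excludes.

```python
from collections import Counter
from itertools import combinations


def cube_grouping_sets(columns):
    """Return every grouping set implied by a CUBE over `columns`."""
    sets = []
    for size in range(len(columns) + 1):
        sets.extend(combinations(columns, size))
    return sets


def naive_cube_count(rows, columns):
    """Expand every input row into one key per grouping set, then count."""
    counts = Counter()
    for row in rows:
        for keep in cube_grouping_sets(columns):
            key = tuple(row[c] if c in keep else None for c in columns)
            counts[key] += 1
    return counts


def two_stage_cube_count(rows, columns):
    """Stage 1: plain GROUP BY. Stage 2: cube over the grouped rows only."""
    grouped = Counter(tuple(row[c] for c in columns) for row in rows)
    counts = Counter()
    for full_key, n in grouped.items():
        for keep in cube_grouping_sets(columns):
            key = tuple(v if c in keep else None
                        for c, v in zip(columns, full_key))
            counts[key] += n
    return counts


eight = [f"c{i}" for i in range(8)]
print(len(cube_grouping_sets(eight)))  # 256

rows = [{"a": 1, "b": 2}] * 3 + [{"a": 1, "b": 3}]
naive = naive_cube_count(rows, ["a", "b"])
staged = two_stage_cube_count(rows, ["a", "b"])
print(naive == staged)  # True
```

In this toy data the naive version expands 4 input rows, while the staged version expands only the 2 distinct groups, which is the reduction the second job relies on.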



[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-26 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.11.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

  I ran all unit tests before the commit of HIVE-4496. All unit tests pass.

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35055&id=35181#toc

AFFECTED FILES
  build-common.xml
  data/files/leftsemijoin_mr_t1.txt
  data/files/leftsemijoin_mr_t2.txt
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
  ql/src/test/queries/clientpositive/leftsemijoin_mr.q
  ql/src/test/results/clientpositive/leftsemijoin_mr.q.out

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.1.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, 
 HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, 
 HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, 
 testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence by sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 for those operations which may need to shuffle the data, they may involve 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of 
 its child nodes if it has the same partition key as that child node.
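
The three definitions above can be written as simple predicates. This is a sketch of the definitions only, not of the optimizer: `MRJob` and the function names are hypothetical, and a job is reduced to its input tables, partition key, and child jobs.

```python
from dataclasses import dataclass, field


@dataclass
class MRJob:
    """Minimal model of an MR job for the purpose of these definitions."""
    name: str
    input_tables: frozenset
    partition_key: tuple
    children: list = field(default_factory=list)


def has_input_correlation(a, b):
    """IC: the input relation sets are not disjoint."""
    return not a.input_tables.isdisjoint(b.input_tables)


def has_transit_correlation(a, b):
    """TC: input correlation plus the same partition key."""
    return has_input_correlation(a, b) and a.partition_key == b.partition_key


def has_job_flow_correlation(parent, child):
    """JFC: `child` is a child node of `parent` with the same partition key."""
    is_child = any(c is child for c in parent.children)
    return is_child and parent.partition_key == child.partition_key


agg_t1 = MRJob("agg_t1", frozenset({"t1"}), ("key",))
join_t1_t2 = MRJob("join_t1_t2", frozenset({"t1", "t2"}), ("key",))
agg_t3 = MRJob("agg_t3", frozenset({"t3"}), ("val",))
final_join = MRJob("final_join", frozenset(), ("key",),
                   children=[agg_t1, join_t1_t2])

print(has_input_correlation(agg_t1, join_t1_t2))     # True
print(has_transit_correlation(agg_t1, join_t1_t2))   # True
print(has_job_flow_correlation(final_join, agg_t1))  # True
print(has_job_flow_correlation(final_join, agg_t3))  # False
```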
 The current implementation of the correlation optimizer only detects 
 correlations among MR jobs for reduce-side join operators and reduce-side 
 aggregation operators (not map-only aggregation). A query will be optimized 
 if it satisfies the following conditions.
 # There exists an MR job for a reduce-side join operator or reduce-side 
 aggregation operator which has JFC with all of its parent MR jobs (TCs will 
 also be exploited if JFC exists);
 # All input tables of those correlated MR jobs are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 The correlation optimizer is implemented as a logical optimizer. The main 
 reasons are that it only needs to manipulate the query plan tree and it can 
 leverage the existing component for generating MR jobs.
 The current implementation can serve as a framework for correlation-related 
 optimizations. I think that it is better than adding individual optimizers. 
 Several pieces of work can be done in the future to improve this optimizer. 
 Here are three examples.
 # Support queries that only involve TC;
 # Support queries in which the input tables of correlated MR jobs include 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc


[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-06-26 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.12.patch

yhuai updated the revision HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization.

  My last diff was for 4718...

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35181&id=35193#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer10.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/queries/clientpositive/correlationoptimizer7.q
  ql/src/test/queries/clientpositive/correlationoptimizer8.q
  ql/src/test/queries/clientpositive/correlationoptimizer9.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer10.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out
  ql/src/test/results/clientpositive/correlationoptimizer8.q.out
  ql/src/test/results/clientpositive/correlationoptimizer9.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
 HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 

Does Hive 0.11 have Query Flattening optimizations?

2013-06-26 Thread Mihir Kulkarni
Hello,

Does Hive support query flattening? For example, a query like this:

*SELECT alias.a0, alias.a1*
*FROM*
*(SELECT COUNT(b) AS a0, c AS a1*
*FROM test*
*GROUP BY c) alias*
*WHERE alias.a0  2;*

would be flattened into:

*SELECT COUNT(b), c*
*FROM test*
*GROUP BY c*
*HAVING COUNT(b)  2;*

Does Hive (0.11) have this kind of optimization, and is it even useful
given that all queries are ultimately converted into MapReduce jobs? At
Informatica Corp we rely on Hive a lot and are therefore interested in
supporting such optimizations.
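
As a minimal sketch of the equivalence being asked about, the two query shapes can be run side by side. This uses Python's built-in sqlite3 module rather than Hive, the sample rows are made up, and the comparison operator is assumed to be `>` because it is missing from the queries as they appear in this message. It only illustrates that the nested and flattened forms return the same rows; it says nothing about what Hive's planner does.

```python
import sqlite3

def run_both(rows, threshold=2):
    """Return (nested_result, flattened_result) for the two query shapes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (b INTEGER, c TEXT)")
    conn.executemany("INSERT INTO test (b, c) VALUES (?, ?)", rows)

    # Nested form: filter on the aggregate in an outer WHERE.
    # The '>' operator is an assumption (it is missing in the message).
    nested = conn.execute(
        """
        SELECT alias.a0, alias.a1
        FROM (SELECT COUNT(b) AS a0, c AS a1
              FROM test
              GROUP BY c) alias
        WHERE alias.a0 > ?
        ORDER BY alias.a1
        """,
        (threshold,),
    ).fetchall()

    # Flattened form: the same filter expressed as HAVING.
    flattened = conn.execute(
        """
        SELECT COUNT(b), c
        FROM test
        GROUP BY c
        HAVING COUNT(b) > ?
        ORDER BY c
        """,
        (threshold,),
    ).fetchall()
    conn.close()
    return nested, flattened

# Made-up sample data: "x" appears three times, "y" twice, "z" once.
sample = [(1, "x"), (2, "x"), (3, "x"), (4, "y"), (5, "y"), (6, "z")]
nested, flattened = run_both(sample)
print(nested == flattened)
```

The ORDER BY clauses are added only so the two result lists can be compared directly.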

Thanks in anticipation.

Regards,
---
Mihir Kulkarni
Software Engineer | Data Engine
Informatica Corporation


[jira] [Updated] (HIVE-4781) LEFT SEMI JOIN generates wrong results when the number of rows belonging to a single key of the right table exceed hive.join.emit.interval

2013-06-26 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4781:
---

Description: 
Suppose we have the query shown below:
{code:sql}
SELECT key FROM t1 LEFT SEMI JOIN t2 ON (t1.key=t2.key);
{code}

When the number of rows in t2 that share a single key exceeds 
hive.join.emit.interval, JoinOperator emits the matching rows from t1 more 
than once, which results in redundant output.

Let's say t1 is
{code}
1
{code}
and t2 is
{code}
1
1
1
1
{code}

When hive.join.emit.interval=1, the output of the above query is
{code}
1
1
1
1
{code}
The correct result is
{code}
1
{code}
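
The behavior described above can be sketched with a toy model. This is not Hive's JoinOperator code: the function name and the flush logic are assumptions made for illustration, based only on the symptom reported in this issue (a flush of the buffered right-table rows every hive.join.emit.interval rows, with each flush emitting the left-table rows again).

```python
def left_semi_join_for_key(left_rows, right_rows, emit_interval=None):
    """Toy model of a LEFT SEMI JOIN over the rows that share one key.

    emit_interval=None buffers every right row before emitting, which
    gives the expected semi-join result. With an integer interval, the
    buffer is flushed every `emit_interval` right rows, and each flush
    emits the left rows again (the reported symptom).
    """
    output = []
    buffered = 0
    for _ in right_rows:
        buffered += 1
        if emit_interval is not None and buffered >= emit_interval:
            output.extend(left_rows)  # early flush re-emits the left rows
            buffered = 0
    if buffered > 0:
        output.extend(left_rows)      # final flush at the end of the key group
    return output

t1 = [1]
t2 = [1, 1, 1, 1]
print(left_semi_join_for_key(t1, t2, emit_interval=1))
print(left_semi_join_for_key(t1, t2))
```

In this model the output is only redundant when the number of right rows for the key is larger than the interval; an interval of 4 or more yields a single row for the data above.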

This problem does not show up in the unit tests: because a GBY operator is 
inserted before JoinOperator and there is only one mapper, the output of the 
map phase contains only distinct keys.
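
A rough sketch of why a single mapper hides the problem: if the map-side GBY is modeled as a per-mapper deduplication of right-table keys, one mapper can never send more than one row per key to the join, while several mappers can. The function name and the split layout are hypothetical, and the multi-mapper case is an inference from the description above rather than something stated in it.

```python
from collections import Counter

def right_rows_per_key_at_join(mapper_splits):
    """Count right-table rows per key that reach the join.

    Each inner list holds the right-table keys seen by one mapper; the
    map-side group-by is modeled as a per-mapper set().
    """
    counts = Counter()
    for split in mapper_splits:
        counts.update(set(split))  # one row per distinct key per mapper
    return counts

one_mapper = right_rows_per_key_at_join([[1, 1, 1, 1]])
two_mappers = right_rows_per_key_at_join([[1, 1], [1, 1]])
print(one_mapper[1], two_mappers[1])  # prints: 1 2
```

With hive.join.emit.interval=1, only the second case delivers more rows for key 1 than the interval allows.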

Please apply the attached patch 'wrong_semi_join.txt' and run
{code}
ant test -Dtestcase=TestMinimrCliDriver -Dqfile=left_semi_join.q -Dtest.silent=false
{code}
to reproduce the problem. The wrong result can be found in 
{code}
hive_root_dir/build/ql/test/logs/clientpositive
{code}

  was:
Suppose that we have a query shown below
{code:sql}
SELECT key FROM t1 LEFT SEMI JOIN t2 ON (t1.key=t2.key);
{\code}

When the number of rows of t2 is larger than hive.join.emit.interval, 
JoinOperator will emit rows from t1, which will result in redundant output.

Let's say t1 is
{code}
key

1
{\code}
and t2 is
{code}
key

1
1
1
1
{\code}

When hive.join.emit.interval=1, the output of above query will be
{code}
1
1
1
1
{\code}
The correct result should be 
{code}
1
{\code}

This problem cannot be found in unit test. Because there is a GBY operator 
inserted before JoinOperator and we have only 1 mapper, the output of map phase 
only has distinct keys.

Please apply the patch 'wrong_semi_join.txt' attached below and use 
{code}
ant test -Dtestcase=TestMinimrCliDriver -Dqfile=left_semi_join.q 
-Dtest.silent=false
{\code} to replay the problem. The wrong result can be found in 
{code}
hive_root_dir/build/ql/test/logs/clientpositive
{\code}


 LEFT SEMI JOIN generates wrong results when the number of rows belonging to a 
 single key of the right table exceed hive.join.emit.interval
 --

 Key: HIVE-4781
 URL: https://issues.apache.org/jira/browse/HIVE-4781
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: wrong_semi_join.txt



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira