[jira] [Updated] (HIVE-3633) sort-merge join does not work with sub-queries

2012-11-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3633:
-

Attachment: hive.3633.4.patch

> sort-merge join does not work with sub-queries
> --
>
> Key: HIVE-3633
> URL: https://issues.apache.org/jira/browse/HIVE-3633
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3633.1.patch, hive.3633.2.patch, hive.3633.3.patch, 
> hive.3633.4.patch
>
>
> Consider the following query:
> create table smb_bucket_1(key int, value string)
>   CLUSTERED BY (key) SORTED BY (key) INTO 6 BUCKETS STORED AS TEXTFILE;
> create table smb_bucket_2(key int, value string)
>   CLUSTERED BY (key) SORTED BY (key) INTO 6 BUCKETS STORED AS TEXTFILE;
> -- load the above tables
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> explain
> select count(*) from
> (
>   select /*+mapjoin(a)*/ a.key as key1, b.key as key2,
>          a.value as value1, b.value as value2
>   from smb_bucket_1 a join smb_bucket_2 b on a.key = b.key
> ) subq;
> The above query does not use sort-merge join. This would be very useful as we 
> automatically convert the queries to use sorting and bucketing properties for 
> join.
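
For readers unfamiliar with the technique, here is a minimal Python sketch of a merge join over two inputs that are already sorted on the join key, which is conceptually what a sort-merge join does for each pair of corresponding buckets. It is an illustration only, not Hive's implementation; the function name `merge_join` and the sample rows are hypothetical.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) rows, each already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key is behind; advance the left cursor
        elif lk > rk:
            j += 1          # right key is behind; advance the right cursor
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


# Hypothetical contents of one bucket from each table, sorted by key.
bucket_a = [(1, "a1"), (2, "a2"), (2, "a2b"), (5, "a5")]
bucket_b = [(2, "b2"), (3, "b3"), (5, "b5"), (5, "b5b")]
joined = merge_join(bucket_a, bucket_b)
```

Because both inputs are sorted, each side is scanned once and no hash table or shuffle is needed, which is why the sorting and bucketing properties matter for this join.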

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3720) Expand and standardize authorization in Hive

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1350#comment-1350
 ] 

Shreepadma Venugopalan commented on HIVE-3720:
--

Attached document outlines the authorization model and its semantics.

> Expand and standardize authorization in Hive
> 
>
> Key: HIVE-3720
> URL: https://issues.apache.org/jira/browse/HIVE-3720
> Project: Hive
>  Issue Type: Improvement
>  Components: Authorization
>Affects Versions: 0.9.0
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
> Attachments: Hive_Authorization_Functionality.pdf
>
>
> The existing implementation of authorization in Hive is incomplete and has
> security holes. This JIRA is an umbrella JIRA for a) extending authorization
> to all SQL operations and direct metadata operations, and b) standardizing
> the authorization model and its semantics to mirror those of MySQL as
> closely as possible.



[jira] [Updated] (HIVE-3720) Expand and standardize authorization in Hive

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3720:
-

Attachment: Hive_Authorization_Functionality.pdf

> Expand and standardize authorization in Hive
> 
>
> Key: HIVE-3720
> URL: https://issues.apache.org/jira/browse/HIVE-3720
> Project: Hive
>  Issue Type: Improvement
>  Components: Authorization
>Affects Versions: 0.9.0
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
> Attachments: Hive_Authorization_Functionality.pdf
>
>
> The existing implementation of authorization in Hive is incomplete and has
> security holes. This JIRA is an umbrella JIRA for a) extending authorization
> to all SQL operations and direct metadata operations, and b) standardizing
> the authorization model and its semantics to mirror those of MySQL as
> closely as possible.



[jira] [Created] (HIVE-3720) Expand and standardize authorization in Hive

2012-11-18 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-3720:


 Summary: Expand and standardize authorization in Hive
 Key: HIVE-3720
 URL: https://issues.apache.org/jira/browse/HIVE-3720
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Affects Versions: 0.9.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


The existing implementation of authorization in Hive is incomplete and has
security holes. This JIRA is an umbrella JIRA for a) extending authorization
to all SQL operations and direct metadata operations, and b) standardizing
the authorization model and its semantics to mirror those of MySQL as
closely as possible.



[jira] [Updated] (HIVE-2691) Specify location of log4j configuration files via configuration properties

2012-11-18 Thread Zhenxiao Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenxiao Luo updated HIVE-2691:
---

Status: Patch Available  (was: Open)

> Specify location of log4j configuration files via configuration properties
> --
>
> Key: HIVE-2691
> URL: https://issues.apache.org/jira/browse/HIVE-2691
> Project: Hive
>  Issue Type: New Feature
>  Components: Configuration, Logging
>Reporter: Carl Steinbach
>Assignee: Zhenxiao Luo
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1131.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.5.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.6.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D2667.1.patch, HIVE-2691.1.patch.txt, 
> HIVE-2691.2.patch.txt, HIVE-2691.D2667.1.patch
>
>
> Oozie needs to be able to override the default location of the log4j
> configuration files from the Hive command line, e.g.:
> {noformat}
> hive -hiveconf hive.log4j.file=/home/carl/hive-log4j.properties -hiveconf 
> hive.log4j.exec.file=/home/carl/hive-exec-log4j.properties
> {noformat}



[jira] [Updated] (HIVE-2691) Specify location of log4j configuration files via configuration properties

2012-11-18 Thread Zhenxiao Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenxiao Luo updated HIVE-2691:
---

Attachment: HIVE-2691.2.patch.txt

> Specify location of log4j configuration files via configuration properties
> --
>
> Key: HIVE-2691
> URL: https://issues.apache.org/jira/browse/HIVE-2691
> Project: Hive
>  Issue Type: New Feature
>  Components: Configuration, Logging
>Reporter: Carl Steinbach
>Assignee: Zhenxiao Luo
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1131.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.5.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.6.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D2667.1.patch, HIVE-2691.1.patch.txt, 
> HIVE-2691.2.patch.txt, HIVE-2691.D2667.1.patch
>
>
> Oozie needs to be able to override the default location of the log4j
> configuration files from the Hive command line, e.g.:
> {noformat}
> hive -hiveconf hive.log4j.file=/home/carl/hive-log4j.properties -hiveconf 
> hive.log4j.exec.file=/home/carl/hive-exec-log4j.properties
> {noformat}



[jira] [Commented] (HIVE-2691) Specify location of log4j configuration files via configuration properties

2012-11-18 Thread Zhenxiao Luo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1347#comment-1347
 ] 

Zhenxiao Luo commented on HIVE-2691:


Updated patch submitted at:
https://reviews.facebook.net/D6789

> Specify location of log4j configuration files via configuration properties
> --
>
> Key: HIVE-2691
> URL: https://issues.apache.org/jira/browse/HIVE-2691
> Project: Hive
>  Issue Type: New Feature
>  Components: Configuration, Logging
>Reporter: Carl Steinbach
>Assignee: Zhenxiao Luo
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1131.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.5.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D1203.6.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2691.D2667.1.patch, HIVE-2691.1.patch.txt, 
> HIVE-2691.2.patch.txt, HIVE-2691.D2667.1.patch
>
>
> Oozie needs to be able to override the default location of the log4j
> configuration files from the Hive command line, e.g.:
> {noformat}
> hive -hiveconf hive.log4j.file=/home/carl/hive-log4j.properties -hiveconf 
> hive.log4j.exec.file=/home/carl/hive-exec-log4j.properties
> {noformat}



[jira] [Commented] (HIVE-3712) Use varbinary instead of longvarbinary to store min and max column values in column stats schema

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499931#comment-13499931
 ] 

Shreepadma Venugopalan commented on HIVE-3712:
--

Review board link: https://reviews.apache.org/r/8119/

> Use varbinary instead of longvarbinary to store min and max column values in 
> column stats schema
> 
>
> Key: HIVE-3712
> URL: https://issues.apache.org/jira/browse/HIVE-3712
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Statistics
>Affects Versions: 0.9.0
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
>
> JDBC type longvarbinary maps to BLOB SQL type in some databases. Storing min 
> and max column values for numeric types takes up 8 bytes and hence doesn't 
> require a BLOB. Storing these values in a BLOB will impact performance 
> without providing much benefit.



[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499930#comment-13499930
 ] 

Shreepadma Venugopalan commented on HIVE-3678:
--

Review board link: https://reviews.apache.org/r/8119/

> Add metastore upgrade scripts for column stats schema changes
> -
>
> Key: HIVE-3678
> URL: https://issues.apache.org/jira/browse/HIVE-3678
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
> Fix For: 0.10.0
>
> Attachments: HIVE-3678.1.patch.txt
>
>
> Add upgrade script for column statistics schema changes for 
> Postgres/MySQL/Oracle/Derby



Re: Review Request: HIVE-3678. Provide schema upgrade scripts for schema changes introduced by the addition of Column Statistics support

2012-11-18 Thread Shreepadma Venugopalan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8119/
---

(Updated Nov. 18, 2012, 10:22 p.m.)


Review request for hive, Carl Steinbach and Ashutosh Chauhan.


Description
---

Provide metastore schema upgrade scripts for MySQL/Derby/Postgresql/Oracle for 
the schema changes introduced by HIVE-1362. Change the column statistics schema 
to allow the metastore schema to be compatible across DB vendors and versions.


This addresses bug HIVE-3678.
https://issues.apache.org/jira/browse/HIVE-3678


Diffs
-

  metastore/scripts/upgrade/derby/012-HIVE-1362.derby.sql PRE-CREATION 
  metastore/scripts/upgrade/derby/hive-schema-0.10.0.derby.sql 1be707e 
  metastore/scripts/upgrade/derby/upgrade-0.9.0-to-0.10.0.derby.sql 714e9d9 
  metastore/scripts/upgrade/mysql/012-HIVE-1362.mysql.sql PRE-CREATION 
  metastore/scripts/upgrade/mysql/hive-schema-0.10.0.mysql.sql 97de3db 
  metastore/scripts/upgrade/mysql/upgrade-0.9.0-to-0.10.0.mysql.sql 1a85081 
  metastore/scripts/upgrade/oracle/012-HIVE-1362.oracle.sql PRE-CREATION 
  metastore/scripts/upgrade/oracle/hive-schema-0.10.0.oracle.sql 029b931 
  metastore/scripts/upgrade/postgres/012-HIVE-1362.postgres.sql PRE-CREATION 
  metastore/scripts/upgrade/postgres/hive-schema-0.10.0.postgres.sql 2f61644 
  metastore/scripts/upgrade/postgres/upgrade-0.9.0-to-0.10.0.postgres.sql d3b6571 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java ecc69a2 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 500ff29 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java 63bf69b 
  metastore/src/model/package.jdo 5f91f97 

Diff: https://reviews.apache.org/r/8119/diff/


Testing
---

Tested the schema upgrade scripts.


Thanks,

Shreepadma Venugopalan



Review Request: HIVE-3678. Provide schema upgrade scripts for schema changes introduced by the addition of Column Statistics support

2012-11-18 Thread Shreepadma Venugopalan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8119/
---

Review request for hive.


Description
---

Provide metastore schema upgrade scripts for MySQL/Derby/Postgresql/Oracle for 
the schema changes introduced by HIVE-1362. Change the column statistics schema 
to allow the metastore schema to be compatible across DB vendors and versions.


This addresses bug HIVE-3678.
https://issues.apache.org/jira/browse/HIVE-3678


Diffs
-

  metastore/scripts/upgrade/derby/012-HIVE-1362.derby.sql PRE-CREATION 
  metastore/scripts/upgrade/derby/hive-schema-0.10.0.derby.sql 1be707e 
  metastore/scripts/upgrade/derby/upgrade-0.9.0-to-0.10.0.derby.sql 714e9d9 
  metastore/scripts/upgrade/mysql/012-HIVE-1362.mysql.sql PRE-CREATION 
  metastore/scripts/upgrade/mysql/hive-schema-0.10.0.mysql.sql 97de3db 
  metastore/scripts/upgrade/mysql/upgrade-0.9.0-to-0.10.0.mysql.sql 1a85081 
  metastore/scripts/upgrade/oracle/012-HIVE-1362.oracle.sql PRE-CREATION 
  metastore/scripts/upgrade/oracle/hive-schema-0.10.0.oracle.sql 029b931 
  metastore/scripts/upgrade/postgres/012-HIVE-1362.postgres.sql PRE-CREATION 
  metastore/scripts/upgrade/postgres/hive-schema-0.10.0.postgres.sql 2f61644 
  metastore/scripts/upgrade/postgres/upgrade-0.9.0-to-0.10.0.postgres.sql d3b6571 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java ecc69a2 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 500ff29 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java 63bf69b 
  metastore/src/model/package.jdo 5f91f97 

Diff: https://reviews.apache.org/r/8119/diff/


Testing
---

Tested the schema upgrade scripts.


Thanks,

Shreepadma Venugopalan



[jira] [Updated] (HIVE-3712) Use varbinary instead of longvarbinary to store min and max column values in column stats schema

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3712:
-

Status: Patch Available  (was: Open)

> Use varbinary instead of longvarbinary to store min and max column values in 
> column stats schema
> 
>
> Key: HIVE-3712
> URL: https://issues.apache.org/jira/browse/HIVE-3712
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Statistics
>Affects Versions: 0.9.0
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
>
> JDBC type longvarbinary maps to BLOB SQL type in some databases. Storing min 
> and max column values for numeric types takes up 8 bytes and hence doesn't 
> require a BLOB. Storing these values in a BLOB will impact performance 
> without providing much benefit.



[jira] [Commented] (HIVE-3712) Use varbinary instead of longvarbinary to store min and max column values in column stats schema

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499924#comment-13499924
 ] 

Shreepadma Venugopalan commented on HIVE-3712:
--

It looks like VARBINARY is not supported consistently across different DBs and
DB versions. Storing 8 bytes in a LONGVARBINARY is overkill because
LONGVARBINARY is mapped to the BLOB type in some DBs. The best solution at
this point appears to be storing LONG and DOUBLE min and max values in two
separate columns.
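
As a rough illustration of the two-column idea, the Python sketch below keeps integer and floating-point min/max values in separate typed fields instead of one opaque binary field. It assumes nothing about the actual metastore schema: the class `ColumnStatsRow`, its field names, and `make_stats` are hypothetical. The `struct` sizes only confirm that a 64-bit long and a double each occupy 8 bytes.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# A 64-bit integer and an IEEE-754 double each pack into exactly 8 bytes.
LONG_SIZE = struct.calcsize(">q")
DOUBLE_SIZE = struct.calcsize(">d")


@dataclass
class ColumnStatsRow:
    """Hypothetical stats row: typed min/max columns, no binary blob."""
    column_name: str
    long_low: Optional[int] = None
    long_high: Optional[int] = None
    double_low: Optional[float] = None
    double_high: Optional[float] = None


def make_stats(column_name, values):
    """Fill the long columns for integer data, the double columns otherwise."""
    lo, hi = min(values), max(values)
    if all(isinstance(v, int) for v in values):
        return ColumnStatsRow(column_name, long_low=lo, long_high=hi)
    return ColumnStatsRow(column_name, double_low=float(lo), double_high=float(hi))


int_stats = make_stats("key", [3, 1, 7])
double_stats = make_stats("price", [1.5, 0.25])
```

Only one pair of columns is populated per row; the other pair stays NULL (here `None`), which avoids depending on any vendor-specific binary type.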

> Use varbinary instead of longvarbinary to store min and max column values in 
> column stats schema
> 
>
> Key: HIVE-3712
> URL: https://issues.apache.org/jira/browse/HIVE-3712
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Statistics
>Affects Versions: 0.9.0
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
>
> JDBC type longvarbinary maps to BLOB SQL type in some databases. Storing min 
> and max column values for numeric types takes up 8 bytes and hence doesn't 
> require a BLOB. Storing these values in a BLOB will impact performance 
> without providing much benefit.



[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3678:
-

Attachment: HIVE-3678.1.patch.txt

> Add metastore upgrade scripts for column stats schema changes
> -
>
> Key: HIVE-3678
> URL: https://issues.apache.org/jira/browse/HIVE-3678
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
> Fix For: 0.10.0
>
> Attachments: HIVE-3678.1.patch.txt
>
>
> Add upgrade script for column statistics schema changes for 
> Postgres/MySQL/Oracle/Derby



[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3678:
-

Status: Patch Available  (was: Open)

> Add metastore upgrade scripts for column stats schema changes
> -
>
> Key: HIVE-3678
> URL: https://issues.apache.org/jira/browse/HIVE-3678
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
> Fix For: 0.10.0
>
> Attachments: HIVE-3678.1.patch.txt
>
>
> Add upgrade script for column statistics schema changes for 
> Postgres/MySQL/Oracle/Derby



[jira] [Commented] (HIVE-3264) Add support for binary dataype to AvroSerde

2012-11-18 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499866#comment-13499866
 ] 

Eli Reisman commented on HIVE-3264:
---

This had fallen off my radar too, sorry. What needs to be done/added? When I 
was originally working on this, I was told the .q file approach was the test we 
needed. What sort of test should I add?


> Add support for binary dataype to AvroSerde
> ---
>
> Key: HIVE-3264
> URL: https://issues.apache.org/jira/browse/HIVE-3264
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.9.0
>Reporter: Jakob Homan
>  Labels: patch
> Attachments: HIVE-3264-1.patch, HIVE-3264-2.patch, HIVE-3264-3.patch, 
> HIVE-3264-4.patch, HIVE-3264-5.patch
>
>
> When the AvroSerde was written, Hive didn't have a binary type, so Avro's 
> byte array type is converted to an array of small ints.  Now that HIVE-2380 
> is in, this step isn't necessary and we can convert both Avro's bytes type 
> and probably fixed type to Hive's binary type.



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-18 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499858#comment-13499858
 ] 

Yin Huai commented on HIVE-2206:


[~namit]
Sure. I just took a look at the code. It seems that once I have the content 
summaries of all input tables, I can guess whether the join auto resolver will 
work for join operators on those tables. As far as I know, the existing util 
functions for retrieving content summaries (called after logical optimization) 
cannot be used directly here, so I need to write some util functions to get 
the sizes of input tables. I will start working on this asap. 
Also, although HIVE-3671 does not seem hard to do, it is not a quick fix. I 
suggest we track that work in a separate jira.

[~cwsteinbach]
Have you had time to look at the current patch? Any comments?

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
> HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
> HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
> HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
> HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> This issue proposes a new logical optimizer called Correlation Optimizer, 
> which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
> job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
> paper and slides of YSmart are linked at the bottom.
> Since Hive translates queries in a sentence-by-sentence fashion, it generates 
> a MapReduce job for every operation that may need to shuffle the data (e.g. 
> join and aggregation operations). However, such operations may involve the 
> correlations explained below and can then be executed in a single MR job.
> # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
> input relation sets are not disjoint;
> # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
> have not only input correlation, but also the same partition key;
> # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of 
> its child nodes if it has the same partition key as that child node.
> The current implementation of the correlation optimizer only detects 
> correlations among MR jobs for reduce-side join operators and reduce-side 
> aggregation operators (not map-only aggregation). A query will be optimized 
> if it satisfies the following conditions.
> # There exists an MR job for a reduce-side join operator or reduce-side 
> aggregation operator that has JFC with all of its parent MR jobs (TCs will 
> also be exploited if JFC exists);
> # All input tables of those correlated MR jobs are original input tables (not 
> intermediate tables generated by sub-queries); and 
> # No self join is involved in those correlated MR jobs.
> The correlation optimizer is implemented as a logical optimizer. The main 
> reasons are that it only needs to manipulate the query plan tree and it can 
> leverage the existing component for generating MR jobs.
> The current implementation can serve as a framework for correlation-related 
> optimizations. I think that it is better than adding individual optimizers. 
> Several pieces of work can be done in the future to improve this optimizer. 
> Here are three examples.
> # Support queries that only involve TC;
> # Support queries in which the input tables of correlated MR jobs include 
> intermediate tables; and 
> # Optimize queries involving self join. 
> References:
> Paper and presentation of YSmart.
> Paper: 
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
> Slides: http://sdrv.ms/UpwJJc
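
The three correlation definitions in the description above can be sketched as simple predicates. This is a hedged Python illustration only, not the optimizer's code: the `MRJob` class, its fields, and the table and key names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MRJob:
    """Hypothetical model of an MR job in a query plan."""
    name: str
    inputs: frozenset           # names of the input relations
    partition_key: tuple        # columns the shuffle partitions on
    children: list = field(default_factory=list)


def has_input_correlation(a, b):
    """IC: the input relation sets are not disjoint."""
    return not a.inputs.isdisjoint(b.inputs)


def has_transit_correlation(a, b):
    """TC: input correlation plus the same partition key."""
    return has_input_correlation(a, b) and a.partition_key == b.partition_key


def job_flow_correlated_children(job):
    """JFC: child nodes that share the job's partition key."""
    return [c for c in job.children if c.partition_key == job.partition_key]


# Two aggregation jobs reading an overlapping table, feeding one join job.
agg1 = MRJob("agg1", frozenset({"t1"}), ("k",))
agg2 = MRJob("agg2", frozenset({"t1", "t2"}), ("k",))
agg3 = MRJob("agg3", frozenset({"t1"}), ("other",))
join = MRJob("join", frozenset({"agg1", "agg2", "agg3"}), ("k",),
             children=[agg1, agg2, agg3])

jfc_names = [c.name for c in job_flow_correlated_children(join)]
```

In this toy plan `agg1` and `agg2` have both IC and TC, `agg3` has only IC with them, and the join job has JFC with `agg1` and `agg2` but not `agg3`.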
