date:20130220

[jira] [Updated] (HIVE-4042) ignore mapjoin hint

2013-02-20 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4042:
-

Attachment: hive.4042.2.patch

> ignore mapjoin hint
> ---
>
> Key: HIVE-4042
> URL: https://issues.apache.org/jira/browse/HIVE-4042
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.4042.1.patch, hive.4042.2.patch
>
>
> After HIVE-3784, in a production environment, it can become difficult to
> deploy since a lot of production queries can break.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-3938) Hive MetaStore should send a single AddPartitionEvent for atomically added partition-set.

2013-02-20 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3938:
-

Status: Open  (was: Patch Available)

Can you refresh once HIVE-4004 is in?

> Hive MetaStore should send a single AddPartitionEvent for atomically added 
> partition-set.
> -
>
> Key: HIVE-3938
> URL: https://issues.apache.org/jira/browse/HIVE-3938
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.10.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-3938.patch
>
>
> HiveMetaStore::add_partitions() currently adds all partitions specified in 
> one call using a single meta-store transaction. This acts correctly. However, 
> there's one AddPartitionEvent created per partition specified.
> Ideally, the set of partitions added atomically can be communicated using a 
> single AddPartitionEvent, such that they are consumed together.
> I'll post a patch that does this.
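
The difference between the current and the proposed behaviour can be sketched roughly as follows. This is only an illustration: `PartitionSetEventSketch` and its methods are hypothetical names, not Hive's `AddPartitionEvent` or the code in the attached patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an event that can carry a whole partition set.
// This is NOT Hive's AddPartitionEvent class.
final class PartitionSetEventSketch {
    final List<String> partitionNames;
    final boolean status;

    PartitionSetEventSketch(List<String> partitionNames, boolean status) {
        this.partitionNames = Collections.unmodifiableList(new ArrayList<>(partitionNames));
        this.status = status;
    }

    // Behaviour as described in the issue: one event per partition.
    static List<PartitionSetEventSketch> onePerPartition(List<String> names, boolean committed) {
        List<PartitionSetEventSketch> events = new ArrayList<>();
        for (String name : names) {
            events.add(new PartitionSetEventSketch(Collections.singletonList(name), committed));
        }
        return events;
    }

    // Proposed behaviour: a single event for everything added in one transaction.
    static List<PartitionSetEventSketch> oneForSet(List<String> names, boolean committed) {
        return Collections.singletonList(new PartitionSetEventSketch(names, committed));
    }
}
```

With a single event, a listener sees the atomically added partitions together instead of having to regroup them.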


[jira] [Commented] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails

2013-02-20 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582054#comment-13582054
 ] 

Namit Jain commented on HIVE-4004:
--

+1


> Incorrect status for AddPartition metastore event if RawStore commit fails
> --
>
> Key: HIVE-4004
> URL: https://issues.apache.org/jira/browse/HIVE-4004
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.10.0
>Reporter: Dilip Joseph
>Assignee: Dilip Joseph
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: HIVE-4004.1.patch.txt
>
>
> For ADD PARTITION operations, the AddPartitionEvent does not care if the 
> RawStore commit succeeded or not.  This means that an AddPartitionEvent with 
> status=true is fired even if the actual ADD PARTITION operation failed.
> This will confuse any AddPartitionEvent listeners.
> Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the 
> status of the RawStore commit.  Only AddPartitionEvent has this problem.
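
A minimal sketch of the intended behaviour, assuming the event status should simply mirror the outcome of the commit. All names here (`CommitAwareEventSketch`, `Store`, `firedStatuses`) are hypothetical and are not taken from HiveMetaStore or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the HiveMetaStore implementation.
final class CommitAwareEventSketch {
    // Minimal stand-in for a store whose commit can fail.
    interface Store {
        boolean commit();
    }

    // Statuses of the events "fired" so far, in order.
    final List<Boolean> firedStatuses = new ArrayList<>();

    void addPartition(Store store) {
        boolean success = false;
        try {
            success = store.commit();
        } finally {
            // The event carries the commit outcome instead of a hard-coded true.
            // If commit() throws, success is still false here.
            firedStatuses.add(success);
        }
    }
}
```

Listeners that check the status can then tell a failed ADD PARTITION from a successful one.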


[jira] [Commented] (HIVE-2843) UDAF to convert an aggregation to a map

2013-02-20 Thread David Worms (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582066#comment-13582066
 ] 

David Worms commented on HIVE-2843:
---

I just created the requested phabricator entry: 
https://reviews.facebook.net/T45. 

I did my best, but arc wasn't working for me: it failed with a message like 
"libphutil v1 libraries are no longer supported". I tried a workaround described 
on the mailing list 
(http://mail-archives.apache.org/mod_mbox/hive-dev/201301.mbox/%3CFF1DF58D04F11D4291D09795D1A4EF1618657D12DB@SRV-MAIL%3E), 
also without success. I ended up creating the patch and uploading it manually.

> UDAF to convert an aggregation to a map
> ---
>
> Key: HIVE-2843
> URL: https://issues.apache.org/jira/browse/HIVE-2843
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 0.9.0, 0.10.0
>Reporter: David Worms
>Priority: Minor
>  Labels: features, udf
> Attachments: HIVE-2843.1.patch.txt
>
>
> I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. 
> The source code is available on GitHub at https://github.com/wdavidw/hive-udf 
> in two Java classes: "UDAFToMap" and "UDAFToOrderedMap". The first function 
> converts an aggregation into a map and internally uses a Java `HashMap`. 
> The second function extends the first one. It converts an aggregation into an 
> ordered map and internally uses a Java `TreeMap`. They both extend the 
> `AbstractGenericUDAFResolver` class.
> Also, I have covered the motivations and usages of those UDAFs in a blog post 
> at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/
> The full patch is available with tests as well.
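
The practical difference between the two backing maps can be shown with plain Java collections. This sketch does not use the Hive UDAF API. `ToMapSketch` and its methods are hypothetical names, and it assumes that a later value overwrites an earlier one for the same key, which the issue does not state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not the UDAFToMap / UDAFToOrderedMap source.
final class ToMapSketch {
    // Folds parallel key/value arrays into the supplied map.
    // Assumption: a later value for the same key replaces the earlier one.
    static Map<String, Integer> aggregate(String[] keys, int[] values, Map<String, Integer> target) {
        for (int i = 0; i < keys.length; i++) {
            target.put(keys[i], values[i]);
        }
        return target;
    }

    // HashMap-backed: iteration order of the result is unspecified.
    static Map<String, Integer> toMap(String[] keys, int[] values) {
        return aggregate(keys, values, new HashMap<>());
    }

    // TreeMap-backed: the result iterates in ascending key order.
    static Map<String, Integer> toOrderedMap(String[] keys, int[] values) {
        return aggregate(keys, values, new TreeMap<>());
    }
}
```

Only the ordered variant guarantees a stable key order when the resulting map is iterated or printed.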


[jira] [Updated] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener

2013-02-20 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3970:
-

Status: Open  (was: Patch Available)

Can you refresh?
This patch is not applying cleanly anymore.

> Clean up/fix PartitionNameWhitelistPreEventListener
> ---
>
> Key: HIVE-3970
> URL: https://issues.apache.org/jira/browse/HIVE-3970
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt
>
>
> There are a number of issues and things which can be cleaned up related to 
> PartitionNameWhitelistPreEventListener.
> * It's an event listener, but it really doesn't need to be: given that the 
> regex whitelist is configurable, it could just be a utility method.
> * It's not run when a partition is renamed, so partitions with invalid 
> characters can be created in this way.
> * There's no easy way to check if a partition contains invalid characters 
> before creating it and seeing if it fails.
> Most importantly, when a dynamic partition contains an invalid character, the 
> directory for this partition is created, and the data is moved into it, but 
> the partition fails to be created leaving an orphan directory.
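
The "utility method" idea from the first bullet could look roughly like this. It is a sketch under assumptions: `PartitionNameCheckSketch`, `allValid`, and the example pattern are hypothetical and are not part of the attached patches.

```java
import java.util.regex.Pattern;

// Hypothetical utility; not the code from the HIVE-3970 patches.
final class PartitionNameCheckSketch {
    // Returns true only when every partition value matches the whitelist in full.
    static boolean allValid(Pattern whitelist, String... partitionValues) {
        for (String value : partitionValues) {
            if (!whitelist.matcher(value).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

A check like this could be called before the partition directory is created and on rename, so an invalid name is rejected before any data is moved.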


[jira] [Commented] (HIVE-3672) Support altering partition column type in Hive

2013-02-20 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582078#comment-13582078
 ] 

Namit Jain commented on HIVE-3672:
--

The patch is still not applying cleanly for me.

> Support altering partition column type in Hive
> --
>
> Key: HIVE-3672
> URL: https://issues.apache.org/jira/browse/HIVE-3672
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, SQL
>Reporter: Jingwei Lu
>Assignee: Jingwei Lu
>  Labels: features
> Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, 
> HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, 
> HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.7.patch.txt
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Currently, Hive does not allow altering partition column types.  As we've 
> discouraged users from using non-string partition column types, this presents 
> a problem for users who want to change their partition columns to be strings: 
> they have to rename their table, create a new table, and copy all the data 
> over.
> To support this via the CLI, adding a command like ALTER TABLE  
> PARTITION COLUMN ( );


[jira] [Updated] (HIVE-3672) Support altering partition column type in Hive

2013-02-20 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3672:
-

Status: Open  (was: Patch Available)

> Support altering partition column type in Hive
> --
>
> Key: HIVE-3672
> URL: https://issues.apache.org/jira/browse/HIVE-3672
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, SQL
>Reporter: Jingwei Lu
>Assignee: Jingwei Lu
>  Labels: features
> Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, 
> HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, 
> HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.7.patch.txt
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Currently, Hive does not allow altering partition column types.  As we've 
> discouraged users from using non-string partition column types, this presents 
> a problem for users who want to change their partition columns to be strings: 
> they have to rename their table, create a new table, and copy all the data 
> over.
> To support this via the CLI, adding a command like ALTER TABLE  
> PARTITION COLUMN ( );


[jira] [Commented] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.

2013-02-20 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582083#comment-13582083
 ] 

Namit Jain commented on HIVE-4039:
--

+1

> Hive compiler sometimes fails in semantic analysis / optimisation stage when 
> boolean variable appears in WHERE clause.
> --
>
> Key: HIVE-4039
> URL: https://issues.apache.org/jira/browse/HIVE-4039
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jean Xu
>Assignee: Jean Xu
>Priority: Minor
> Attachments: HIVE_4039.1.patch.txt
>
>
> Hive compiler fails with a NullPointerException in semantic analysis / 
> optimisation stage when a boolean variable appears in the WHERE clause in 
> some cases. A minimal query to generate this error is here:
> SELECT 1
> FROM (
> SELECT TRUE AS flag
> FROM dim_one_row:measurementsystems
> ) a
> WHERE flag;
> On the other hand, the following query is perfectly fine:
> SELECT 1
> FROM (
> SELECT TRUE AS flag
> FROM dim_one_row:measurementsystems
> ) a
> WHERE flag=TRUE;


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-20 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582121#comment-13582121
 ] 

Namit Jain commented on HIVE-3874:
--

Can you fix Eclipse also?

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, 
> HIVE-3874.D8529.2.patch, OrcFileIntro.pptx, orc.tgz
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file
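
To illustrate the point about lightweight indexes, here is a minimal sketch of how a push-down filter could skip row groups. It assumes the index keeps a minimum and maximum value per row group, which the issue text does not specify. `RowGroupSkipSketch` and `groupsToRead` are hypothetical names, not part of the proposed format.

```java
// Hypothetical illustration; the actual index layout is not described in the issue.
final class RowGroupSkipSketch {
    // Counts the row groups that must be read for the predicate `value == target`,
    // given per-group minimum and maximum values (assumed index contents).
    static int groupsToRead(long[] mins, long[] maxs, long target) {
        int count = 0;
        for (int i = 0; i < mins.length; i++) {
            // A group can only contain the target if it lies within [min, max].
            if (target >= mins[i] && target <= maxs[i]) {
                count++;
            }
        }
        return count;
    }
}
```

Groups whose range excludes the target never need to be read, decompressed, or deserialized.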


[jira] [Updated] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails

2013-02-20 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4004:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed. Thanks Dilip

> Incorrect status for AddPartition metastore event if RawStore commit fails
> --
>
> Key: HIVE-4004
> URL: https://issues.apache.org/jira/browse/HIVE-4004
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.10.0
>Reporter: Dilip Joseph
>Assignee: Dilip Joseph
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: HIVE-4004.1.patch.txt
>
>
> For ADD PARTITION operations, the AddPartitionEvent does not care if the 
> RawStore commit succeeded or not.  This means that an AddPartitionEvent with 
> status=true is fired even if the actual ADD PARTITION operation failed.
> This will confuse any AddPartitionEvent listeners.
> Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the 
> status of the RawStore commit.  Only AddPartitionEvent has this problem.


[jira] [Updated] (HIVE-4042) ignore mapjoin hint

2013-02-20 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4042:
-

Status: Patch Available  (was: Open)

Tests passed

> ignore mapjoin hint
> ---
>
> Key: HIVE-4042
> URL: https://issues.apache.org/jira/browse/HIVE-4042
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.4042.1.patch, hive.4042.2.patch
>
>
> After HIVE-3784, in a production environment, it can become difficult to
> deploy since a lot of production queries can break.


Hive-trunk-h0.21 - Build # 1977 - Still Failing

2013-02-20 Thread Apache Jenkins Server

Changes for Build #1975
[namit] HIVE-4021 PostgreSQL upgrade scripts are creating column with incorrect 
name
(Jarek Jarcec Cecho via namit)

[hashutosh] HIVE-4033 : NPE at runtime while selecting virtual column after 
joining three tables on different keys (Ashutosh Chauhan)

[namit] HIVE-4029 Hive Profiler dies with NPE
(Brock Noland via namit)


Changes for Build #1976
[namit] HIVE-4023 Improve Error Logging in MetaStore
(Bhushan Mandhani via namit)

[namit] HIVE-3403 user should not specify mapjoin to perform sort-merge 
bucketed join
(Namit Jain via Ashutosh)

[namit] HIVE-4024 Derby metastore update script will fail when upgrading from 
0.9.0
to 0.10.0 (Jarek Jarcec Cecho via namit)


Changes for Build #1977



1 tests failed.
FAILED:  
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_aggregator_error_1

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not 
reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please 
note the time in the report does not reflect the time until the VM exit.
at 
net.sf.antcontrib.logic.ForTask.doSequentialIteration(ForTask.java:259)
at net.sf.antcontrib.logic.ForTask.doToken(ForTask.java:268)
at net.sf.antcontrib.logic.ForTask.doTheTasks(ForTask.java:299)
at net.sf.antcontrib.logic.ForTask.execute(ForTask.java:244)




The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1977)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1977/ to 
view the results.

[jira] [Updated] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.

2013-02-20 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4039:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed. Thanks Jean

> Hive compiler sometimes fails in semantic analysis / optimisation stage when 
> boolean variable appears in WHERE clause.
> --
>
> Key: HIVE-4039
> URL: https://issues.apache.org/jira/browse/HIVE-4039
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jean Xu
>Assignee: Jean Xu
>Priority: Minor
> Attachments: HIVE_4039.1.patch.txt
>
>
> Hive compiler fails with a NullPointerException in semantic analysis / 
> optimisation stage when a boolean variable appears in the WHERE clause in 
> some cases. A minimal query to generate this error is here:
> SELECT 1
> FROM (
> SELECT TRUE AS flag
> FROM dim_one_row:measurementsystems
> ) a
> WHERE flag;
> On the other hand, the following query is perfectly fine:
> SELECT 1
> FROM (
> SELECT TRUE AS flag
> FROM dim_one_row:measurementsystems
> ) a
> WHERE flag=TRUE;


[jira] [Updated] (HIVE-4027) Thrift alter_table api doesnt validate column type

2013-02-20 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4027:
-

   Resolution: Fixed
Fix Version/s: 0.11.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks Tim

> Thrift alter_table api doesnt validate column type
> --
>
> Key: HIVE-4027
> URL: https://issues.apache.org/jira/browse/HIVE-4027
> Project: Hive
>  Issue Type: Bug
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
> Fix For: 0.11.0
>
> Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3
>
>
> The Thrift alter_table API doesn't validate the column type, so an invalid 
> column type can sneak in.


[jira] [Commented] (HIVE-4027) Thrift alter_table api doesnt validate column type

2013-02-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582185#comment-13582185
 ] 

Hudson commented on HIVE-4027:
--

Integrated in hive-trunk-hadoop1 #93 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/93/])
HIVE-4027 Thrift alter_table api doesnt validate column type
(Gang Tim Liu via namit) (Revision 1448138)

 Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448138
Files : 
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java


> Thrift alter_table api doesnt validate column type
> --
>
> Key: HIVE-4027
> URL: https://issues.apache.org/jira/browse/HIVE-4027
> Project: Hive
>  Issue Type: Bug
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
> Fix For: 0.11.0
>
> Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3
>
>
> The Thrift alter_table API doesn't validate the column type, so an invalid 
> column type can sneak in.


[jira] [Commented] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.

2013-02-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582184#comment-13582184
 ] 

Hudson commented on HIVE-4039:
--

Integrated in hive-trunk-hadoop1 #93 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/93/])
HIVE-4039 Hive compiler sometimes fails in semantic analysis / optimisation 
stage when boolean
variable appears in WHERE clause. (Jean Xu via namit) (Revision 1448135)

 Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448135
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/test_boolean_whereclause.q
* /hive/trunk/ql/src/test/results/clientpositive/test_boolean_whereclause.q.out


> Hive compiler sometimes fails in semantic analysis / optimisation stage when 
> boolean variable appears in WHERE clause.
> --
>
> Key: HIVE-4039
> URL: https://issues.apache.org/jira/browse/HIVE-4039
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jean Xu
>Assignee: Jean Xu
>Priority: Minor
> Attachments: HIVE_4039.1.patch.txt
>
>
> Hive compiler fails with a NullPointerException in semantic analysis / 
> optimisation stage when a boolean variable appears in the WHERE clause in 
> some cases. A minimal query to generate this error is here:
> SELECT 1
> FROM (
> SELECT TRUE AS flag
> FROM dim_one_row:measurementsystems
> ) a
> WHERE flag;
> On the other hand, the following query is perfectly fine:
> SELECT 1
> FROM (
> SELECT TRUE AS flag
> FROM dim_one_row:measurementsystems
> ) a
> WHERE flag=TRUE;


[jira] [Commented] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails

2013-02-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582186#comment-13582186
 ] 

Hudson commented on HIVE-4004:
--

Integrated in hive-trunk-hadoop1 #93 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/93/])
HIVE-4004 Incorrect status for AddPartition metastore event if RawStore 
commit fails
(Dilip Joseph via namit) (Revision 1448101)

 Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448101
Files : 
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListenerOnlyOnCommit.java


> Incorrect status for AddPartition metastore event if RawStore commit fails
> --
>
> Key: HIVE-4004
> URL: https://issues.apache.org/jira/browse/HIVE-4004
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.10.0
>Reporter: Dilip Joseph
>Assignee: Dilip Joseph
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: HIVE-4004.1.patch.txt
>
>
> For ADD PARTITION operations, the AddPartitionEvent does not care if the 
> RawStore commit succeeded or not.  This means that an AddPartitionEvent with 
> status=true is fired even if the actual ADD PARTITION operation failed.
> This will confuse any AddPartitionEvent listeners.
> Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the 
> status of the RawStore commit.  Only AddPartitionEvent has this problem.


[jira] [Commented] (HIVE-4007) Create abstract classes for serializer and deserializer

2013-02-20 Thread Jarek Jarcec Cecho (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582260#comment-13582260
 ] 

Jarek Jarcec Cecho commented on HIVE-4007:
--

+1 (non-binding)

Thank you for working on this Namit!

Jarcec

> Create abstract classes for serializer and deserializer
> ---
>
> Key: HIVE-4007
> URL: https://issues.apache.org/jira/browse/HIVE-4007
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.4007.1.patch, hive.4007.2.patch, hive.4007.3.patch
>
>
> Currently, it is very difficult to change the Serializer/Deserializer
> interface, since all the SerDes directly implement the interface.
> Instead, we should have abstract classes for implementing these interfaces.
> In case of an interface change, only the abstract class and the relevant 
> serde need to change.
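
The layering described above can be sketched as follows. The names `SerializerSketch`, `AbstractSerializerSketch`, and `ToStringSerializerSketch` are hypothetical and do not correspond to Hive's SerDe interfaces or to the attached patches.

```java
// Hypothetical interface and classes; not Hive's SerDe API.
interface SerializerSketch {
    String serialize(Object row);
}

// Concrete SerDes extend this class instead of implementing the interface
// directly. If a method is later added to the interface, a default
// implementation can be placed here without touching every SerDe.
abstract class AbstractSerializerSketch implements SerializerSketch {
}

// Example concrete SerDe: only the row-level logic lives here.
final class ToStringSerializerSketch extends AbstractSerializerSketch {
    @Override
    public String serialize(Object row) {
        return String.valueOf(row);
    }
}
```

The abstract class absorbs interface changes, so existing implementations keep compiling unless they actually need the new behaviour.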


[jira] [Commented] (HIVE-3980) Cleanup after HIVE-3403

2013-02-20 Thread Jarek Jarcec Cecho (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582267#comment-13582267
 ] 

Jarek Jarcec Cecho commented on HIVE-3980:
--

+1 (non-binding)

Seems like a reasonable change to me.

Jarcec

> Cleanup after HIVE-3403
> ---
>
> Key: HIVE-3980
> URL: https://issues.apache.org/jira/browse/HIVE-3980
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3980.1.patch, hive.3980.2.patch
>
>
> There have been a lot of comments on HIVE-3403, which involve changing 
> variable names/function names/adding more comments/general cleanup etc.
> Since HIVE-3403 involves a lot of refactoring, it was fairly difficult to
> address the comments there, since refreshing becomes impossible. This jira
> is to track those cleanups.


[jira] [Commented] (HIVE-4027) Thrift alter_table api doesnt validate column type

2013-02-20 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582270#comment-13582270
 ] 

Gang Tim Liu commented on HIVE-4027:


Namit, thank you very much.


> Thrift alter_table api doesnt validate column type
> --
>
> Key: HIVE-4027
> URL: https://issues.apache.org/jira/browse/HIVE-4027
> Project: Hive
>  Issue Type: Bug
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
> Fix For: 0.11.0
>
> Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3
>
>
> The Thrift alter_table API doesn't validate the column type, so an invalid 
> column type can sneak in.


[jira] [Commented] (HIVE-3672) Support altering partition column type in Hive

2013-02-20 Thread Jingwei Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582349#comment-13582349
 ] 

Jingwei Lu commented on HIVE-3672:
--

Is there a merge conflict or a unit test failure? If a test fails, could you 
tell me which one? I ran all my newly added tests yesterday and they were 
clean. 

> Support altering partition column type in Hive
> --
>
> Key: HIVE-3672
> URL: https://issues.apache.org/jira/browse/HIVE-3672
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, SQL
>Reporter: Jingwei Lu
>Assignee: Jingwei Lu
>  Labels: features
> Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, 
> HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, 
> HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.7.patch.txt
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Currently, Hive does not allow altering partition column types.  As we've 
> discouraged users from using non-string partition column types, this presents 
> a problem for users who want to change their partition columns to be strings: 
> they have to rename their table, create a new table, and copy all the data 
> over.
> To support this via the CLI, adding a command like ALTER TABLE  
> PARTITION COLUMN ( );


[jira] [Updated] (HIVE-3968) Enhance logging in TableAccessInfo

2013-02-20 Thread Kevin Wilfong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3968:


Attachment: HIVE-3968.3.patch.txt

> Enhance logging in TableAccessInfo
> --
>
> Key: HIVE-3968
> URL: https://issues.apache.org/jira/browse/HIVE-3968
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, 
> HIVE-3968.3.patch.txt
>
>
> Based on what is currently available in the TableAccessInfo we can infer when 
> it would be a good idea to add bucketing/sorting metadata for tables.  
> However, we can't easily tell if we're already getting the benefits of 
> bucketing/sorting.
> This information can be improved by
> a) storing the input table/partition objects so that we can tell if the 
> tables/partitions are already bucketed/sorted
> b) running the TableAccessAnalyzer after the logical optimizer, so that we 
> can tell from the operators whether or not we are already getting benefits 
> (bucketed/sort merge map joins or map group bys)


[jira] [Updated] (HIVE-3968) Enhance logging in TableAccessInfo

2013-02-20 Thread Kevin Wilfong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3968:


Status: Patch Available  (was: Open)

> Enhance logging in TableAccessInfo
> --
>
> Key: HIVE-3968
> URL: https://issues.apache.org/jira/browse/HIVE-3968
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, 
> HIVE-3968.3.patch.txt
>
>
> Based on what is currently available in the TableAccessInfo we can infer when 
> it would be a good idea to add bucketing/sorting metadata for tables.  
> However, we can't easily tell if we're already getting the benefits of 
> bucketing/sorting.
> This information can be improved by
> a) storing the input table/partition objects so that we can tell if the 
> tables/partitions are already bucketed/sorted
> b) running the TableAccessAnalyzer after the logical optimizer, so that we 
> can tell from the operators whether or not we are already getting benefits 
> (bucketed/sort merge map joins or map group bys)


[jira] [Commented] (HIVE-3968) Enhance logging in TableAccessInfo

2013-02-20 Thread Kevin Wilfong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582372#comment-13582372
 ] 

Kevin Wilfong commented on HIVE-3968:
-

Refreshed.

> Enhance logging in TableAccessInfo
> --
>
> Key: HIVE-3968
> URL: https://issues.apache.org/jira/browse/HIVE-3968
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, 
> HIVE-3968.3.patch.txt
>
>
> Based on what is currently available in the TableAccessInfo we can infer when 
> it would be a good idea to add bucketing/sorting metadata for tables.  
> However, we can't easily tell if we're already getting the benefits of 
> bucketing/sorting.
> This information can be improved by
> a) storing the input table/partition objects so that we can tell if the 
> tables/partitions are already bucketed/sorted
> b) running the TableAccessAnalyzer after the logical optimizer, so that we 
> can tell from the operators whether or not we are already getting benefits 
> (bucketed/sort merge map joins or map group bys)


[jira] [Updated] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener

2013-02-20 Thread Kevin Wilfong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3970:


Attachment: HIVE-3970.3.patch.txt

> Clean up/fix PartitionNameWhitelistPreEventListener
> ---
>
> Key: HIVE-3970
> URL: https://issues.apache.org/jira/browse/HIVE-3970
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt, 
> HIVE-3970.3.patch.txt
>
>
> There are a number of issues and things which can be cleaned up related to 
> PartitionNameWhitelistPreEventListener.
> * It's an event listener, but it really doesn't need to be. Given that the 
> regex whitelist is configurable, it could just be a utility method.
> * It's not run when a partition is renamed, so partitions with invalid 
> characters can be created in this way.
> * There's no easy way to check if a partition contains invalid characters 
> before creating it and seeing if it fails.
> Most importantly, when a dynamic partition contains an invalid character, the 
> directory for this partition is created, and the data is moved into it, but 
> the partition fails to be created leaving an orphan directory.
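
As a rough illustration of the utility-method idea in the bullets above, here is a minimal sketch. It is written in Python for brevity rather than in Hive's Java, and the function name, its arguments, and the example pattern are all hypothetical; none of them come from the attached patches.

```python
import re


def invalid_partition_values(partition_values, whitelist_pattern):
    """Return the partition values that do not fully match the whitelist regex."""
    regex = re.compile(whitelist_pattern)
    return [v for v in partition_values if regex.fullmatch(v) is None]


# Hypothetical whitelist: letters, digits, underscore and hyphen only.
EXAMPLE_PATTERN = r"[A-Za-z0-9_-]+"
```

A caller could run such a check before creating or renaming a partition and refuse the operation when the returned list is non-empty, so that no directory is created for a partition that will be rejected.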


[jira] [Updated] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener

2013-02-20 Thread Kevin Wilfong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3970:


Status: Patch Available  (was: Open)

> Clean up/fix PartitionNameWhitelistPreEventListener
> ---
>
> Key: HIVE-3970
> URL: https://issues.apache.org/jira/browse/HIVE-3970
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt, 
> HIVE-3970.3.patch.txt
>
>
> There are a number of issues and things which can be cleaned up related to 
> PartitionNameWhitelistPreEventListener.
> * It's an event listener, but it really doesn't need to be. Given that the 
> regex whitelist is configurable, it could just be a utility method.
> * It's not run when a partition is renamed, so partitions with invalid 
> characters can be created in this way.
> * There's no easy way to check if a partition contains invalid characters 
> before creating it and seeing if it fails.
> Most importantly, when a dynamic partition contains an invalid character, the 
> directory for this partition is created, and the data is moved into it, but 
> the partition fails to be created leaving an orphan directory.


[jira] [Commented] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener

2013-02-20 Thread Kevin Wilfong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582386#comment-13582386
 ] 

Kevin Wilfong commented on HIVE-3970:
-

Refreshed

> Clean up/fix PartitionNameWhitelistPreEventListener
> ---
>
> Key: HIVE-3970
> URL: https://issues.apache.org/jira/browse/HIVE-3970
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt, 
> HIVE-3970.3.patch.txt
>
>
> There are a number of issues and things which can be cleaned up related to 
> PartitionNameWhitelistPreEventListener.
> * It's an event listener, but it really doesn't need to be. Given that the 
> regex whitelist is configurable, it could just be a utility method.
> * It's not run when a partition is renamed, so partitions with invalid 
> characters can be created in this way.
> * There's no easy way to check if a partition contains invalid characters 
> before creating it and seeing if it fails.
> Most importantly, when a dynamic partition contains an invalid character, the 
> directory for this partition is created, and the data is moved into it, but 
> the partition fails to be created leaving an orphan directory.


[jira] [Resolved] (HIVE-4040) fix ptf negative tests

2013-02-20 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4040.


Resolution: Fixed

Committed to branch. Thanks, Prajakta!

> fix ptf negative tests
> --
>
> Key: HIVE-4040
> URL: https://issues.apache.org/jira/browse/HIVE-4040
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Reporter: Harish Butani
>Assignee: Prajakta Kalmegh
>Priority: Minor
> Attachments: HIVE-4040.1.patch.txt
>
>
> Fix queries in negative tests to match language changes. 


[jira] [Created] (HIVE-4043) Parallel Hive Queries: Sporadic Errors of form: Error in metadata: javax.jdo.JDOFatalDataStoreException: IO Error: Connection reset

2013-02-20 Thread Andrew Tindle (JIRA)

Andrew Tindle created HIVE-4043:
---

 Summary: Parallel Hive Queries: Sporadic Errors of form: Error in 
metadata: javax.jdo.JDOFatalDataStoreException: IO Error: Connection reset
 Key: HIVE-4043
 URL: https://issues.apache.org/jira/browse/HIVE-4043
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.9.0
 Environment: O/S: RHEL 6.3
Metastore: Oracle 11gR2
Reporter: Andrew Tindle


I have a program that spawns Hive queries/processes, up to a maximum of 5, in 
parallel. When the number of running queries drops below that maximum, i.e. a 
process has ended, another Hive query/process is initiated.

Sometimes this program works, i.e. all 34 queries are processed successfully.

However, on other occasions, I get sporadic instances of the following error 
for some of the queries:

FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: IO Error: 
Connection reset
NestedThrowables:
java.sql.SQLRecoverableException: IO Error: Connection reset
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

Can anyone help identify and resolve why this occurs? It looks to me as if 
there is some kind of race condition/collision with the Hive Metastore, which 
is hosted in an Oracle DB on the same node as the Hadoop infrastructure 
(single node).
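
For anyone trying to reproduce this, the spawning pattern described above can be sketched roughly as follows. This is not the reporter's program: it is a minimal Python sketch that assumes each query is run through the Hive CLI with `hive -e`, and the function names are made up for illustration. The runner is passed in as a parameter so the pattern can be exercised without a Hive installation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 5  # the limit described in the report


def run_hive(query):
    """Run one query through the Hive CLI; return (exit code, stderr text)."""
    proc = subprocess.run(["hive", "-e", query], capture_output=True, text=True)
    return proc.returncode, proc.stderr


def run_all(queries, runner=run_hive, max_parallel=MAX_PARALLEL):
    """Run queries with at most max_parallel in flight; return the failures.

    Each failure is (query, saw_connection_reset), so the sporadic
    "Connection reset" errors can be told apart from other failures.
    """
    queries = list(queries)
    failures = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # A new query starts as soon as one of the workers becomes free.
        for query, (code, err) in zip(queries, pool.map(runner, queries)):
            if code != 0:
                failures.append((query, "Connection reset" in err))
    return failures
```

Lowering `max_parallel` to 1 and checking whether the failures disappear would be one way to test the race-condition suspicion.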



[jira] [Created] (HIVE-4044) Add URL type

2013-02-20 Thread Samuel Yuan (JIRA)

Samuel Yuan created HIVE-4044:
-

 Summary: Add URL type
 Key: HIVE-4044
 URL: https://issues.apache.org/jira/browse/HIVE-4044
 Project: Hive
  Issue Type: Improvement
Reporter: Samuel Yuan
Assignee: Samuel Yuan


Having a separate type for URLs would enable improvements in storage efficiency 
based on breaking up a URL into its components. The new type will be named 
"URL" and made a non-reserved keyword (see HIVE-701).


[jira] [Updated] (HIVE-4005) Column truncation

2013-02-20 Thread Kevin Wilfong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-4005:


Attachment: HIVE-4005.3.patch.txt

> Column truncation
> -
>
> Key: HIVE-4005
> URL: https://issues.apache.org/jira/browse/HIVE-4005
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-4005.1.patch.txt, HIVE-4005.2.patch.txt, 
> HIVE-4005.3.patch.txt
>
>
> Column truncation allows users to remove data for columns that are no longer 
> useful.
> This is done by removing the data for the column and setting the length of 
> the column data and related lengths to 0 in the RC file header.
> RC file was fixed to recognize columns with lengths of zero as empty. Such 
> columns are treated as if they don't exist in the data: a null is returned 
> for every value of that column in every row. This is the same thing that 
> happens when more columns are selected than exist in the file.
> A new command was added to the CLI
> TRUNCATE TABLE ... PARTITION ... COLUMNS ...
> This launches a map only job where each mapper rewrites a single file without 
> the unnecessary column data and the adjusted headers. It does not 
> uncompress/deserialize the data so it is much faster than rewriting the data 
> with NULLs.
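
The read-side behaviour described above can be pictured with a toy model. This is only a sketch in Python, not the RCFile reader: the function, its arguments, and the in-memory layout are invented for illustration, and the real format stores lengths in its headers rather than in a list.

```python
def read_row(column_lengths, column_data, selected_columns, row):
    """Toy reader: a column with recorded length 0, or one beyond the file's
    column count, yields None (null) instead of a stored value."""
    values = []
    for col in selected_columns:
        missing = col >= len(column_lengths) or column_lengths[col] == 0
        values.append(None if missing else column_data[col][row])
    return values
```

In this model a truncated column (length 0) and a column that was never written are indistinguishable to the caller, which matches the description's point that both cases return null.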


[jira] [Commented] (HIVE-4005) Column truncation

2013-02-20 Thread Kevin Wilfong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582413#comment-13582413
 ] 

Kevin Wilfong commented on HIVE-4005:
-

Updated

> Column truncation
> -
>
> Key: HIVE-4005
> URL: https://issues.apache.org/jira/browse/HIVE-4005
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-4005.1.patch.txt, HIVE-4005.2.patch.txt, 
> HIVE-4005.3.patch.txt
>
>
> Column truncation allows users to remove data for columns that are no longer 
> useful.
> This is done by removing the data for the column and setting the length of 
> the column data and related lengths to 0 in the RC file header.
> RC file was fixed to recognize columns with lengths of zero as empty. Such 
> columns are treated as if they don't exist in the data: a null is returned 
> for every value of that column in every row. This is the same thing that 
> happens when more columns are selected than exist in the file.
> A new command was added to the CLI
> TRUNCATE TABLE ... PARTITION ... COLUMNS ...
> This launches a map only job where each mapper rewrites a single file without 
> the unnecessary column data and the adjusted headers. It does not 
> uncompress/deserialize the data so it is much faster than rewriting the data 
> with NULLs.


[jira] [Updated] (HIVE-4005) Column truncation

2013-02-20 Thread Kevin Wilfong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-4005:


Status: Patch Available  (was: Open)

> Column truncation
> -
>
> Key: HIVE-4005
> URL: https://issues.apache.org/jira/browse/HIVE-4005
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-4005.1.patch.txt, HIVE-4005.2.patch.txt, 
> HIVE-4005.3.patch.txt
>
>
> Column truncation allows users to remove data for columns that are no longer 
> useful.
> This is done by removing the data for the column and setting the length of 
> the column data and related lengths to 0 in the RC file header.
> RC file was fixed to recognize columns with lengths of zero as empty. Such 
> columns are treated as if they don't exist in the data: a null is returned 
> for every value of that column in every row. This is the same thing that 
> happens when more columns are selected than exist in the file.
> A new command was added to the CLI
> TRUNCATE TABLE ... PARTITION ... COLUMNS ...
> This launches a map only job where each mapper rewrites a single file without 
> the unnecessary column data and the adjusted headers. It does not 
> uncompress/deserialize the data so it is much faster than rewriting the data 
> with NULLs.


[jira] [Commented] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.

2013-02-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582423#comment-13582423
 ] 

Hudson commented on HIVE-4039:
--

Integrated in Hive-trunk-hadoop2 #130 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/130/])
HIVE-4039 Hive compiler sometimes fails in semantic analysis / optimisation 
stage when boolean
variable appears in WHERE clause. (Jean Xu via namit) (Revision 1448135)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448135
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/test_boolean_whereclause.q
* /hive/trunk/ql/src/test/results/clientpositive/test_boolean_whereclause.q.out


> Hive compiler sometimes fails in semantic analysis / optimisation stage when 
> boolean variable appears in WHERE clause.
> --
>
> Key: HIVE-4039
> URL: https://issues.apache.org/jira/browse/HIVE-4039
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jean Xu
>Assignee: Jean Xu
>Priority: Minor
> Attachments: HIVE_4039.1.patch.txt
>
>
> Hive compiler fails with a NullPointerException in semantic analysis / 
> optimisation stage when a boolean variable appears in the WHERE clause in 
> some cases. A minimal query to generate this error is here:
> SELECT 1
> FROM (
> SELECT TRUE AS flag
> FROM dim_one_row:measurementsystems
> ) a
> WHERE flag;
> On the other hand, the following query is perfectly fine:
> SELECT 1
> FROM (
> SELECT TRUE AS flag
> FROM dim_one_row:measurementsystems
> ) a
> WHERE flag=TRUE;


[jira] [Commented] (HIVE-4027) Thrift alter_table api doesnt validate column type

2013-02-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582424#comment-13582424
 ] 

Hudson commented on HIVE-4027:
--

Integrated in Hive-trunk-hadoop2 #130 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/130/])
HIVE-4027 Thrift alter_table api doesnt validate column type
(Gang Tim Liu via namit) (Revision 1448138)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448138
Files : 
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java


> Thrift alter_table api doesnt validate column type
> --
>
> Key: HIVE-4027
> URL: https://issues.apache.org/jira/browse/HIVE-4027
> Project: Hive
>  Issue Type: Bug
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
> Fix For: 0.11.0
>
> Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3
>
>
> The Thrift alter_table API doesn't validate column types, so an invalid 
> column type can sneak in.


[jira] [Commented] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails

2013-02-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582425#comment-13582425
 ] 

Hudson commented on HIVE-4004:
--

Integrated in Hive-trunk-hadoop2 #130 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/130/])
HIVE-4004 Incorrect status for AddPartition metastore event if RawStore 
commit fails
(Dilip Joseph via namit) (Revision 1448101)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448101
Files : 
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListenerOnlyOnCommit.java


> Incorrect status for AddPartition metastore event if RawStore commit fails
> --
>
> Key: HIVE-4004
> URL: https://issues.apache.org/jira/browse/HIVE-4004
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.10.0
>Reporter: Dilip Joseph
>Assignee: Dilip Joseph
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: HIVE-4004.1.patch.txt
>
>
> For ADD PARTITION operations, the AddPartitionEvent does not care if the 
> RawStore commit succeeded or not.  This means that an AddPartitionEvent with 
> status=true is fired even if the actual ADD PARTITION operation failed.  
> This will confuse any AddPartitionEvent listeners.
> Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the 
> status of the RawStore commit.  Only AddPartitionEvent has this problem.
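
The intended behaviour can be sketched as follows. This is an illustrative Python sketch, not the code in the attached patch: the store methods are hypothetical stand-ins for a RawStore-like object, and it assumes the commit call returns a boolean indicating success.

```python
def add_partition(store, listeners, partition):
    """Add a partition and notify listeners with the real commit outcome."""
    success = False
    try:
        store.open_transaction()
        store.add_partition(partition)
        # The commit result, not the earlier steps, decides the status.
        success = store.commit_transaction()
    finally:
        if not success:
            store.rollback_transaction()
        # Listeners are told the actual outcome, whether or not it succeeded.
        for listener in listeners:
            listener(partition, success)
    return success
```

The point of the sketch is only the ordering: the event status is computed after the commit, so a failed commit produces an event with status false.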


Hive-trunk-hadoop2 - Build # 130 - Still Failing

2013-02-20 Thread Apache Jenkins Server

Changes for Build #98

Changes for Build #99
[kevinwilfong] HIVE-3940. Track columns accessed in each table in a query. 
(Samuel Yuan via kevinwilfong)


Changes for Build #100
[namit] HIVE-3778 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
(Gang Tim Liu via namit)


Changes for Build #101

Changes for Build #102

Changes for Build #103

Changes for Build #104
[hashutosh] HIVE-3977 : Hive 0.10 postgres schema script is broken (Johnny 
Zhang via Ashutosh Chauhan)

[hashutosh] HIVE-3932 : Hive release tarballs don't contain PostgreSQL 
metastore scripts (Mark Grover via Ashutosh Chauhan)


Changes for Build #105
[hashutosh] HIVE-3918 : Normalize more CRLF line endings (Mark Grover via 
Ashutosh Chauhan)

[namit] HIVE-3917 Support noscan operation for analyze command
(Gang Tim Liu via namit)


Changes for Build #106
[namit] HIVE-3937 Hive Profiler
(Pamela Vagata via namit)

[hashutosh] HIVE-3571 : add a way to run a small unit quickly (Navis via 
Ashutosh Chauhan)

[hashutosh] HIVE-3956 : TestMetaStoreAuthorization always uses the same port 
(Navis via Ashutosh Chauhan)


Changes for Build #107

Changes for Build #108

Changes for Build #109

Changes for Build #110
[namit] HIVE-2839 Filters on outer join with mapjoin hint is not applied 
correctly
(Navis via namit)


Changes for Build #111

Changes for Build #112
[namit] HIVE-3998 Oracle metastore update script will fail when upgrading from 
0.9.0 to
0.10.0 (Jarek and Mark via namit)

[namit] HIVE-3999 Mysql metastore upgrade script will end up with different 
schema than
the full schema load (Jarek and Mark via namit)


Changes for Build #113

Changes for Build #114
[namit] HIVE-3995 PostgreSQL upgrade scripts are not valid
(Jarek and Mark via namit)


Changes for Build #115

Changes for Build #116
[namit] HIVE-4001 Add o.a.h.h.serde.Constants for backward compatibility
(Navis via namit)


Changes for Build #117

Changes for Build #118

Changes for Build #119

Changes for Build #120
[kevinwilfong] HIVE-3252. Add environment context to metastore Thrift calls. 
(Samuel Yuan via kevinwilfong)


Changes for Build #121

Changes for Build #122

Changes for Build #123

Changes for Build #124

Changes for Build #125

Changes for Build #126
[hashutosh] HIVE-4000 Hive client goes into infinite loop at 100% cpu (Owen 
Omalley via Ashutosh Chauhan)


Changes for Build #127
[namit] HIVE-4021 PostgreSQL upgrade scripts are creating column with incorrect 
name
(Jarek Jarcec Cecho via namit)

[hashutosh] HIVE-4033 : NPE at runtime while selecting virtual column after 
joining three tables on different keys (Ashutosh Chauhan)

[namit] HIVE-4029 Hive Profiler dies with NPE
(Brock Noland via namit)


Changes for Build #128
[namit] HIVE-4023 Improve Error Logging in MetaStore
(Bhushan Mandhani via namit)

[namit] HIVE-3403 user should not specify mapjoin to perform sort-merge 
bucketed join
(Namit Jain via Ashutosh)

[namit] HIVE-4024 Derby metastore update script will fail when upgrading from 
0.9.0
to 0.10.0 (Jarek Jarcec Cecho via namit)


Changes for Build #129

Changes for Build #130
[namit] HIVE-4027 Thrift alter_table api doesnt validate column type
(Gang Tim Liu via namit)

[namit] HIVE-4039 Hive compiler sometimes fails in semantic analysis / 
optimisation stage when boolean
variable appears in WHERE clause. (Jean Xu via namit)

[namit] HIVE-4004 Incorrect status for AddPartition metastore event if RawStore 
commit fails
(Dilip Joseph via namit)




34 tests failed.
FAILED:  
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_1

Error Message:
Unexpected exception
See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get 
more logs.

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception
See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get 
more logs.
at junit.framework.Assert.fail(Assert.java:50)
at 
org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:5855)
at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_1(TestCliDriver.java:3476)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at 
org.apache.tools.ant.taskdefs.opti

[jira] [Created] (HIVE-4045) Modify PreDropPartitionEvent to pass Table parameter

2013-02-20 Thread Li Yang (JIRA)

Li Yang created HIVE-4045:
-

 Summary: Modify PreDropPartitionEvent to pass Table parameter
 Key: HIVE-4045
 URL: https://issues.apache.org/jira/browse/HIVE-4045
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Li Yang
Priority: Minor


A MetaStorePreEventListener implementing onEvent(PreEventContext context) 
sometimes needs access to Table properties when handling a 
PreDropPartitionEvent.


Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #71

2013-02-20 Thread Apache Jenkins Server

See 

--
[...truncated 62804 lines...]
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2013-02-20 13:52:43,989 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] Execution completed successfully
[junit] Mapred Local Task Succeeded . Convert the Join into MapJoin
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 

[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.testhivedrivertable
[junit] Table default.testhivedrivertable stats: [num_partitions: 0, 
num_files: 1, num_rows: 0, total_size: 5812, raw_data_size: 0]
[junit] POSTHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 

[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 

[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivert

Re: Review Request: HIVE-3951: Allow Decimal type columns in Regex Serde

2013-02-20 Thread Jarek Cecho


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9173/#review16799
---

Ship it!


Looks good to me (I'm not a committer).

- Jarek Cecho


On Jan. 31, 2013, 8:02 a.m., Mark Grover wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9173/
> ---
> 
> (Updated Jan. 31, 2013, 8:02 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> ---
> 
> Add support for RegexSerde to support newly added Decimal type
> 
> 
> This addresses bug HIVE-3951.
> https://issues.apache.org/jira/browse/HIVE-3951
> 
> 
> Diffs
> -
> 
>   ql/src/test/queries/clientpositive/serde_regex.q c3254ca 
>   ql/src/test/results/clientpositive/serde_regex.q.out a933538 
>   serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java ae7693a 
> 
> Diff: https://reviews.apache.org/r/9173/diff/
> 
> 
> Testing
> ---
> 
> Added a client positive test
> 
> 
> Thanks,
> 
> Mark Grover
> 
>

[jira] [Commented] (HIVE-3951) Allow Decimal type columns in Regex Serde

2013-02-20 Thread Jarek Jarcec Cecho (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582587#comment-13582587
 ] 

Jarek Jarcec Cecho commented on HIVE-3951:
--

+1 (non-binding)

> Allow Decimal type columns in Regex Serde
> -
>
> Key: HIVE-3951
> URL: https://issues.apache.org/jira/browse/HIVE-3951
> Project: Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Affects Versions: 0.10.0
>Reporter: Mark Grover
>Assignee: Mark Grover
> Fix For: 0.11.0
>
> Attachments: HIVE-3951.1.patch
>
>
> Decimal type in Hive was recently added by HIVE-2693. We should allow users 
> to create tables with decimal type columns when using Regex Serde. 
> HIVE-3004 did something similar for other primitive types.
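
To make the request concrete, here is a minimal Java sketch of regex-driven
row parsing with a decimal column. It is an illustration only, not the
RegexSerDe code from the patch: the class RegexRowSketch, the ColType enum and
the parse method are hypothetical names, and the sketch assumes one capturing
group per column.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Hive.
final class RegexRowSketch {
    enum ColType { STRING, INT, DECIMAL }

    // Returns one Java object per column, or null when the line does not
    // match the pattern or the group count differs from the column count.
    static List<Object> parse(Pattern rowPattern, ColType[] types, String line) {
        Matcher m = rowPattern.matcher(line);
        if (!m.matches() || m.groupCount() != types.length) {
            return null;
        }
        List<Object> row = new ArrayList<>();
        for (int i = 0; i < types.length; i++) {
            String text = m.group(i + 1);   // capturing groups are 1-based
            if (text == null) {             // optional group that did not participate
                row.add(null);
                continue;
            }
            switch (types[i]) {
                case INT:     row.add(Integer.valueOf(text)); break;
                case DECIMAL: row.add(new BigDecimal(text));  break;
                default:      row.add(text);
            }
        }
        return row;
    }
}
```

With the pattern (\w+),(\d+),([0-9.]+) and column types STRING, INT, DECIMAL,
the line "apple,3,1.25" yields a string, an Integer and a BigDecimal. A real
SerDe would also need to decide what to do with text that fails to parse as a
number; this sketch simply lets the NumberFormatException propagate.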


[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join

2013-02-20 Thread Vikram Dixit K (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-3996:
-

Attachment: HIVE-3996_3.patch

Added a test case that demonstrates the issue when combining map-joins. It is 
an almost exact replica of the join32.q test with the sizes altered. The current 
code generates the same plan as join32.q even when the sum of the table sizes 
exceeds the size configured by noConditionalTask.size.
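
The check being asked for can be sketched in a few lines of Java. This is a
simplified illustration under stated assumptions, not the code in the attached
patch: MapJoinBudgetSketch and canMergeIntoOneTask are hypothetical names, and
limitBytes stands for the value of
hive.auto.convert.join.noconditionaltask.size.

```java
// Hypothetical helper for illustration; not part of Hive.
final class MapJoinBudgetSketch {
    // Returns true only if every small table fits under the limit on its own
    // AND the running total of all small tables also stays under the limit.
    static boolean canMergeIntoOneTask(long[] smallTableBytes, long limitBytes) {
        long total = 0;
        for (long size : smallTableBytes) {
            if (size > limitBytes) {
                return false;          // per-table check
            }
            total += size;
            if (total > limitBytes) {
                return false;          // cumulative check across merged map-joins
            }
        }
        return true;
    }
}
```

Two tables of 6 bytes each pass the per-table check against a limit of 10, but
their sum does not, so the sketch refuses to merge them into one task. The
sketch ignores arithmetic overflow for very large sizes.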

> Correctly enforce the memory limit on the multi-table map-join
> --
>
> Key: HIVE-3996
> URL: https://issues.apache.org/jira/browse/HIVE-3996
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-3996_2.patch, HIVE-3996_3.patch, HIVE-3996.patch
>
>
> Currently with HIVE-3784, the joins are converted to map-joins based on 
> checks of the table size against the config variable: 
> hive.auto.convert.join.noconditionaltask.size. 
> However, the current implementation will also merge multiple mapjoin 
> operators into a single task regardless of whether the sum of the table sizes 
> will exceed the configured value.


[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join

2013-02-20 Thread Vikram Dixit K (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-3996:
-

Status: Patch Available  (was: Open)

> Correctly enforce the memory limit on the multi-table map-join
> --
>
> Key: HIVE-3996
> URL: https://issues.apache.org/jira/browse/HIVE-3996
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-3996_2.patch, HIVE-3996_3.patch, HIVE-3996.patch
>
>
> Currently with HIVE-3784, the joins are converted to map-joins based on 
> checks of the table size against the config variable: 
> hive.auto.convert.join.noconditionaltask.size. 
> However, the current implementation will also merge multiple mapjoin 
> operators into a single task regardless of whether the sum of the table sizes 
> will exceed the configured value.


[jira] [Created] (HIVE-4046) Column masking

2013-02-20 Thread Samuel Yuan (JIRA)

Samuel Yuan created HIVE-4046:
-

 Summary: Column masking
 Key: HIVE-4046
 URL: https://issues.apache.org/jira/browse/HIVE-4046
 Project: Hive
  Issue Type: New Feature
  Components: CLI, Metastore, Query Processor
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan


Sometimes data in a table needs to be kept around but made inaccessible. Right 
now it is possible to offline a table or a partition, but not a specific column 
of a partition. Also, accessing an offlined table results in an error. With 
this change, it will be possible to mask a column at the partition level, 
causing all further queries to that column to return null.
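
As a rough illustration of the proposed read-time behaviour, the Java sketch
below returns null for every masked column of a row. It is only a model of the
semantics described above; ColumnMaskSketch and applyMask are hypothetical
names, and nothing here reflects how the feature would be wired into the
metastore or the query processor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Hive.
final class ColumnMaskSketch {
    // Returns a copy of the row in which every masked column reads as null.
    // The stored row itself is left untouched, so the data is still kept around.
    static Map<String, Object> applyMask(Map<String, Object> row, Set<String> maskedColumns) {
        Map<String, Object> visible = new HashMap<>(row);
        for (String column : maskedColumns) {
            if (visible.containsKey(column)) {
                visible.put(column, null);
            }
        }
        return visible;
    }
}
```

In this model the set of masked columns would be looked up per partition, so
the same column could be readable in one partition and null in another.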


Re: Review Requests

2013-02-20 Thread kulkarni.swar...@gmail.com

Would someone have a chance to take a quick look at these review
requests [1][2]?

[1] https://reviews.apache.org/r/9275/
[2] https://reviews.apache.org/r/9276/

Thanks,


On Tue, Feb 5, 2013 at 10:00 AM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> Thanks Mark. Appreciate that. I'll take a look.
>
>
> On Mon, Feb 4, 2013 at 10:23 PM, Mark Grover 
> wrote:
>
>> Swarnim,
>> I left some comments on  reviewboard.
>>
>> On Mon, Feb 4, 2013 at 8:00 AM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>> > Hello,
>> >
>> > I opened up two reviews for small issues, HIVE-3553[1] and HIVE-3725[2].
>> > If you guys get a chance to review and provide feedback on them, I would
>> > really appreciate it.
>> >
>> > Thanks,
>> >
>> > [1] https://reviews.apache.org/r/9275/
>> > [2] https://reviews.apache.org/r/9276/
>> >
>> > --
>> > Swarnim
>> >
>>
>
>
>
> --
> Swarnim
>



-- 
Swarnim

[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-20 Thread Joey Echeverria (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582617#comment-13582617
 ] 

Joey Echeverria commented on HIVE-3528:
---

Hey Michael,

As a workaround, did you try casting the null to the type of the column that 
you're inserting into? It's not ideal, but it might be a workable interim solution.

-Joey

> Avro SerDe doesn't handle serializing Nullable types that require access to a 
> Schema
> 
>
> Key: HIVE-3528
> URL: https://issues.apache.org/jira/browse/HIVE-3528
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>  Labels: avro
> Fix For: 0.11.0
>
> Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt
>
>
> Deserialization properly handles hiding Nullable Avro types, including 
> complex types like record, map, array, etc. However, when Serialization 
> attempts to write out these types it erroneously makes use of the UNION 
> schema that contains NULL and the other type.
> This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
> Bytes.
> Here's a [review board of unit tests that express the 
> problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
> case that it's only when the schema is needed.


[jira] [Updated] (HIVE-3911) udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is disabled.

2013-02-20 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-3911:
---

Attachment: HIVE-3911_branch10.patch

Attaching HIVE-3911_branch10.patch. This should make it consistent. I have just 
removed the queries whose output differs and causes this test to fail.

> udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is 
> disabled.
> -
>
> Key: HIVE-3911
> URL: https://issues.apache.org/jira/browse/HIVE-3911
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Thiruvel Thirumoolan
> Fix For: 0.11.0
>
> Attachments: HIVE-3911_branch10.patch, HIVE-3911.patch
>
>
> I am running Hive10 unit tests against Hadoop 0.23.5, and 
> udaf_percentile_approx.q fails with a different value when map-side aggr is 
> disabled, and only when the 3rd argument to this UDAF is 100. The output matches 
> the expected output when map-side aggr is enabled for the same arguments.
> This test passes when hadoop.version is 1.1.1 and fails when it's 0.23.x, 
> 2.0.0-alpha, or 2.0.2-alpha.
> [junit] 20c20
> [junit] < 254.083331
> [junit] ---
> [junit] > 252.77
> [junit] 47c47
> [junit] < 254.083331
> [junit] ---
> [junit] > 252.77
> [junit] 74c74
> [junit] < [23.358,254.083331,477.0625,489.54667]
> [junit] ---
> [junit] > [24.07,252.77,476.9,487.82]
> [junit] 101c101
> [junit] < [23.358,254.083331,477.0625,489.54667]
> [junit] ---
> [junit] > [24.07,252.77,476.9,487.82]


[jira] [Updated] (HIVE-3911) udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is disabled.

2013-02-20 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-3911:
---

Fix Version/s: 0.10.1
 Assignee: Thiruvel Thirumoolan

> udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is 
> disabled.
> -
>
> Key: HIVE-3911
> URL: https://issues.apache.org/jira/browse/HIVE-3911
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
> Fix For: 0.11.0, 0.10.1
>
> Attachments: HIVE-3911_branch10.patch, HIVE-3911.patch
>
>
> I am running Hive10 unit tests against Hadoop 0.23.5, and 
> udaf_percentile_approx.q fails with a different value when map-side aggr is 
> disabled, and only when the 3rd argument to this UDAF is 100. The output matches 
> the expected output when map-side aggr is enabled for the same arguments.
> This test passes when hadoop.version is 1.1.1 and fails when it's 0.23.x, 
> 2.0.0-alpha, or 2.0.2-alpha.
> [junit] 20c20
> [junit] < 254.083331
> [junit] ---
> [junit] > 252.77
> [junit] 47c47
> [junit] < 254.083331
> [junit] ---
> [junit] > 252.77
> [junit] 74c74
> [junit] < [23.358,254.083331,477.0625,489.54667]
> [junit] ---
> [junit] > [24.07,252.77,476.9,487.82]
> [junit] 101c101
> [junit] < [23.358,254.083331,477.0625,489.54667]
> [junit] ---
> [junit] > [24.07,252.77,476.9,487.82]


[jira] [Commented] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582637#comment-13582637
 ] 

Gang Tim Liu commented on HIVE-3741:


https://reviews.facebook.net/D8715

> Driver.validateConfVariables() should perform more validations
> --
>
> Key: HIVE-3741
> URL: https://issues.apache.org/jira/browse/HIVE-3741
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
>
> Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.


[jira] [Updated] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3741:
---

Attachment: HIVE-3741.patch.1

> Driver.validateConfVariables() should perform more validations
> --
>
> Key: HIVE-3741
> URL: https://issues.apache.org/jira/browse/HIVE-3741
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3741.patch.1
>
>
> Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.


[jira] [Work started] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3741 started by Gang Tim Liu.

> Driver.validateConfVariables() should perform more validations
> --
>
> Key: HIVE-3741
> URL: https://issues.apache.org/jira/browse/HIVE-3741
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3741.patch.1
>
>
> Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.


[jira] [Updated] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3741:
---

Status: Patch Available  (was: In Progress)

Patch is available for review.

> Driver.validateConfVariables() should perform more validations
> --
>
> Key: HIVE-3741
> URL: https://issues.apache.org/jira/browse/HIVE-3741
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3741.patch.1
>
>
> Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.


[jira] [Updated] (HIVE-4046) Column masking

2013-02-20 Thread Carl Steinbach (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-4046:
-

Component/s: Security
 Authorization

> Column masking
> --
>
> Key: HIVE-4046
> URL: https://issues.apache.org/jira/browse/HIVE-4046
> Project: Hive
>  Issue Type: New Feature
>  Components: Authorization, CLI, Metastore, Query Processor, Security
>Affects Versions: 0.11.0
>Reporter: Samuel Yuan
>Assignee: Samuel Yuan
>
> Sometimes data in a table needs to be kept around but made inaccessible. 
> Right now it is possible to offline a table or a partition, but not a 
> specific column of a partition. Also, accessing an offlined table results in 
> an error. With this change, it will be possible to mask a column at the 
> partition level, causing all further queries to that column to return null.


[jira] [Commented] (HIVE-4046) Column masking

2013-02-20 Thread Carl Steinbach (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582648#comment-13582648
 ] 

Carl Steinbach commented on HIVE-4046:
--

I think it's possible to accomplish most of this functionality using views in 
combination with authorization.

I'm also concerned that with the proposed behavior users will have trouble 
differentiating between the case where they aren't allowed to read a column and 
the other case where they do have permission to read the column, but all of the 
values are actually NULL.

> Column masking
> --
>
> Key: HIVE-4046
> URL: https://issues.apache.org/jira/browse/HIVE-4046
> Project: Hive
>  Issue Type: New Feature
>  Components: Authorization, CLI, Metastore, Query Processor, Security
>Affects Versions: 0.11.0
>Reporter: Samuel Yuan
>Assignee: Samuel Yuan
>
> Sometimes data in a table needs to be kept around but made inaccessible. 
> Right now it is possible to offline a table or a partition, but not a 
> specific column of a partition. Also, accessing an offlined table results in 
> an error. With this change, it will be possible to mask a column at the 
> partition level, causing all further queries to that column to return null.


[jira] [Updated] (HIVE-3720) Expand and standardize authorization in Hive

2013-02-20 Thread Carl Steinbach (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-3720:
-

Component/s: Security

> Expand and standardize authorization in Hive
> 
>
> Key: HIVE-3720
> URL: https://issues.apache.org/jira/browse/HIVE-3720
> Project: Hive
>  Issue Type: Improvement
>  Components: Authorization, Security
>Affects Versions: 0.9.0
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
> Attachments: Hive_Authorization_Functionality.pdf
>
>
> The existing implementation of authorization in Hive is not complete. 
> Additionally the existing implementation has security holes. This JIRA is an 
> umbrella JIRA  for a) extending authorization to all SQL operations and 
> direct metadata operations, and b) standardizing the authorization model and 
> its semantics to mirror that of MySQL as closely as possible.


[jira] [Commented] (HIVE-4022) Structs and struct fields cannot be NULL in INSERT statements

2013-02-20 Thread Michael Malak (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582662#comment-13582662
 ] 

Michael Malak commented on HIVE-4022:
-

Note that there is a workaround for setting STRUCT fields to NULL, but not for 
setting the whole STRUCT to NULL.

The following workaround does work:

INSERT INTO TABLE oc SELECT named_struct('a', cast(null as int), 'b', cast(null as int)) FROM tc;

But there is no equivalent workaround for the whole STRUCT, because NULL cannot 
be cast to a STRUCT type, as noted in the first comment of 
https://issues.apache.org/jira/browse/HIVE-1287
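
The behaviour reported in this issue can be summarised with a toy model in
Java. This is a sketch of the observed symptoms only, not Hive's type checker:
NullInsertSketch, Type and insertable are hypothetical names, field names are
ignored, and the model assumes that an uncast NULL has type void and that
named_struct('a', null, 'b', null) therefore has two void fields.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model for illustration; not Hive's TypeInfo classes.
final class NullInsertSketch {
    static final class Type {
        final String name;        // "void", "int", "string", or "struct"
        final List<Type> fields;  // empty unless name is "struct"

        Type(String name, Type... fields) {
            this.name = name;
            this.fields = Arrays.asList(fields);
        }

        boolean isStruct() {
            return name.equals("struct");
        }
    }

    // Exact structural equality, field by field.
    static boolean sameType(Type a, Type b) {
        if (!a.name.equals(b.name) || a.fields.size() != b.fields.size()) {
            return false;
        }
        for (int i = 0; i < a.fields.size(); i++) {
            if (!sameType(a.fields.get(i), b.fields.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Models the reported behaviour: an untyped NULL (void) is accepted for a
    // primitive column, but a struct column only accepts an identical struct.
    static boolean insertable(Type source, Type target) {
        if (target.isStruct()) {
            return sameType(source, target);
        }
        return source.name.equals("void") || sameType(source, target);
    }
}
```

Under this model a bare NULL is rejected for a struct column, a struct of void
fields is rejected for a struct of int fields, and a struct whose fields were
cast to int is accepted, which matches the three cases described above. Real
Hive also performs implicit conversions between primitive types; the model
leaves those out.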

> Structs and struct fields cannot be NULL in INSERT statements
> -
>
> Key: HIVE-4022
> URL: https://issues.apache.org/jira/browse/HIVE-4022
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Michael Malak
>
> Originally thought to be Avro-specific, and first noted with respect to 
> HIVE-3528 "Avro SerDe doesn't handle serializing Nullable types that require 
> access to a Schema", it turns out even native Hive tables cannot store NULL 
> in a STRUCT field or for the entire STRUCT itself, at least when the NULL is 
> specified directly in the INSERT statement.
> Again, this affects both Avro-backed tables and native Hive tables.
> ***For native Hive tables:
> The following:
> echo 1,2 >twovalues.csv
> hive
> CREATE TABLE tc (x INT, y INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH 'twovalues.csv' INTO TABLE tc;
> CREATE TABLE oc (z STRUCT<a: int, b: int>);
> INSERT INTO TABLE oc SELECT null FROM tc;
> produces the error
> FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target 
> table because column number/types are different 'oc': Cannot convert column 0 
> from void to struct.
> The following:
> INSERT INTO TABLE oc SELECT named_struct('a', null, 'b', null) FROM tc;
> produces the error:
> FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target 
> table because column number/types are different 'oc': Cannot convert column 0 
> from struct to struct.
> ***For Avro:
> In HIVE-3528, there is in fact a null-struct test case in line 14 of
> https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt
> The test script at
> https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q
> does indeed work.  But in that test, the query gets all of its data from a 
> test table verbatim:
> INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;
> If instead we stick in a hard-coded null for the struct directly into the 
> query, it fails:
> INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, 
> bigint1, boolean1, float1, double1, list1, map1, null, enum1, nullableint, 
> bytes1, fixed1 FROM test_serializer;
> with the following error:
> FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target 
> table because column number/types are different 'as_avro': Cannot convert 
> column 10 from void to struct.
> Note, though, that substituting a hard-coded null for string1 (and restoring 
> struct1 into the query) does work:
> INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, 
> bigint1, boolean1, float1, double1, list1, map1, struct1, enum1, nullableint, 
> bytes1, fixed1 FROM test_serializer;


[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-20 Thread Michael Malak (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582664#comment-13582664
 ] 

Michael Malak commented on HIVE-3528:
-

As noted in the first comment from 
https://issues.apache.org/jira/browse/HIVE-1287, casting to a STRUCT is not 
currently supported.

However, I did just now try casting individual fields of a STRUCT and that 
indeed does work.

I just now added details to the JIRA that I created last week.
https://issues.apache.org/jira/browse/HIVE-4022


> Avro SerDe doesn't handle serializing Nullable types that require access to a 
> Schema
> 
>
> Key: HIVE-3528
> URL: https://issues.apache.org/jira/browse/HIVE-3528
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>  Labels: avro
> Fix For: 0.11.0
>
> Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt
>
>
> Deserialization properly handles hiding Nullable Avro types, including 
> complex types like record, map, array, etc. However, when Serialization 
> attempts to write out these types it erroneously makes use of the UNION 
> schema that contains NULL and the other type.
> This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
> Bytes.
> Here's a [review board of unit tests that express the 
> problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
> case that it's only when the schema is needed.


[jira] [Commented] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.

2013-02-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582676#comment-13582676
 ] 

Hudson commented on HIVE-4039:
--

Integrated in Hive-trunk-h0.21 #1978 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1978/])
HIVE-4039 Hive compiler sometimes fails in semantic analysis / optimisation 
stage when boolean
variable appears in WHERE clause. (Jean Xu via namit) (Revision 1448135)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448135
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/test_boolean_whereclause.q
* /hive/trunk/ql/src/test/results/clientpositive/test_boolean_whereclause.q.out


> Hive compiler sometimes fails in semantic analysis / optimisation stage when 
> boolean variable appears in WHERE clause.
> --
>
> Key: HIVE-4039
> URL: https://issues.apache.org/jira/browse/HIVE-4039
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jean Xu
>Assignee: Jean Xu
>Priority: Minor
> Attachments: HIVE_4039.1.patch.txt
>
>
> Hive compiler fails with a NullPointerException in semantic analysis / 
> optimisation stage when a boolean variable appears in the WHERE clause in 
> some cases. A minimal query to generate this error is here:
> SELECT 1
> FROM (
> SELECT TRUE AS flag
> FROM dim_one_row:measurementsystems
> ) a
> WHERE flag;
> On the other hand, the following query is perfectly fine:
> SELECT 1
> FROM (
> SELECT TRUE AS flag
> FROM dim_one_row:measurementsystems
> ) a
> WHERE flag=TRUE;


[jira] [Commented] (HIVE-4027) Thrift alter_table api doesnt validate column type

2013-02-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582677#comment-13582677
 ] 

Hudson commented on HIVE-4027:
--

Integrated in Hive-trunk-h0.21 #1978 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1978/])
HIVE-4027 Thrift alter_table api doesnt validate column type
(Gang Tim Liu via namit) (Revision 1448138)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448138
Files : 
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java


> Thrift alter_table api doesnt validate column type
> --
>
> Key: HIVE-4027
> URL: https://issues.apache.org/jira/browse/HIVE-4027
> Project: Hive
>  Issue Type: Bug
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
> Fix For: 0.11.0
>
> Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3
>
>
> Thrift alter_table api doesn't validate the column type, so an invalid column 
> type can sneak in.
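
A rough idea of the kind of check that is missing, sketched in Java. This is
not the validation added by the patch: ColumnTypeCheckSketch and looksValid are
hypothetical names, the list of primitive type names is an assumption about the
types available at the time, and complex types are only checked by their
outer shape rather than parsed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of Hive.
final class ColumnTypeCheckSketch {
    // Assumed set of primitive type names; adjust to the Hive version in use.
    private static final Set<String> PRIMITIVES = new HashSet<>(Arrays.asList(
            "tinyint", "smallint", "int", "bigint", "boolean", "float", "double",
            "string", "binary", "timestamp", "decimal"));

    private static final String[] COMPLEX_PREFIXES = {
            "array<", "map<", "struct<", "uniontype<" };

    // Accepts known primitive names and anything shaped like a complex type.
    // The contents between the angle brackets are deliberately not parsed.
    static boolean looksValid(String typeName) {
        if (typeName == null) {
            return false;
        }
        String t = typeName.trim().toLowerCase(Locale.ROOT);
        if (PRIMITIVES.contains(t)) {
            return true;
        }
        if (!t.endsWith(">")) {
            return false;
        }
        for (String prefix : COMPLEX_PREFIXES) {
            if (t.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

A caller on the alter_table path could run such a check over every column
before persisting the new table definition and reject the request on the first
failure.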


[jira] [Commented] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails

2013-02-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582678#comment-13582678
 ] 

Hudson commented on HIVE-4004:
--

Integrated in Hive-trunk-h0.21 #1978 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1978/])
HIVE-4004 Incorrect status for AddPartition metastore event if RawStore 
commit fails
(Dilip Joseph via namit) (Revision 1448101)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448101
Files : 
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListenerOnlyOnCommit.java


> Incorrect status for AddPartition metastore event if RawStore commit fails
> --
>
> Key: HIVE-4004
> URL: https://issues.apache.org/jira/browse/HIVE-4004
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.10.0
>Reporter: Dilip Joseph
>Assignee: Dilip Joseph
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: HIVE-4004.1.patch.txt
>
>
> For ADD PARTITION operations, the AddPartitionEvent does not care if the 
> RawStore commit succeeded or not.  This means that an AddPartitionEvent with 
> status=true is fired even if the actual ADD PARTITION operation failed.  
> This will confuse any AddPartitionEvent listeners.
> Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the 
> status of the RawStore commit.  Only AddPartitionEvent has this problem.
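
The intended behaviour can be sketched in Java as follows. This is an
illustration of the idea, not the code committed for this issue:
CommitAwareEventSketch, Store, Event and addPartition are hypothetical names
standing in for the RawStore, the AddPartitionEvent and the metastore handler.

```java
import java.util.function.Consumer;

// Hypothetical model for illustration; not the Hive metastore classes.
final class CommitAwareEventSketch {
    // Stand-in for the store whose commit may fail.
    interface Store {
        boolean commit();
    }

    // Stand-in for the event delivered to listeners.
    static final class Event {
        final String partitionName;
        final boolean status;

        Event(String partitionName, boolean status) {
            this.partitionName = partitionName;
            this.status = status;
        }
    }

    // Fires the event only after the commit outcome is known, so the status
    // seen by listeners reflects whether the operation really succeeded.
    static boolean addPartition(Store store, String partitionName, Consumer<Event> listener) {
        boolean success = false;
        try {
            success = store.commit();
        } finally {
            // Also reached when commit() throws; status is then false.
            listener.accept(new Event(partitionName, success));
        }
        return success;
    }
}
```

The key point is the ordering: the status flag is derived from the commit
result instead of being set before the commit is attempted.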

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Hive-trunk-h0.21 - Build # 1978 - Fixed

2013-02-20 Thread Apache Jenkins Server

Changes for Build #1975
[namit] HIVE-4021 PostgreSQL upgrade scripts are creating column with incorrect 
name
(Jarek Jarcec Cecho via namit)

[hashutosh] HIVE-4033 : NPE at runtime while selecting virtual column after 
joining three tables on different keys (Ashutosh Chauhan)

[namit] HIVE-4029 Hive Profiler dies with NPE
(Brock Noland via namit)


Changes for Build #1976
[namit] HIVE-4023 Improve Error Logging in MetaStore
(Bhushan Mandhani via namit)

[namit] HIVE-3403 user should not specify mapjoin to perform sort-merge 
bucketed join
(Namit Jain via Ashutosh)

[namit] HIVE-4024 Derby metastore update script will fail when upgrading from 
0.9.0
to 0.10.0 (Jarek Jarcec Cecho via namit)


Changes for Build #1977

Changes for Build #1978
[namit] HIVE-4027 Thrift alter_table api doesnt validate column type
(Gang Tim Liu via namit)

[namit] HIVE-4039 Hive compiler sometimes fails in semantic analysis / 
optimisation stage when boolean
variable appears in WHERE clause. (Jezn Xu via namit)

[namit] HIVE-4004 Incorrect status for AddPartition metastore event if RawStore 
commit fails
(Dilip Joseph via namit)




All tests passed

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1978)

Status: Fixed

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1978/ to 
view the results.

[jira] [Work started] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3710 started by Gang Tim Liu.

> HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
> FileSinkOperator
> --
>
> Key: HIVE-3710
> URL: https://issues.apache.org/jira/browse/HIVE-3710
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
>
> It should be part of the plan instead.


[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3710:
---

Attachment: HIVE-3710.patch.1

> HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
> FileSinkOperator
> --
>
> Key: HIVE-3710
> URL: https://issues.apache.org/jira/browse/HIVE-3710
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3710.patch.1
>
>
> It should be part of the plan instead.


[jira] [Commented] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582706#comment-13582706
 ] 

Gang Tim Liu commented on HIVE-3710:


https://reviews.facebook.net/D8721

> HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
> FileSinkOperator
> --
>
> Key: HIVE-3710
> URL: https://issues.apache.org/jira/browse/HIVE-3710
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3710.patch.1
>
>
> It should be part of the plan instead.


[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3710:
---

Status: Patch Available  (was: In Progress)

patch is available.

> HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
> FileSinkOperator
> --
>
> Key: HIVE-3710
> URL: https://issues.apache.org/jira/browse/HIVE-3710
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3710.patch.1
>
>
> It should be part of the plan instead.


[jira] [Updated] (HIVE-3992) Hive RCFile::sync(long) does a sub-sequence linear search for sync blocks

2013-02-20 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-3992:
--

Assignee: Gopal V
Release Note: Rely on previous sync-points when syncing within the same 
RCFile and avoid unnecessary I/O
  Status: Patch Available  (was: Open)

Patch optimizes for rcfile splits when they are being merged in a 
CombineFileSplit instance.

> Hive RCFile::sync(long) does a sub-sequence linear search for sync blocks
> -
>
> Key: HIVE-3992
> URL: https://issues.apache.org/jira/browse/HIVE-3992
> Project: Hive
>  Issue Type: Bug
> Environment: Ubuntu x86_64/java-1.6/hadoop-2.0.3
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-3992.patch, select-join-limit.html
>
>
> The following function does some bad I/O
> {code}
> public synchronized void sync(long position) throws IOException {
>   ...
>   try {
>     seek(position + 4); // skip escape
>     in.readFully(syncCheck);
>     int syncLen = sync.length;
>     for (int i = 0; in.getPos() < end; i++) {
>       int j = 0;
>       for (; j < syncLen; j++) {
>         if (sync[j] != syncCheck[(i + j) % syncLen]) {
>           break;
>         }
>       }
>       if (j == syncLen) {
>         in.seek(in.getPos() - SYNC_SIZE); // position before sync
>         return;
>       }
>       syncCheck[i % syncLen] = in.readByte();
>     }
>   }
>   ...
> }
> {code}
> This causes a rather large number of readByte() calls, which are passed on to a 
> ByteBuffer via a single byte array.
> As a result, a large amount of CPU is burnt in the linear search for the sync 
> pattern in the input RCFile (up to 92% for a skewed example: a trivial 
> map-join + limit 100).
> Ideally this behaviour should be avoided, or at least replaced by a rolling 
> hash for efficient comparison, since the sync marker has a known width of 16 bytes.
> Attached the stack trace from a Yourkit profile.
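
One way to cut the per-byte call overhead described above is to read in chunks and search the buffer in memory. The sketch below is an assumption-laden illustration, not the attached patch and not a rolling hash: SyncMarkerScanSketch, indexOf and chunkSize are hypothetical names, and it works on a plain java.io.InputStream rather than on the RCFile reader.

```java
import java.io.IOException;
import java.io.InputStream;

/** Illustrative only; not the RCFile reader and not the attached patch. */
final class SyncMarkerScanSketch {

  /**
   * Returns the offset, counted from the stream's current position, of the first
   * occurrence of marker, or -1 if the stream ends first.
   */
  static long indexOf(InputStream in, byte[] marker, int chunkSize) throws IOException {
    if (marker.length == 0) {
      return 0;
    }
    int keep = marker.length - 1;   // longest tail that may begin a marker
    byte[] buf = new byte[keep + Math.max(chunkSize, 1)];
    int filled = 0;                 // number of valid bytes in buf
    long base = 0;                  // stream offset of buf[0]
    while (true) {
      // One bulk read per chunk instead of one call per byte.
      int n = in.read(buf, filled, buf.length - filled);
      if (n < 0) {
        return -1;
      }
      filled += n;
      for (int start = 0; start + marker.length <= filled; start++) {
        if (matchesAt(buf, start, marker)) {
          return base + start;
        }
      }
      // Carry over only the bytes that could begin a marker spanning two chunks.
      int tail = Math.min(keep, filled);
      int drop = filled - tail;
      System.arraycopy(buf, drop, buf, 0, tail);
      base += drop;
      filled = tail;
    }
  }

  private static boolean matchesAt(byte[] buf, int start, byte[] marker) {
    for (int j = 0; j < marker.length; j++) {
      if (buf[start + j] != marker[j]) {
        return false;
      }
    }
    return true;
  }
}
```

The comparison is still linear in the worst case; the saving in this sketch comes only from replacing single-byte reads with bulk reads.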


[jira] [Commented] (HIVE-4046) Column masking

2013-02-20 Thread Justin Boseant (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582759#comment-13582759
 ] 

Justin Boseant commented on HIVE-4046:
--

The problem with using authorization is that querying one of these columns 
results in an error / failed query.  The requested functionality requires that 
the query succeed and the data be masked.

> Column masking
> --
>
> Key: HIVE-4046
> URL: https://issues.apache.org/jira/browse/HIVE-4046
> Project: Hive
>  Issue Type: New Feature
>  Components: Authorization, CLI, Metastore, Query Processor, Security
>Affects Versions: 0.11.0
>Reporter: Samuel Yuan
>Assignee: Samuel Yuan
>
> Sometimes data in a table needs to be kept around but made inaccessible. 
> Right now it is possible to offline a table or a partition, but not a 
> specific column of a partition. Also, accessing an offlined table results in 
> an error. With this change, it will be possible to mask a column at the 
> partition level, causing all further queries to that column to return null.


[jira] [Commented] (HIVE-4046) Column masking

2013-02-20 Thread Carl Steinbach (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582772#comment-13582772
 ] 

Carl Steinbach commented on HIVE-4046:
--

Here's what I meant:

{code}
CREATE TABLE emp (
  name STRING,
  title STRING,
  salary INT
);

CREATE VIEW emp_masked AS
  SELECT name, title, NULL
  FROM emp;
{code}

Then use authorization to restrict access to the underlying emp table.

Regardless of which approach is used, I think it would be good to write up a 
proposal explaining the functional and implementation details before writing 
any code.

> Column masking
> --
>
> Key: HIVE-4046
> URL: https://issues.apache.org/jira/browse/HIVE-4046
> Project: Hive
>  Issue Type: New Feature
>  Components: Authorization, CLI, Metastore, Query Processor, Security
>Affects Versions: 0.11.0
>Reporter: Samuel Yuan
>Assignee: Samuel Yuan
>
> Sometimes data in a table needs to be kept around but made inaccessible. 
> Right now it is possible to offline a table or a partition, but not a 
> specific column of a partition. Also, accessing an offlined table results in 
> an error. With this change, it will be possible to mask a column at the 
> partition level, causing all further queries to that column to return null.


[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Carl Steinbach (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-3710:
-

Status: Open  (was: Patch Available)

bq. It should be part of the plan instead.

Why should it be part of the plan? Is this patch intended to resolve incorrect 
behavior, or is it a performance optimization, or ...?

Please add a test case.


> HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
> FileSinkOperator
> --
>
> Key: HIVE-3710
> URL: https://issues.apache.org/jira/browse/HIVE-3710
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3710.patch.1
>
>
> It should be part of the plan instead.


[jira] [Commented] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582797#comment-13582797
 ] 

Gang Tim Liu commented on HIVE-3710:


It's a follow-up to HIVE-3706. It falls under performance optimization, 
although the gain is not as big as with HIVE-3706.

> HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
> FileSinkOperator
> --
>
> Key: HIVE-3710
> URL: https://issues.apache.org/jira/browse/HIVE-3710
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3710.patch.1
>
>
> It should be part of the plan instead.


[jira] [Commented] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582804#comment-13582804
 ] 

Gang Tim Liu commented on HIVE-3710:


It's not a new feature; it moves code from the run-time path to the compilation 
path in order to improve performance.
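
The general idea of moving a configuration check from the run-time path to the compilation path can be sketched as below. Everything here is hypothetical and chosen for illustration: SinkDesc, SinkOperator and the key "example.stats.collect.rawdatasize" are not Hive's FileSinkOperator, its descriptor, or a real Hive property, and the actual patch may differ.

```java
import java.util.Map;

/** Hypothetical names throughout; this is not Hive's FileSinkOperator or its descriptor. */
final class PlanTimeFlagSketch {

  /** Made-up configuration key used only by this sketch. */
  static final String COLLECT_RAW_SIZE_KEY = "example.stats.collect.rawdatasize";

  /** Plan descriptor: the flag is resolved once, when the plan is compiled. */
  static final class SinkDesc {
    final boolean collectRawDataSize;

    SinkDesc(boolean collectRawDataSize) {
      this.collectRawDataSize = collectRawDataSize;
    }

    static SinkDesc compile(Map<String, String> conf) {
      String value = conf.getOrDefault(COLLECT_RAW_SIZE_KEY, "false");
      return new SinkDesc(Boolean.parseBoolean(value));
    }
  }

  /** Run-time operator: the per-row path reads a final field, not the configuration. */
  static final class SinkOperator {
    private final SinkDesc desc;
    private long rawDataSize;

    SinkOperator(SinkDesc desc) {
      this.desc = desc;
    }

    void process(byte[] row) {
      if (desc.collectRawDataSize) {
        rawDataSize += row.length;
      }
    }

    long rawDataSize() {
      return rawDataSize;
    }
  }
}
```

In this arrangement the operator never looks at the configuration map, so the lookup cost is paid once per plan rather than once per row or per operator initialization.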

I thought the existing statistics-related test cases already have good coverage.

Please let me know your thoughts and if it makes sense. I will act accordingly.

thanks a lot

> HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
> FileSinkOperator
> --
>
> Key: HIVE-3710
> URL: https://issues.apache.org/jira/browse/HIVE-3710
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3710.patch.1
>
>
> It should be part of the plan instead.


[jira] [Commented] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582809#comment-13582809
 ] 

Gang Tim Liu commented on HIVE-3710:


For example, stats0.q ... stats18.q are existing stats-related test cases.

thanks a lot

> HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
> FileSinkOperator
> --
>
> Key: HIVE-3710
> URL: https://issues.apache.org/jira/browse/HIVE-3710
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3710.patch.1
>
>
> It should be part of the plan instead.


[jira] [Updated] (HIVE-4002) Fetch task aggregation for simple group by query

2013-02-20 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4002:
--

Attachment: HIVE-4002.D8739.1.patch

navis requested code review of "HIVE-4002 [jira] Fetch task aggregation for 
simple group by query".

Reviewers: JIRA

HIVE-4002 Fetch task aggregation for simple group by query

Aggregation queries with no group-by clause (for example, select count(*) from 
src) execute the final aggregation in a single reduce task. But the input is 
too small even for a single reducer, because most UDAFs generate just a single 
row from map-side aggregation. If the final fetch task can aggregate the 
outputs from the map tasks, the shuffle time can be eliminated.

This optimization transforms an operator tree such as

TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK

into

TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)

With the patch, the time taken for the auto_join_filters.q test dropped from 
10 min to 6 min.
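
Why the final aggregation is cheap enough to run in the fetch task can be illustrated with a small sketch. The names Partial and mergeAll are hypothetical, and this is not the GroupByOperator code from the patch; it only shows that each map task contributes one partial row and that merging those rows is a trivial amount of work.

```java
import java.util.List;

/** Illustrative only; not the operator code touched by the patch. */
final class FetchSideMergeSketch {

  /** Partial aggregate emitted by one map task (the GBY1 output): a count and a sum. */
  static final class Partial {
    final long count;
    final double sum;

    Partial(long count, double sum) {
      this.count = count;
      this.sum = sum;
    }

    Partial merge(Partial other) {
      return new Partial(count + other.count, sum + other.sum);
    }

    /** Returns NaN when no rows were seen; SQL would report NULL in that case. */
    double average() {
      return count == 0 ? Double.NaN : sum / count;
    }
  }

  /** The step the fetch task would perform (GBY2): fold one partial row per map task. */
  static Partial mergeAll(List<Partial> mapOutputs) {
    Partial result = new Partial(0, 0.0);
    for (Partial partial : mapOutputs) {
      result = result.merge(partial);
    }
    return result;
  }
}
```

With two map outputs, (count 2, sum 10.0) and (count 3, sum 20.0), the merged result has count 5 and average 6.0, and no shuffle or reducer is involved.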

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D8739

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchAggregation.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
  ql/src/test/queries/clientpositive/fetch_aggregation.q
  ql/src/test/results/clientpositive/fetch_aggregation.q.out
  ql/src/test/results/compiler/plan/groupby1.q.xml
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml
  ql/src/test/results/compiler/plan/groupby5.q.xml
  serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/21291/

To: JIRA, navis


> Fetch task aggregation for simple group by query
> 
>
> Key: HIVE-4002
> URL: https://issues.apache.org/jira/browse/HIVE-4002
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-4002.D8739.1.patch
>
>
> Aggregation queries with no group-by clause (for example, select count(*) 
> from src) execute the final aggregation in a single reduce task. But the input 
> is too small even for a single reducer, because most UDAFs generate just a 
> single row from map-side aggregation. If the final fetch task can aggregate the 
> outputs from the map tasks, the shuffle time can be eliminated.
> This optimization transforms an operator tree such as
> TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
> into 
> TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
> With the patch, the time taken for the auto_join_filters.q test dropped from 
> 10 min to 6 min.


[jira] [Updated] (HIVE-4002) Fetch task aggregation for simple group by query

2013-02-20 Thread Navis (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4002:


Status: Patch Available  (was: Open)

> Fetch task aggregation for simple group by query
> 
>
> Key: HIVE-4002
> URL: https://issues.apache.org/jira/browse/HIVE-4002
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-4002.D8739.1.patch
>
>
> Aggregation queries with no group-by clause (for example, select count(*) 
> from src) execute the final aggregation in a single reduce task. But the input 
> is too small even for a single reducer, because most UDAFs generate just a 
> single row from map-side aggregation. If the final fetch task can aggregate the 
> outputs from the map tasks, the shuffle time can be eliminated.
> This optimization transforms an operator tree such as
> TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
> into 
> TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
> With the patch, the time taken for the auto_join_filters.q test dropped from 
> 10 min to 6 min.


[jira] [Commented] (HIVE-948) more query plan optimization rules

2013-02-20 Thread Navis (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582880#comment-13582880
 ] 

Navis commented on HIVE-948:


Ah, sorry. I'll update that.

bq. Why this needs to be last optimizer?
It does not update the infos for the SEL, including colExprMap, etc. The 
optimizers that follow, like GlobalLimitOptimizer or SimpleFetchOptimizer, do 
not modify the operator tree. (They possibly update infos, but I was even 
thinking of removing all of them in a CleanupProcessor, making the plan file 
smaller.)

bq. Also, parent should always have child's schema, isnt it?
I thought SEL(no-compute) does not have a schema because it just inherits that 
of its parent. I'll check it again.

bq. Shouldn't parent be selectStar either when child is select-star or parent 
itself is select-star.
I've excluded those situations before applying it, like this (in the missing 
file), because I'm not sure about it.
{code}
if (pSEL.getConf().isSelStarNoCompute()) {
  // SEL(no-compute)-SEL. never seen this condition, and removing parent is not
  // safe in current graph walker
  return null;
}
{code}

> more query plan optimization rules 
> ---
>
> Key: HIVE-948
> URL: https://issues.apache.org/jira/browse/HIVE-948
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Navis
> Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, 
> HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch
>
>
> Many query plans are not optimal in that they contain redundant operators. 
> Some examples are unnecessary select operators (select followed by select, 
> select output being the same as input etc.). Even though these operators are 
> not very expensive, they could account for around 10% of CPU time in some 
> simple queries. It seems they are low-hanging fruits that we should pick 
> first. 
> BTW, it seems these optimization rules should be added at the last stage of 
> the physical optimization phase since some redundant operators are added to 
> facilitate physical plan generation. 


[jira] [Updated] (HIVE-948) more query plan optimization rules

2013-02-20 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-948:
-

Attachment: HIVE-948.D8463.4.patch

navis updated the revision "HIVE-948 [jira] more query plan optimization rules".

  Added missing class, sorry

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D8463

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D8463?vs=27807&id=28257#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java

To: JIRA, ashutoshc, navis


> more query plan optimization rules 
> ---
>
> Key: HIVE-948
> URL: https://issues.apache.org/jira/browse/HIVE-948
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Navis
> Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, 
> HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.4.patch
>
>
> Many query plans are not optimal in that they contain redundant operators. 
> Some examples are unnecessary select operators (select followed by select, 
> select output being the same as input etc.). Even though these operators are 
> not very expensive, they could account for around 10% of CPU time in some 
> simple queries. It seems they are low-hanging fruits that we should pick 
> first. 
> BTW, it seems these optimization rules should be added at the last stage of 
> the physical optimization phase since some redundant operators are added to 
> facilitate physical plan generation. 


[jira] [Commented] (HIVE-4025) Add reflect UDF for member method invocation of column

2013-02-20 Thread Phabricator (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582886#comment-13582886
 ] 

Phabricator commented on HIVE-4025:
---

navis has commented on the revision "HIVE-4025 [jira] Add reflect UDF for 
member method invocation of column".

INLINE COMMENTS
  ql/src/test/results/clientpositive/udf_reflect2.q.out:312 I'll update that.

  bq. The last columns seem to be wrong:
  That is the correct result for the Timestamp class.

  getYear()
   * Returns a value that is the result of subtracting 1900 from the
   * year that contains or begins with the instant in time represented
   * by this Date object, as interpreted in the local
   * time zone.

  getMonth()
   * Returns a number representing the month that contains or begins
   * with the instant in time represented by this Date object.
   * The value returned is between 0 and 11,
   * with the value 0 representing January.

  getDay()
   * Returns the day of the week represented by this date. The
   * returned value (0 = Sunday, 1 = Monday,
   * 2 = Tuesday, 3 = Wednesday, 4 =
   * Thursday, 5 = Friday, 6 = Saturday)
   * represents the day of the week that contains or begins with
   * the instant in time represented by this Date object,
   * as interpreted in the local time zone.
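
  The three getters quoted above are the deprecated java.util.Date methods that java.sql.Timestamp inherits. A small sketch (TimestampFieldsSketch and legacyFields are hypothetical names, not part of the patch) shows why the output can look wrong at first glance:

```java
import java.sql.Timestamp;

/** Illustrative only; shows the legacy getters inherited from java.util.Date. */
final class TimestampFieldsSketch {

  /** Returns { getYear(), getMonth(), getDay() } for the given timestamp. */
  @SuppressWarnings("deprecation")
  static int[] legacyFields(Timestamp ts) {
    return new int[] {
      ts.getYear(),   // calendar year minus 1900
      ts.getMonth(),  // 0 = January ... 11 = December
      ts.getDay()     // day of the week, 0 = Sunday ... 6 = Saturday
    };
  }
}
```

  For Timestamp.valueOf("2013-02-20 10:30:00") this yields 113, 1 and 3: 2013 minus 1900, February as month 1, and Wednesday as day 3.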

REVISION DETAIL
  https://reviews.facebook.net/D8601

To: JIRA, navis
Cc: njain, brock


> Add reflect UDF for member method invocation of column
> --
>
> Key: HIVE-4025
> URL: https://issues.apache.org/jira/browse/HIVE-4025
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-4025.D8601.1.patch
>
>
> There are many useful non-static methods on the classes of primitive types, 
> but the current reflect UDF cannot invoke them. For example,
> select reflect2(value, "replace", "val", "VALUE") from src;
> which replaces the 'val' part of the value column with 'VALUE'.
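
In plain Java, invoking a member method on a value by name looks roughly like the sketch below. MemberInvokeSketch and invoke are hypothetical names, and the explicit parameter-type array is an assumption made to keep the example short; how the proposed UDF resolves argument types is not described in this thread.

```java
import java.lang.reflect.Method;

/** Illustrative only; not the UDF from the patch. */
final class MemberInvokeSketch {

  /** Looks up a public method by name and parameter types, then calls it on target. */
  static Object invoke(Object target, String methodName, Class<?>[] parameterTypes,
                       Object... args) {
    try {
      Method method = target.getClass().getMethod(methodName, parameterTypes);
      return method.invoke(target, args);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("cannot invoke " + methodName, e);
    }
  }
}
```

Calling invoke("val_238", "replace", new Class<?>[] {CharSequence.class, CharSequence.class}, "val", "VALUE") returns "VALUE_238". Note that String.replace is declared with CharSequence parameters, so a lookup with String.class parameter types would not find it.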


[jira] [Commented] (HIVE-4025) Add reflect UDF for member method invocation of column

2013-02-20 Thread Phabricator (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582889#comment-13582889
 ] 

Phabricator commented on HIVE-4025:
---

navis has commented on the revision "HIVE-4025 [jira] Add reflect UDF for 
member method invocation of column".

  I had thought of a name like "reflect_instance" or something, but found that the 
current "reflect" can also invoke instance methods (by calling the default 
constructor of the target class).

  Naming is hard. How about "type_reflect"?

REVISION DETAIL
  https://reviews.facebook.net/D8601

To: JIRA, navis
Cc: njain, brock


> Add reflect UDF for member method invocation of column
> --
>
> Key: HIVE-4025
> URL: https://issues.apache.org/jira/browse/HIVE-4025
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-4025.D8601.1.patch
>
>
> There are many useful non-static methods on the classes of primitive types, 
> but the current reflect UDF cannot invoke them. For example,
> select reflect2(value, "replace", "val", "VALUE") from src;
> which replaces the 'val' part of the value column with 'VALUE'.


[jira] [Assigned] (HIVE-4045) Modify PreDropPartitionEvent to pass Table parameter

2013-02-20 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-4045:


Assignee: Li Yang

> Modify PreDropPartitionEvent to pass Table parameter
> 
>
> Key: HIVE-4045
> URL: https://issues.apache.org/jira/browse/HIVE-4045
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Li Yang
>Assignee: Li Yang
>Priority: Minor
>
> MetaStorePreEventListener which implements onEvent(PreEventContext context) 
> sometimes needs to access Table properties when PreDropPartitionEvent is 
> listened to.


[jira] [Commented] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582897#comment-13582897
 ] 

Namit Jain commented on HIVE-3741:
--

+1

> Driver.validateConfVariables() should perform more validations
> --
>
> Key: HIVE-3741
> URL: https://issues.apache.org/jira/browse/HIVE-3741
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3741.patch.1
>
>
> Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.


[jira] [Updated] (HIVE-4016) Remove init(fname) from TestParse.vm for each test

2013-02-20 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4016:
--

Attachment: HIVE-4016.D8547.2.patch

navis updated the revision "HIVE-4016 [jira] Remove init(fname) from 
TestParse.vm for each test".

  Addressed comments (removed dummy incrementors and updated result plans)

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D8547

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D8547?vs=27657&id=28263#toc

AFFECTED FILES
  ql/src/test/results/compiler/plan/case_sensitivity.q.xml
  ql/src/test/results/compiler/plan/cast1.q.xml
  ql/src/test/results/compiler/plan/groupby1.q.xml
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml
  ql/src/test/results/compiler/plan/groupby4.q.xml
  ql/src/test/results/compiler/plan/groupby5.q.xml
  ql/src/test/results/compiler/plan/groupby6.q.xml
  ql/src/test/results/compiler/plan/input1.q.xml
  ql/src/test/results/compiler/plan/input2.q.xml
  ql/src/test/results/compiler/plan/input20.q.xml
  ql/src/test/results/compiler/plan/input3.q.xml
  ql/src/test/results/compiler/plan/input4.q.xml
  ql/src/test/results/compiler/plan/input5.q.xml
  ql/src/test/results/compiler/plan/input6.q.xml
  ql/src/test/results/compiler/plan/input7.q.xml
  ql/src/test/results/compiler/plan/input8.q.xml
  ql/src/test/results/compiler/plan/input9.q.xml
  ql/src/test/results/compiler/plan/input_part1.q.xml
  ql/src/test/results/compiler/plan/input_testsequencefile.q.xml
  ql/src/test/results/compiler/plan/input_testxpath.q.xml
  ql/src/test/results/compiler/plan/input_testxpath2.q.xml
  ql/src/test/results/compiler/plan/join1.q.xml
  ql/src/test/results/compiler/plan/join2.q.xml
  ql/src/test/results/compiler/plan/join3.q.xml
  ql/src/test/results/compiler/plan/join4.q.xml
  ql/src/test/results/compiler/plan/join5.q.xml
  ql/src/test/results/compiler/plan/join6.q.xml
  ql/src/test/results/compiler/plan/join7.q.xml
  ql/src/test/results/compiler/plan/join8.q.xml
  ql/src/test/results/compiler/plan/sample1.q.xml
  ql/src/test/results/compiler/plan/sample2.q.xml
  ql/src/test/results/compiler/plan/sample3.q.xml
  ql/src/test/results/compiler/plan/sample4.q.xml
  ql/src/test/results/compiler/plan/sample5.q.xml
  ql/src/test/results/compiler/plan/sample6.q.xml
  ql/src/test/results/compiler/plan/sample7.q.xml
  ql/src/test/results/compiler/plan/subq.q.xml
  ql/src/test/results/compiler/plan/udf1.q.xml
  ql/src/test/results/compiler/plan/udf4.q.xml
  ql/src/test/results/compiler/plan/udf6.q.xml
  ql/src/test/results/compiler/plan/udf_case.q.xml
  ql/src/test/results/compiler/plan/udf_when.q.xml
  ql/src/test/results/compiler/plan/union.q.xml
  ql/src/test/templates/TestParse.vm

To: JIRA, ashutoshc, navis


> Remove init(fname) from TestParse.vm for each test
> --
>
> Key: HIVE-4016
> URL: https://issues.apache.org/jira/browse/HIVE-4016
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-4016.D8547.1.patch, HIVE-4016.D8547.2.patch
>
>
> TestParse does not change any configuration or data, which means calling the 
> init() method before each test is not necessary. After removing it, test time 
> dropped from 260sec to 16sec.
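
The reasoning above (setup is only needed once when no test mutates shared state) can be sketched as follows. This is a minimal illustration, not the TestParse.vm template: the class `InitOnceSketch` and its methods are hypothetical names, and `init()` here only counts calls in place of the real, expensive setup.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of per-test vs. one-time initialization; not the TestParse.vm template.
final class InitOnceSketch {
    private int initCalls = 0;

    private void init() {
        initCalls++; // stands in for the expensive setup
    }

    // Repeats the setup before every test and returns how often it ran.
    int runWithInitPerTest(List<Runnable> tests) {
        initCalls = 0;
        for (Runnable test : tests) {
            init();
            test.run();
        }
        return initCalls;
    }

    // Runs the setup once; valid only if no test mutates configuration or data.
    int runWithInitOnce(List<Runnable> tests) {
        initCalls = 0;
        init();
        for (Runnable test : tests) {
            test.run();
        }
        return initCalls;
    }

    public static void main(String[] args) {
        List<Runnable> tests = Collections.nCopies(3, (Runnable) () -> { });
        InitOnceSketch sketch = new InitOnceSketch();
        System.out.println(sketch.runWithInitPerTest(tests)); // prints 3
        System.out.println(sketch.runWithInitOnce(tests));    // prints 1
    }
}
```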


[jira] [Updated] (HIVE-2843) UDAF to convert an aggregation to a map

2013-02-20 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2843:
--

Attachment: HIVE-2843.D8745.1.patch

navis requested code review of "HIVE-2843 [jira] UDAF to convert an aggregation 
to a map".

Reviewers: JIRA

HIVE-2843 UDAF to convert an aggregation to a map

I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. 
The source code is available on GitHub at https://github.com/wdavidw/hive-udf 
in two Java classes: "UDAFToMap" and "UDAFToOrderedMap". The first function 
converts an aggregation into a map and internally uses a Java `HashMap`. The 
second function extends the first one. It converts an aggregation into an 
ordered map and internally uses a Java `TreeMap`. They both extend the 
`AbstractGenericUDAFResolver` class.

Also, I have covered the motivations and usages of those UDAFs in a blog post at 
http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/

The full patch is available with tests as well.
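
As a rough illustration of the difference between the two proposed functions, the sketch below folds key/value pairs into a map using plain Java collections. It does not use the Hive UDAF API or the code from the patch; the names `MapAggregationSketch`, `toMap` and `toOrderedMap` are hypothetical, and letting a later value overwrite an earlier one for the same key is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: not the Hive UDAF API, only the HashMap vs. TreeMap idea.
final class MapAggregationSketch {

    // Folds (key, value) rows into the target map; a later value for the same key wins (assumption).
    private static Map<String, Integer> aggregate(List<Map.Entry<String, Integer>> rows,
                                                  Map<String, Integer> target) {
        for (Map.Entry<String, Integer> row : rows) {
            target.put(row.getKey(), row.getValue());
        }
        return target;
    }

    // Unordered result, backed by a HashMap.
    static Map<String, Integer> toMap(List<Map.Entry<String, Integer>> rows) {
        return aggregate(rows, new HashMap<>());
    }

    // Result sorted by key, backed by a TreeMap.
    static Map<String, Integer> toOrderedMap(List<Map.Entry<String, Integer>> rows) {
        return aggregate(rows, new TreeMap<>());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> rows = List.of(
                Map.entry("b", 2), Map.entry("a", 1), Map.entry("c", 3));
        System.out.println(toMap(rows));        // iteration order is not guaranteed
        System.out.println(toOrderedMap(rows)); // prints {a=1, b=2, c=3}
    }
}
```

The ordered variant costs a comparison-based insert per row, which is why a caller who does not need sorted keys would presumably prefer the first function.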

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D8745

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFImplodeToMap.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFImplodeToOrderedMap.java
  ql/src/test/queries/clientpositive/implode_to_map.q
  ql/src/test/queries/clientpositive/implode_to_ordered_map.q
  ql/src/test/results/clientpositive/implode_to_map.q.out
  ql/src/test/results/clientpositive/implode_to_ordered_map.q.out

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/21309/

To: JIRA, navis


> UDAF to convert an aggregation to a map
> ---
>
> Key: HIVE-2843
> URL: https://issues.apache.org/jira/browse/HIVE-2843
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 0.9.0, 0.10.0
>Reporter: David Worms
>Priority: Minor
>  Labels: features, udf
> Attachments: HIVE-2843.1.patch.txt, HIVE-2843.D8745.1.patch
>
>
> I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. 
> The source code is available on GitHub at https://github.com/wdavidw/hive-udf 
> in two Java classes: "UDAFToMap" and "UDAFToOrderedMap". The first function 
> converts an aggregation into a map and internally uses a Java `HashMap`. 
> The second function extends the first one. It converts an aggregation into an 
> ordered map and internally uses a Java `TreeMap`. They both extend the 
> `AbstractGenericUDAFResolver` class.
> Also, I have covered the motivations and usages of those UDAFs in a blog post 
> at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/
> The full patch is available with tests as well.


[jira] [Commented] (HIVE-2843) UDAF to convert an aggregation to a map

2013-02-20 Thread Navis (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582924#comment-13582924
 ] 

Navis commented on HIVE-2843:
-

Made a Phabricator entry for quick review. I've used a similar UDAF to 
implement a pivot feature and it was very useful.

> UDAF to convert an aggregation to a map
> ---
>
> Key: HIVE-2843
> URL: https://issues.apache.org/jira/browse/HIVE-2843
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 0.9.0, 0.10.0
>Reporter: David Worms
>Priority: Minor
>  Labels: features, udf
> Attachments: HIVE-2843.1.patch.txt, HIVE-2843.D8745.1.patch
>
>
> I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. 
> The source code is available on GitHub at https://github.com/wdavidw/hive-udf 
> in two Java classes: "UDAFToMap" and "UDAFToOrderedMap". The first function 
> converts an aggregation into a map and internally uses a Java `HashMap`. 
> The second function extends the first one. It converts an aggregation into an 
> ordered map and internally uses a Java `TreeMap`. They both extend the 
> `AbstractGenericUDAFResolver` class.
> Also, I have covered the motivations and usages of those UDAFs in a blog post 
> at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/
> The full patch is available with tests as well.


[jira] [Commented] (HIVE-3968) Enhance logging in TableAccessInfo

2013-02-20 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582929#comment-13582929
 ] 

Namit Jain commented on HIVE-3968:
--

+1

> Enhance logging in TableAccessInfo
> --
>
> Key: HIVE-3968
> URL: https://issues.apache.org/jira/browse/HIVE-3968
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, 
> HIVE-3968.3.patch.txt
>
>
> Based on what is currently available in the TableAccessInfo we can infer when 
> it would be a good idea to add bucketing/sorting metadata for tables.  
> However, we can't easily tell if we're already getting the benefits of 
> bucketing/sorting.
> This information can be improved by
> a) storing the input table/partition objects so that we can tell if the 
> tables/partitions are already bucketed/sorted
> b) running the TableAccessAnalyzer after the logical optimizer, so that we 
> can tell from the operators whether or not we are already getting benefits 
> (bucketed/sort merge map joins or map group bys)


[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join

2013-02-20 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3996:
-

Status: Open  (was: Patch Available)

comments

> Correctly enforce the memory limit on the multi-table map-join
> --
>
> Key: HIVE-3996
> URL: https://issues.apache.org/jira/browse/HIVE-3996
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-3996_2.patch, HIVE-3996_3.patch, HIVE-3996.patch
>
>
> Currently with HIVE-3784, the joins are converted to map-joins based on 
> checks of the table size against the config variable: 
> hive.auto.convert.join.noconditionaltask.size. 
> However, the current implementation will also merge multiple mapjoin 
> operators into a single task regardless of whether the sum of the table sizes 
> will exceed the configured value.
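
One way to picture the check the issue asks for is to cap the cumulative small-table size per merged task. The sketch below is only an assumption about how such a rule could look, not the logic of the attached patches: `MapJoinMergeSketch` and `groupBySizeLimit` are hypothetical names, and `limitBytes` stands in for the value of hive.auto.convert.join.noconditionaltask.size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: groups consecutive map-joins into tasks by cumulative small-table size.
final class MapJoinMergeSketch {

    // Returns groups of table sizes; a new group starts whenever adding the next
    // table would push the running total past limitBytes. A single table larger
    // than the limit still ends up alone in its own group.
    static List<List<Long>> groupBySizeLimit(List<Long> smallTableSizes, long limitBytes) {
        List<List<Long>> tasks = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentTotal = 0L;
        for (long size : smallTableSizes) {
            if (!current.isEmpty() && currentTotal + size > limitBytes) {
                tasks.add(current);
                current = new ArrayList<>();
                currentTotal = 0L;
            }
            current.add(size);
            currentTotal += size;
        }
        if (!current.isEmpty()) {
            tasks.add(current);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Sizes 4 and 5 fit together under a limit of 10; 3 and 9 each start a new task.
        System.out.println(groupBySizeLimit(Arrays.asList(4L, 5L, 3L, 9L), 10L)); // prints [[4, 5], [3], [9]]
    }
}
```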


[jira] [Commented] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener

2013-02-20 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582934#comment-13582934
 ] 

Namit Jain commented on HIVE-3970:
--

+1

> Clean up/fix PartitionNameWhitelistPreEventListener
> ---
>
> Key: HIVE-3970
> URL: https://issues.apache.org/jira/browse/HIVE-3970
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt, 
> HIVE-3970.3.patch.txt
>
>
> There are a number of issues and things which can be cleaned up related to 
> PartitionNameWhitelistPreEventListener.
> * It's an event listener, but it doesn't need to be: given that the regex 
> whitelist is configurable, it could just be a utility method.
> * It's not run when a partition is renamed, so partitions with invalid 
> characters can be created in this way.
> * There's no easy way to check if a partition contains invalid characters, 
> short of creating it and seeing if it fails.
> Most importantly, when a dynamic partition contains an invalid character, the 
> directory for this partition is created, and the data is moved into it, but 
> the partition fails to be created leaving an orphan directory.
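
The "utility method" idea from the first bullet could look roughly like the sketch below, which validates partition values against a configurable regex before anything is written. It is an illustration under assumptions, not the patch: the class name, method names, and the example pattern are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical utility sketch; class name, method names, and the example pattern are assumptions.
final class PartitionNameWhitelistSketch {

    // Returns the partition values that do not fully match the whitelist pattern.
    static List<String> findInvalidValues(List<String> partitionValues, Pattern whitelist) {
        List<String> invalid = new ArrayList<>();
        for (String value : partitionValues) {
            if (!whitelist.matcher(value).matches()) {
                invalid.add(value);
            }
        }
        return invalid;
    }

    // Meant to be called before any directory is created or data is moved.
    static void validate(List<String> partitionValues, Pattern whitelist) {
        List<String> invalid = findInvalidValues(partitionValues, whitelist);
        if (!invalid.isEmpty()) {
            throw new IllegalArgumentException(
                    "Partition values not allowed by whitelist: " + invalid);
        }
    }

    public static void main(String[] args) {
        Pattern whitelist = Pattern.compile("[A-Za-z0-9_-]+"); // example pattern only
        System.out.println(findInvalidValues(Arrays.asList("2013-02-20", "a/b"), whitelist)); // prints [a/b]
    }
}
```

Because the check is a plain static method, the same call could in principle be reused on the rename path and on the dynamic-partition path, which are the two gaps listed above.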


[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3710:
---

Attachment: HIVE-3710.patch.2

> HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
> FileSinkOperator
> --
>
> Key: HIVE-3710
> URL: https://issues.apache.org/jira/browse/HIVE-3710
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3710.patch.1, HIVE-3710.patch.2
>
>
> It should be part of the plan instead.


[jira] [Commented] (HIVE-2843) UDAF to convert an aggregation to a map

2013-02-20 Thread Phabricator (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582947#comment-13582947
 ] 

Phabricator commented on HIVE-2843:
---

njain has commented on the revision "HIVE-2843 [jira] UDAF to convert an 
aggregation to a map".

INLINE COMMENTS
  ql/src/test/queries/clientpositive/implode_to_map.q:2 The code changes look good.

  Some minor comments:

  1. Can you add describe implode_to_map and desc extended in the test?

  2. Have you run all the tests? I think you need to update show_functions.q.out

  ql/src/test/queries/clientpositive/implode_to_map.q:24 Can you add some comments here: what is implode_to_map returning?

  Add a test where the 2nd arg to implode_to_map is a primitive type.

  ql/src/test/queries/clientpositive/implode_to_ordered_map.q:25 Same as above.

REVISION DETAIL
  https://reviews.facebook.net/D8745

To: JIRA, navis
Cc: njain


> UDAF to convert an aggregation to a map
> ---
>
> Key: HIVE-2843
> URL: https://issues.apache.org/jira/browse/HIVE-2843
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 0.9.0, 0.10.0
>Reporter: David Worms
>Priority: Minor
>  Labels: features, udf
> Attachments: HIVE-2843.1.patch.txt, HIVE-2843.D8745.1.patch
>
>
> I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. 
> The source code is available on GitHub at https://github.com/wdavidw/hive-udf 
> in two Java classes: "UDAFToMap" and "UDAFToOrderedMap". The first function 
> converts an aggregation into a map and internally uses a Java `HashMap`. 
> The second function extends the first one. It converts an aggregation into an 
> ordered map and internally uses a Java `TreeMap`. They both extend the 
> `AbstractGenericUDAFResolver` class.
> Also, I have covered the motivations and usages of those UDAFs in a blog post 
> at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/
> The full patch is available with tests as well.


[jira] [Updated] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3741:
-

   Resolution: Fixed
Fix Version/s: 0.11.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks Tim

> Driver.validateConfVariables() should perform more validations
> --
>
> Key: HIVE-3741
> URL: https://issues.apache.org/jira/browse/HIVE-3741
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Fix For: 0.11.0
>
> Attachments: HIVE-3741.patch.1
>
>
> Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.


[jira] [Commented] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582952#comment-13582952
 ] 

Gang Tim Liu commented on HIVE-3741:


Namit, thank you very much.
Tim

> Driver.validateConfVariables() should perform more validations
> --
>
> Key: HIVE-3741
> URL: https://issues.apache.org/jira/browse/HIVE-3741
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Fix For: 0.11.0
>
> Attachments: HIVE-3741.patch.1
>
>
> Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.


[jira] [Updated] (HIVE-4005) Column truncation

2013-02-20 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4005:
-

Status: Open  (was: Patch Available)

comments

> Column truncation
> -
>
> Key: HIVE-4005
> URL: https://issues.apache.org/jira/browse/HIVE-4005
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-4005.1.patch.txt, HIVE-4005.2.patch.txt, 
> HIVE-4005.3.patch.txt
>
>
> Column truncation allows users to remove data for columns that are no longer 
> useful.
> This is done by removing the data for the column and setting the length of 
> the column data and related lengths to 0 in the RC file header.
> RC file was fixed to recognize columns with lengths of zero as empty; they are 
> treated as if the column doesn't exist in the data, so a null is returned 
> for every value of that column in every row. This is the same thing that 
> happens when more columns are selected than exist in the file.
> A new command was added to the CLI
> TRUNCATE TABLE ... PARTITION ... COLUMNS ...
> This launches a map only job where each mapper rewrites a single file without 
> the unnecessary column data and the adjusted headers. It does not 
> uncompress/deserialize the data so it is much faster than rewriting the data 
> with NULLs.
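
The read-side rule described above (a zero-length column behaves like a column that is not in the file) can be sketched with a toy in-memory layout. This is not the RCFile reader or the code from the patch: `TruncatedColumnSketch`, `readRow` and `truncateColumn` are hypothetical names, and a `String[][]` of per-column values stands in for the real column buffers and header lengths.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the read-side rule; this is not the RCFile reader.
final class TruncatedColumnSketch {

    // columns[c] holds the stored values of column c; an empty array models a truncated column.
    static List<String> readRow(String[][] columns, int rowIndex, int requestedColumns) {
        List<String> row = new ArrayList<>();
        for (int c = 0; c < requestedColumns; c++) {
            boolean missing = c >= columns.length;                  // more columns selected than exist
            boolean truncated = !missing && columns[c].length == 0; // zero-length column data
            row.add(missing || truncated ? null : columns[c][rowIndex]);
        }
        return row;
    }

    // Models truncation: drops the column's data and leaves a zero-length entry in
    // its place, without touching (or decoding) the other columns.
    static String[][] truncateColumn(String[][] columns, int columnIndex) {
        String[][] copy = columns.clone();
        copy[columnIndex] = new String[0];
        return copy;
    }

    public static void main(String[] args) {
        String[][] file = { { "a", "b" }, { "x", "y" } };
        System.out.println(readRow(file, 1, 2));                    // prints [b, y]
        System.out.println(readRow(truncateColumn(file, 1), 0, 3)); // prints [a, null, null]
    }
}
```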


[jira] [Commented] (HIVE-4016) Remove init(fname) from TestParse.vm for each test

2013-02-20 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582970#comment-13582970
 ] 

Ashutosh Chauhan commented on HIVE-4016:


+1 Running tests.

> Remove init(fname) from TestParse.vm for each test
> --
>
> Key: HIVE-4016
> URL: https://issues.apache.org/jira/browse/HIVE-4016
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-4016.D8547.1.patch, HIVE-4016.D8547.2.patch
>
>
> TestParse does not change any configuration or data, which means calling the 
> init() method before each test is not necessary. After removing it, test time 
> dropped from 260sec to 16sec.


[jira] [Updated] (HIVE-3968) Enhance logging in TableAccessInfo

2013-02-20 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3968:
-

Status: Open  (was: Patch Available)

The tests table_access_keys_stats.q and table_access_keys_stats2.q are failing


> Enhance logging in TableAccessInfo
> --
>
> Key: HIVE-3968
> URL: https://issues.apache.org/jira/browse/HIVE-3968
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, 
> HIVE-3968.3.patch.txt
>
>
> Based on what is currently available in the TableAccessInfo we can infer when 
> it would be a good idea to add bucketing/sorting metadata for tables.  
> However, we can't easily tell if we're already getting the benefits of 
> bucketing/sorting.
> This information can be improved by
> a) storing the input table/partition objects so that we can tell if the 
> tables/partitions are already bucketed/sorted
> b) running the TableAccessAnalyzer after the logical optimizer, so that we 
> can tell from the operators whether or not we are already getting benefits 
> (bucketed/sort merge map joins or map group bys)


[jira] [Commented] (HIVE-948) more query plan optimization rules

2013-02-20 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582973#comment-13582973
 ] 

Ashutosh Chauhan commented on HIVE-948:
---

Makes sense. Navis, once you update the patch (there are a few more .q files 
which were added in trunk since you last updated the patch), I will get it in. 

> more query plan optimization rules 
> ---
>
> Key: HIVE-948
> URL: https://issues.apache.org/jira/browse/HIVE-948
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Navis
> Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, 
> HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.4.patch
>
>
> Many query plans are not optimal in that they contain redundant operators. 
> Some examples are unnecessary select operators (select followed by select, 
> select output being the same as input etc.). Even though these operators are 
> not very expensive, they could account for around 10% of CPU time in some 
> simple queries. It seems they are low-hanging fruit that we should pick 
> first. 
> BTW, it seems these optimization rules should be added at the last stage of 
> the physical optimization phase since some redundant operators are added to 
> facilitate physical plan generation. 
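
The two examples named above (select followed by select, and a select whose output equals its input) can be sketched on a toy linear plan. This is not Hive's optimizer API: `RedundantSelectSketch`, `Op` and `simplify` are hypothetical names, and the sketch assumes that selects are pure column projections and that every other operator leaves the schema unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a linear plan of pure column projections (SEL) and
// schema-preserving operators. This is not Hive's optimizer API.
final class RedundantSelectSketch {

    static final class Op {
        final String name;
        final List<String> columns; // null for non-select operators

        private Op(String name, List<String> columns) {
            this.name = name;
            this.columns = columns;
        }

        static Op select(String... columns) {
            return new Op("SEL", Arrays.asList(columns));
        }

        static Op other(String name) {
            return new Op(name, null);
        }

        boolean isSelect() {
            return columns != null;
        }

        @Override
        public String toString() {
            return isSelect() ? name + columns : name;
        }
    }

    static List<Op> simplify(List<String> sourceSchema, List<Op> plan) {
        List<Op> result = new ArrayList<>();
        List<String> schema = sourceSchema;  // schema flowing into the next operator
        List<String> lastSelectInput = null; // input schema of the trailing select in result
        for (Op op : plan) {
            if (!op.isSelect()) {
                result.add(op);              // assumed to leave the schema unchanged
                continue;
            }
            List<String> input = schema;
            if (!result.isEmpty() && result.get(result.size() - 1).isSelect()) {
                // Select followed by select: the earlier projection is redundant.
                result.remove(result.size() - 1);
                input = lastSelectInput;
            }
            if (!op.columns.equals(input)) { // drop selects whose output equals their input
                result.add(op);
                lastSelectInput = input;
            }
            schema = op.columns;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Op> plan = Arrays.asList(
                Op.select("a", "b", "c"), Op.other("FIL"), Op.select("a", "b"), Op.select("a"));
        System.out.println(simplify(Arrays.asList("a", "b", "c"), plan)); // prints [FIL, SEL[a]]
    }
}
```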


[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3710:
---

Attachment: HIVE-3710.patch.3

> HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
> FileSinkOperator
> --
>
> Key: HIVE-3710
> URL: https://issues.apache.org/jira/browse/HIVE-3710
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3710.patch.1, HIVE-3710.patch.2, HIVE-3710.patch.3
>
>
> It should be part of the plan instead.


[jira] [Work started] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3710 started by Gang Tim Liu.

> HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
> FileSinkOperator
> --
>
> Key: HIVE-3710
> URL: https://issues.apache.org/jira/browse/HIVE-3710
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3710.patch.1, HIVE-3710.patch.2, HIVE-3710.patch.3
>
>
> It should be part of the plan instead.


[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3710:
---

Status: Patch Available  (was: In Progress)

Added a new test case.

Existing stats-related test cases cover the case where 
hive.stats.collect.rawdatasize is true.

The new test case compares results with the config on and off to ensure that 
HIVE-3710 keeps the existing logic intact.

The patch is available both as an attachment and on Phabricator. 

> HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
> FileSinkOperator
> --
>
> Key: HIVE-3710
> URL: https://issues.apache.org/jira/browse/HIVE-3710
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
> Attachments: HIVE-3710.patch.1, HIVE-3710.patch.2, HIVE-3710.patch.3
>
>
> It should be part of the plan instead.

