[jira] [Updated] (HIVE-4042) ignore mapjoin hint

2013-02-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4042:
-

Attachment: hive.4042.2.patch

 ignore mapjoin hint
 ---

 Key: HIVE-4042
 URL: https://issues.apache.org/jira/browse/HIVE-4042
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4042.1.patch, hive.4042.2.patch


 After HIVE-3784, in a production environment, it can become difficult to
 deploy since a lot of production queries can break.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3938) Hive MetaStore should send a single AddPartitionEvent for atomically added partition-set.

2013-02-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3938:
-

Status: Open  (was: Patch Available)

Can you refresh once HIVE-4004 is in?

 Hive MetaStore should send a single AddPartitionEvent for atomically added 
 partition-set.
 -

 Key: HIVE-3938
 URL: https://issues.apache.org/jira/browse/HIVE-3938
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.10.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-3938.patch


 HiveMetaStore::add_partitions() currently adds all partitions specified in 
 one call using a single meta-store transaction. This acts correctly. However, 
 there's one AddPartitionEvent created per partition specified.
 Ideally, the set of partitions added atomically can be communicated using a 
 single AddPartitionEvent, such that they are consumed together.
 I'll post a patch that does this.
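
 To make the idea concrete, here is a minimal sketch of one event carrying the whole atomically added set. BatchAddPartitionEvent, RecordingListener and notifyOnce are hypothetical names invented for this illustration; they are not the metastore's actual classes, and the attached patch may be structured differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an event that carries every partition added in
// one transaction. This is NOT Hive's AddPartitionEvent class.
final class BatchAddPartitionEvent {
    private final List<String> partitionNames;
    private final boolean status;

    BatchAddPartitionEvent(List<String> partitionNames, boolean status) {
        // Defensive copy so later changes to the caller's list do not leak in.
        this.partitionNames = Collections.unmodifiableList(new ArrayList<>(partitionNames));
        this.status = status;
    }

    List<String> getPartitionNames() { return partitionNames; }
    boolean getStatus() { return status; }
}

// Hypothetical listener that only records the events it receives.
final class RecordingListener {
    final List<BatchAddPartitionEvent> received = new ArrayList<>();

    void onAddPartitions(BatchAddPartitionEvent event) { received.add(event); }
}

final class BatchEventDemo {
    // Fires a single event for the whole set instead of one event per partition.
    static void notifyOnce(RecordingListener listener, List<String> added, boolean committed) {
        listener.onAddPartitions(new BatchAddPartitionEvent(added, committed));
    }

    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        notifyOnce(listener, Arrays.asList("ds=2013-02-19", "ds=2013-02-20"), true);
        System.out.println(listener.received.size() + " event(s) received");
    }
}
```

 A listener written against such an event sees the partitions together, which matches the single transaction that added them.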



[jira] [Commented] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails

2013-02-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582054#comment-13582054
 ] 

Namit Jain commented on HIVE-4004:
--

+1


 Incorrect status for AddPartition metastore event if RawStore commit fails
 --

 Key: HIVE-4004
 URL: https://issues.apache.org/jira/browse/HIVE-4004
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.10.0
Reporter: Dilip Joseph
Assignee: Dilip Joseph
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-4004.1.patch.txt


 For ADD PARTITION operations, the AddPartitionEvent does not care if the 
 RawStore commit succeeded or not.  This means that an AddPartitionEvent with 
 status=true is fired even if the actual ADD PARTITION operation failed.
 This will confuse any AddPartitionEvent listeners.
 Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the 
 status of the RawStore commit.  Only AddPartitionEvent has this problem.
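
 The intended behaviour can be sketched as follows: the event is built only after the commit outcome is known, so its status reflects that outcome. SketchStore, FakeStore and StatusAwareAddPartition are hypothetical simplifications made up for this example; they are not Hive's RawStore or listener types, and the committed patch may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified store; not Hive's RawStore interface.
interface SketchStore {
    void openTransaction();
    boolean commitTransaction();   // returns false when the commit fails
    void rollbackTransaction();
    void addPartition(String name);
}

// Test double whose commit result is fixed up front.
final class FakeStore implements SketchStore {
    private final boolean commitSucceeds;
    boolean rolledBack = false;

    FakeStore(boolean commitSucceeds) { this.commitSucceeds = commitSucceeds; }

    public void openTransaction() { }
    public boolean commitTransaction() { return commitSucceeds; }
    public void rollbackTransaction() { rolledBack = true; }
    public void addPartition(String name) { }
}

final class StatusAwareAddPartition {
    // Each recorded "event" is the partition name plus the commit status.
    static boolean addPartition(SketchStore store, String name, List<String> events) {
        boolean success = false;
        try {
            store.openTransaction();
            store.addPartition(name);
            success = store.commitTransaction();
        } finally {
            if (!success) {
                store.rollbackTransaction();
            }
            // The event is created after the commit, so it carries the real outcome.
            events.add(name + ":" + success);
        }
        return success;
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        addPartition(new FakeStore(false), "ds=2013-02-20", events);
        System.out.println(events);
    }
}
```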



[jira] [Commented] (HIVE-2843) UDAF to convert an aggregation to a map

2013-02-20 Thread David Worms (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582066#comment-13582066
 ] 

David Worms commented on HIVE-2843:
---

I just created the requested phabricator entry: 
https://reviews.facebook.net/T45. 

I did my best, but arc wasn't working for me: it failed with a message like 
"libphutil v1 libraries are no longer supported". I tried a workaround 
described on the mailing list 
(http://mail-archives.apache.org/mod_mbox/hive-dev/201301.mbox/%3CFF1DF58D04F11D4291D09795D1A4EF1618657D12DB@SRV-MAIL%3E), 
also without success. I ended up creating the patch and uploading it 
manually.

 UDAF to convert an aggregation to a map
 ---

 Key: HIVE-2843
 URL: https://issues.apache.org/jira/browse/HIVE-2843
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 0.9.0, 0.10.0
Reporter: David Worms
Priority: Minor
  Labels: features, udf
 Attachments: HIVE-2843.1.patch.txt


 I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. 
 The source code is available on GitHub at https://github.com/wdavidw/hive-udf 
 in two Java classes: UDAFToMap and UDAFToOrderedMap. The first function 
 converts an aggregation into a map and internally uses a Java `HashMap`. 
 The second function extends the first one. It converts an aggregation into an 
 ordered map and internally uses a Java `TreeMap`. They both extend the 
 `AbstractGenericUDAFResolver` class.
 Also, I have covered the motivations and usages of those UDAFs in a blog post 
 at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/
 The full patch is available with tests as well.
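
 As a rough illustration of the difference between the two functions, the plain-Java sketch below folds key/value rows into a `HashMap` and into a `TreeMap`. It does not use Hive's UDAF API, ToMapSketch is a hypothetical name, and the "last value wins" handling of duplicate keys is an assumption of this example rather than documented behaviour of the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; it does not use Hive's UDAF API.
final class ToMapSketch {
    // Folds (key, value) rows into a HashMap: iteration order is unspecified.
    static Map<String, Integer> toMap(String[] keys, int[] values) {
        Map<String, Integer> result = new HashMap<>();
        fill(result, keys, values);
        return result;
    }

    // Same fold into a TreeMap, so keys iterate in their natural sorted order.
    static Map<String, Integer> toOrderedMap(String[] keys, int[] values) {
        Map<String, Integer> result = new TreeMap<>();
        fill(result, keys, values);
        return result;
    }

    private static void fill(Map<String, Integer> target, String[] keys, int[] values) {
        for (int i = 0; i < keys.length; i++) {
            // Assumption for this sketch: a later row overwrites an earlier one.
            target.put(keys[i], values[i]);
        }
    }

    public static void main(String[] args) {
        String[] keys = {"c", "a", "b"};
        int[] values = {3, 1, 2};
        System.out.println(toOrderedMap(keys, values)); // prints {a=1, b=2, c=3}
    }
}
```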



[jira] [Updated] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener

2013-02-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3970:
-

Status: Open  (was: Patch Available)

Can you refresh?
This patch no longer applies cleanly.

 Clean up/fix PartitionNameWhitelistPreEventListener
 ---

 Key: HIVE-3970
 URL: https://issues.apache.org/jira/browse/HIVE-3970
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt


 There are a number of issues and things which can be cleaned up related to 
 PartitionNameWhitelistPreEventListener.
 * It's an event listener, but it doesn't need to be: given that the regex 
 whitelist is configurable, it could just be a utility method.
 * It's not run when a partition is renamed, so partitions with invalid 
 characters can be created in this way.
 * There's no easy way to check if a partition contains invalid characters 
 before creating it and seeing if it fails.
 Most importantly, when a dynamic partition contains an invalid character, the 
 directory for this partition is created, and the data is moved into it, but 
 the partition fails to be created leaving an orphan directory.
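
 The "utility method" suggestion could look roughly like the sketch below, which checks partition values against a configurable whitelist pattern before anything is created. PartitionNameCheck and allValuesAllowed are hypothetical names, and the pattern in main is only an example; neither is taken from the attached patches.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical utility; the class and method names are illustrative, not Hive's.
final class PartitionNameCheck {
    // Returns true only when every partition value fully matches the whitelist.
    static boolean allValuesAllowed(Pattern whitelist, Iterable<String> partitionValues) {
        for (String value : partitionValues) {
            if (!whitelist.matcher(value).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Example pattern chosen for this sketch: letters, digits, '_' and '-'.
        Pattern whitelist = Pattern.compile("[A-Za-z0-9_-]+");
        System.out.println(allValuesAllowed(whitelist, Arrays.asList("2013-02-20", "us")));
        System.out.println(allValuesAllowed(whitelist, Arrays.asList("bad/value")));
    }
}
```

 Because the check is a plain static method, the same call could in principle be made on the rename path and before a dynamic partition's directory is created.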



[jira] [Commented] (HIVE-3672) Support altering partition column type in Hive

2013-02-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582078#comment-13582078
 ] 

Namit Jain commented on HIVE-3672:
--

The patch is still not applying cleanly for me.

 Support altering partition column type in Hive
 --

 Key: HIVE-3672
 URL: https://issues.apache.org/jira/browse/HIVE-3672
 Project: Hive
  Issue Type: Improvement
  Components: CLI, SQL
Reporter: Jingwei Lu
Assignee: Jingwei Lu
  Labels: features
 Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, 
 HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, 
 HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.7.patch.txt

   Original Estimate: 72h
  Remaining Estimate: 72h

 Currently, Hive does not allow altering partition column types.  As we've 
 discouraged users from using non-string partition column types, this presents 
 a problem for users who want to change their partition columns to be strings: 
 they have to rename their table, create a new table, and copy all the data 
 over.
 To support this via the CLI, we can add a command like ALTER TABLE table_name 
 PARTITION COLUMN (column_name new type);



[jira] [Updated] (HIVE-3672) Support altering partition column type in Hive

2013-02-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3672:
-

Status: Open  (was: Patch Available)

 Support altering partition column type in Hive
 --

 Key: HIVE-3672
 URL: https://issues.apache.org/jira/browse/HIVE-3672
 Project: Hive
  Issue Type: Improvement
  Components: CLI, SQL
Reporter: Jingwei Lu
Assignee: Jingwei Lu
  Labels: features
 Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, 
 HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, 
 HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.7.patch.txt

   Original Estimate: 72h
  Remaining Estimate: 72h

 Currently, Hive does not allow altering partition column types.  As we've 
 discouraged users from using non-string partition column types, this presents 
 a problem for users who want to change their partition columns to be strings: 
 they have to rename their table, create a new table, and copy all the data 
 over.
 To support this via the CLI, we can add a command like ALTER TABLE table_name 
 PARTITION COLUMN (column_name new type);



[jira] [Commented] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.

2013-02-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582083#comment-13582083
 ] 

Namit Jain commented on HIVE-4039:
--

+1

 Hive compiler sometimes fails in semantic analysis / optimisation stage when 
 boolean variable appears in WHERE clause.
 --

 Key: HIVE-4039
 URL: https://issues.apache.org/jira/browse/HIVE-4039
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jean Xu
Assignee: Jean Xu
Priority: Minor
 Attachments: HIVE_4039.1.patch.txt


 Hive compiler fails with a NullPointerException in semantic analysis / 
 optimisation stage when a boolean variable appears in the WHERE clause in 
 some cases. A minimal query to generate this error is here:
 SELECT 1
 FROM (
 SELECT TRUE AS flag
 FROM dim_one_row:measurementsystems
 ) a
 WHERE flag;
 On the other hand, the following query is perfectly fine:
 SELECT 1
 FROM (
 SELECT TRUE AS flag
 FROM dim_one_row:measurementsystems
 ) a
 WHERE flag=TRUE;



[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582121#comment-13582121
 ] 

Namit Jain commented on HIVE-3874:
--

Can you also fix Eclipse?

 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, 
 HIVE-3874.D8529.2.patch, OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RC File format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes.
 * there is no mechanism for storing lightweight indexes within the file to 
 enable push-down filters to skip entire row groups.
 * the type of the rows isn't stored in the file



[jira] [Updated] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails

2013-02-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4004:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed. Thanks Dilip

 Incorrect status for AddPartition metastore event if RawStore commit fails
 --

 Key: HIVE-4004
 URL: https://issues.apache.org/jira/browse/HIVE-4004
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.10.0
Reporter: Dilip Joseph
Assignee: Dilip Joseph
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-4004.1.patch.txt


 For ADD PARTITION operations, the AddPartitionEvent does not care if the 
 RawStore commit succeeded or not.  This means that an AddPartitionEvent with 
 status=true is fired even if the actual ADD PARTITION operation failed.
 This will confuse any AddPartitionEvent listeners.
 Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the 
 status of the RawStore commit.  Only AddPartitionEvent has this problem.



[jira] [Updated] (HIVE-4042) ignore mapjoin hint

2013-02-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4042:
-

Status: Patch Available  (was: Open)

Tests passed

 ignore mapjoin hint
 ---

 Key: HIVE-4042
 URL: https://issues.apache.org/jira/browse/HIVE-4042
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4042.1.patch, hive.4042.2.patch


 After HIVE-3784, in a production environment, it can become difficult to
 deploy since a lot of production queries can break.



Hive-trunk-h0.21 - Build # 1977 - Still Failing

2013-02-20 Thread Apache Jenkins Server
Changes for Build #1975
[namit] HIVE-4021 PostgreSQL upgrade scripts are creating column with incorrect 
name
(Jarek Jarcec Cecho via namit)

[hashutosh] HIVE-4033 : NPE at runtime while selecting virtual column after 
joining three tables on different keys (Ashutosh Chauhan)

[namit] HIVE-4029 Hive Profiler dies with NPE
(Brock Noland via namit)


Changes for Build #1976
[namit] HIVE-4023 Improve Error Logging in MetaStore
(Bhushan Mandhani via namit)

[namit] HIVE-3403 user should not specify mapjoin to perform sort-merge 
bucketed join
(Namit Jain via Ashutosh)

[namit] HIVE-4024 Derby metastore update script will fail when upgrading from 
0.9.0
to 0.10.0 (Jarek Jarcec Cecho via namit)


Changes for Build #1977



1 tests failed.
FAILED:  org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_aggregator_error_1

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not 
reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please 
note the time in the report does not reflect the time until the VM exit.
at net.sf.antcontrib.logic.ForTask.doSequentialIteration(ForTask.java:259)
at net.sf.antcontrib.logic.ForTask.doToken(ForTask.java:268)
at net.sf.antcontrib.logic.ForTask.doTheTasks(ForTask.java:299)
at net.sf.antcontrib.logic.ForTask.execute(ForTask.java:244)




The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1977)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1977/ to 
view the results.

[jira] [Updated] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.

2013-02-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4039:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed. Thanks Jean

 Hive compiler sometimes fails in semantic analysis / optimisation stage when 
 boolean variable appears in WHERE clause.
 --

 Key: HIVE-4039
 URL: https://issues.apache.org/jira/browse/HIVE-4039
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jean Xu
Assignee: Jean Xu
Priority: Minor
 Attachments: HIVE_4039.1.patch.txt


 Hive compiler fails with a NullPointerException in semantic analysis / 
 optimisation stage when a boolean variable appears in the WHERE clause in 
 some cases. A minimal query to generate this error is here:
 SELECT 1
 FROM (
 SELECT TRUE AS flag
 FROM dim_one_row:measurementsystems
 ) a
 WHERE flag;
 On the other hand, the following query is perfectly fine:
 SELECT 1
 FROM (
 SELECT TRUE AS flag
 FROM dim_one_row:measurementsystems
 ) a
 WHERE flag=TRUE;



[jira] [Updated] (HIVE-4027) Thrift alter_table api doesnt validate column type

2013-02-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4027:
-

   Resolution: Fixed
Fix Version/s: 0.11.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks Tim

 Thrift alter_table api doesnt validate column type
 --

 Key: HIVE-4027
 URL: https://issues.apache.org/jira/browse/HIVE-4027
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Fix For: 0.11.0

 Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3


 Thrift alter_table api doesn't validate column type, so an invalid column 
 type can sneak in.



[jira] [Commented] (HIVE-4027) Thrift alter_table api doesnt validate column type

2013-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582185#comment-13582185
 ] 

Hudson commented on HIVE-4027:
--

Integrated in hive-trunk-hadoop1 #93 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/93/])
HIVE-4027 Thrift alter_table api doesnt validate column type
(Gang Tim Liu via namit) (Revision 1448138)

 Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448138
Files : 
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java


 Thrift alter_table api doesnt validate column type
 --

 Key: HIVE-4027
 URL: https://issues.apache.org/jira/browse/HIVE-4027
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Fix For: 0.11.0

 Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3


 Thrift alter_table api doesn't validate column type, so an invalid column 
 type can sneak in.



[jira] [Commented] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.

2013-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582184#comment-13582184
 ] 

Hudson commented on HIVE-4039:
--

Integrated in hive-trunk-hadoop1 #93 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/93/])
HIVE-4039 Hive compiler sometimes fails in semantic analysis / optimisation 
stage when boolean
variable appears in WHERE clause. (Jean Xu via namit) (Revision 1448135)

 Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448135
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/test_boolean_whereclause.q
* /hive/trunk/ql/src/test/results/clientpositive/test_boolean_whereclause.q.out


 Hive compiler sometimes fails in semantic analysis / optimisation stage when 
 boolean variable appears in WHERE clause.
 --

 Key: HIVE-4039
 URL: https://issues.apache.org/jira/browse/HIVE-4039
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jean Xu
Assignee: Jean Xu
Priority: Minor
 Attachments: HIVE_4039.1.patch.txt


 Hive compiler fails with a NullPointerException in semantic analysis / 
 optimisation stage when a boolean variable appears in the WHERE clause in 
 some cases. A minimal query to generate this error is here:
 SELECT 1
 FROM (
 SELECT TRUE AS flag
 FROM dim_one_row:measurementsystems
 ) a
 WHERE flag;
 On the other hand, the following query is perfectly fine:
 SELECT 1
 FROM (
 SELECT TRUE AS flag
 FROM dim_one_row:measurementsystems
 ) a
 WHERE flag=TRUE;



[jira] [Commented] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails

2013-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582186#comment-13582186
 ] 

Hudson commented on HIVE-4004:
--

Integrated in hive-trunk-hadoop1 #93 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/93/])
HIVE-4004 Incorrect status for AddPartition metastore event if RawStore 
commit fails
(Dilip Joseph via namit) (Revision 1448101)

 Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448101
Files : 
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListenerOnlyOnCommit.java


 Incorrect status for AddPartition metastore event if RawStore commit fails
 --

 Key: HIVE-4004
 URL: https://issues.apache.org/jira/browse/HIVE-4004
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.10.0
Reporter: Dilip Joseph
Assignee: Dilip Joseph
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-4004.1.patch.txt


 For ADD PARTITION operations, the AddPartitionEvent does not care if the 
 RawStore commit succeeded or not.  This means that an AddPartitionEvent with 
 status=true is fired even if the actual ADD PARTITION operation failed.
 This will confuse any AddPartitionEvent listeners.
 Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the 
 status of the RawStore commit.  Only AddPartitionEvent has this problem.



[jira] [Commented] (HIVE-4007) Create abstract classes for serializer and deserializer

2013-02-20 Thread Jarek Jarcec Cecho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582260#comment-13582260
 ] 

Jarek Jarcec Cecho commented on HIVE-4007:
--

+1 (non-binding)

Thank you for working on this Namit!

Jarcec

 Create abstract classes for serializer and deserializer
 ---

 Key: HIVE-4007
 URL: https://issues.apache.org/jira/browse/HIVE-4007
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4007.1.patch, hive.4007.2.patch, hive.4007.3.patch


 Currently, it is very difficult to change the Serializer/Deserializer
 interface, since all the SerDes directly implement the interface.
 Instead, we should have abstract classes for implementing these interfaces.
 In case of an interface change, only the abstract class and the relevant 
 SerDe need to change.
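
 A minimal sketch of the pattern, under simplifying assumptions: SketchDeserializer, AbstractSketchDeserializer and Utf8Deserializer are hypothetical types invented here, and Hive's real SerDe interfaces have different methods. The point is only that a method added to the interface later gets its default in the abstract class, so existing SerDes keep compiling.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, heavily simplified interface; Hive's real SerDe interfaces differ.
interface SketchDeserializer {
    Object deserialize(byte[] blob);

    // Imagine this method was added to the interface at a later date.
    String describe();
}

// SerDes extend this class instead of implementing the interface directly,
// so a newly added interface method only needs a default implementation here.
abstract class AbstractSketchDeserializer implements SketchDeserializer {
    @Override
    public String describe() {
        return getClass().getSimpleName();
    }
}

// A concrete SerDe written before describe() existed; it needs no change.
final class Utf8Deserializer extends AbstractSketchDeserializer {
    @Override
    public Object deserialize(byte[] blob) {
        return new String(blob, StandardCharsets.UTF_8);
    }
}

final class AbstractSerDeDemo {
    public static void main(String[] args) {
        SketchDeserializer d = new Utf8Deserializer();
        System.out.println(d.describe() + " -> " + d.deserialize(new byte[] {104, 105}));
    }
}
```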



[jira] [Commented] (HIVE-3980) Cleanup after HIVE-3403

2013-02-20 Thread Jarek Jarcec Cecho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582267#comment-13582267
 ] 

Jarek Jarcec Cecho commented on HIVE-3980:
--

+1 (non-binding)

These seem like reasonable changes to me.

Jarcec

 Cleanup after HIVE-3403
 ---

 Key: HIVE-3980
 URL: https://issues.apache.org/jira/browse/HIVE-3980
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3980.1.patch, hive.3980.2.patch


 There have been a lot of comments on HIVE-3403, which involve changing 
 variable names/function names/adding more comments/general cleanup etc.
 Since HIVE-3403 involves a lot of refactoring, it was fairly difficult to
 address the comments there, since refreshing becomes impossible. This jira
 is to track those cleanups.



[jira] [Commented] (HIVE-4027) Thrift alter_table api doesnt validate column type

2013-02-20 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582270#comment-13582270
 ] 

Gang Tim Liu commented on HIVE-4027:


Namit, thank you very much.


 Thrift alter_table api doesnt validate column type
 --

 Key: HIVE-4027
 URL: https://issues.apache.org/jira/browse/HIVE-4027
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Fix For: 0.11.0

 Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3


 Thrift alter_table api doesn't validate column type, so an invalid column 
 type can sneak in.



[jira] [Commented] (HIVE-3672) Support altering partition column type in Hive

2013-02-20 Thread Jingwei Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582349#comment-13582349
 ] 

Jingwei Lu commented on HIVE-3672:
--

Is there a merge conflict or a unit test failure? If it is a test failure, could 
you give me the name of the failing test? I ran all my newly added tests 
yesterday and they were clean. 

 Support altering partition column type in Hive
 --

 Key: HIVE-3672
 URL: https://issues.apache.org/jira/browse/HIVE-3672
 Project: Hive
  Issue Type: Improvement
  Components: CLI, SQL
Reporter: Jingwei Lu
Assignee: Jingwei Lu
  Labels: features
 Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, 
 HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, 
 HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.7.patch.txt

   Original Estimate: 72h
  Remaining Estimate: 72h

 Currently, Hive does not allow altering partition column types.  As we've 
 discouraged users from using non-string partition column types, this presents 
 a problem for users who want to change their partition columns to be strings: 
 they have to rename their table, create a new table, and copy all the data 
 over.
 To support this via the CLI, we can add a command like ALTER TABLE table_name 
 PARTITION COLUMN (column_name new type);



[jira] [Updated] (HIVE-3968) Enhance logging in TableAccessInfo

2013-02-20 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3968:


Attachment: HIVE-3968.3.patch.txt

 Enhance logging in TableAccessInfo
 --

 Key: HIVE-3968
 URL: https://issues.apache.org/jira/browse/HIVE-3968
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, 
 HIVE-3968.3.patch.txt


 Based on what is currently available in the TableAccessInfo we can infer when 
 it would be a good idea to add bucketing/sorting metadata for tables.  
 However, we can't easily tell if we're already getting the benefits of 
 bucketing/sorting.
 This information can be improved by
 a) storing the input table/partition objects so that we can tell if the 
 tables/partitions are already bucketed/sorted
 b) running the TableAccessAnalyzer after the logical optimizer, so that we 
 can tell from the operators whether or not we are already getting benefits 
 (bucketed/sort merge map joins or map group bys)



[jira] [Updated] (HIVE-3968) Enhance logging in TableAccessInfo

2013-02-20 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3968:


Status: Patch Available  (was: Open)

 Enhance logging in TableAccessInfo
 --

 Key: HIVE-3968
 URL: https://issues.apache.org/jira/browse/HIVE-3968
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, 
 HIVE-3968.3.patch.txt


 Based on what is currently available in the TableAccessInfo we can infer when 
 it would be a good idea to add bucketing/sorting metadata for tables.  
 However, we can't easily tell if we're already getting the benefits of 
 bucketing/sorting.
 This information can be improved by
 a) storing the input table/partition objects so that we can tell if the 
 tables/partitions are already bucketed/sorted
 b) running the TableAccessAnalyzer after the logical optimizer, so that we 
 can tell from the operators whether or not we are already getting benefits 
 (bucketed/sort merge map joins or map group bys)



[jira] [Commented] (HIVE-3968) Enhance logging in TableAccessInfo

2013-02-20 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582372#comment-13582372
 ] 

Kevin Wilfong commented on HIVE-3968:
-

Refreshed.

 Enhance logging in TableAccessInfo
 --

 Key: HIVE-3968
 URL: https://issues.apache.org/jira/browse/HIVE-3968
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, 
 HIVE-3968.3.patch.txt


 Based on what is currently available in the TableAccessInfo we can infer when 
 it would be a good idea to add bucketing/sorting metadata for tables.  
 However, we can't easily tell if we're already getting the benefits of 
 bucketing/sorting.
 This information can be improved by
 a) storing the input table/partition objects so that we can tell if the 
 tables/partitions are already bucketed/sorted
 b) running the TableAccessAnalyzer after the logical optimizer, so that we 
 can tell from the operators whether or not we are already getting benefits 
 (bucketed/sort merge map joins or map group bys)



[jira] [Updated] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener

2013-02-20 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3970:


Attachment: HIVE-3970.3.patch.txt

 Clean up/fix PartitionNameWhitelistPreEventListener
 ---

 Key: HIVE-3970
 URL: https://issues.apache.org/jira/browse/HIVE-3970
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt, 
 HIVE-3970.3.patch.txt


 There are a number of issues and things which can be cleaned up related to 
 PartitionNameWhitelistPreEventListener.
 * It's an event listener, but it really doesn't need to be: given that the 
 regex whitelist is configurable, it could just be a utility method.
 * It's not run when a partition is renamed, so partitions with invalid 
 characters can be created in this way.
 * There's no easy way to check if a partition contains invalid characters 
 before creating it and seeing if it fails.
 Most importantly, when a dynamic partition contains an invalid character, the 
 directory for this partition is created, and the data is moved into it, but 
 the partition fails to be created leaving an orphan directory.
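The utility-method idea from the bullets above could look roughly like the following sketch. It is written in Python purely for illustration (Hive itself is Java), and the function names, the whitelist pattern, and the sample values are all assumptions rather than anything taken from the attached patches.

```python
import re

# Hypothetical whitelist: partition values may contain only these characters.
# In Hive the pattern would come from configuration; this one is made up.
WHITELIST_PATTERN = re.compile(r"[A-Za-z0-9_\-]+")


def invalid_partition_values(partition_values, pattern=WHITELIST_PATTERN):
    """Return the partition values that do not fully match the whitelist."""
    return [v for v in partition_values if pattern.fullmatch(v) is None]


def check_before_create(partition_values):
    """Validate up front, before any directory would be created."""
    bad = invalid_partition_values(partition_values)
    if bad:
        raise ValueError("partition value(s) not allowed: " + ", ".join(bad))
    return True
```

Because the check is a plain function, the same call could in principle be made on create, on rename, and before moving data for a dynamic partition, which is the gap the description points at.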



[jira] [Updated] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener

2013-02-20 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3970:


Status: Patch Available  (was: Open)

 Clean up/fix PartitionNameWhitelistPreEventListener
 ---

 Key: HIVE-3970
 URL: https://issues.apache.org/jira/browse/HIVE-3970
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt, 
 HIVE-3970.3.patch.txt


 There are a number of issues and things which can be cleaned up related to 
 PartitionNameWhitelistPreEventListener.
 * It's an event listener, but it really doesn't need to be: given that the 
 regex whitelist is configurable, it could just be a utility method.
 * It's not run when a partition is renamed, so partitions with invalid 
 characters can be created in this way.
 * There's no easy way to check if a partition contains invalid characters 
 before creating it and seeing if it fails.
 Most importantly, when a dynamic partition contains an invalid character, the 
 directory for this partition is created, and the data is moved into it, but 
 the partition fails to be created leaving an orphan directory.



[jira] [Commented] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener

2013-02-20 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582386#comment-13582386
 ] 

Kevin Wilfong commented on HIVE-3970:
-

Refreshed

 Clean up/fix PartitionNameWhitelistPreEventListener
 ---

 Key: HIVE-3970
 URL: https://issues.apache.org/jira/browse/HIVE-3970
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt, 
 HIVE-3970.3.patch.txt


 There are a number of issues and things which can be cleaned up related to 
 PartitionNameWhitelistPreEventListener.
 * It's an event listener, but it really doesn't need to be: given that the 
 regex whitelist is configurable, it could just be a utility method.
 * It's not run when a partition is renamed, so partitions with invalid 
 characters can be created in this way.
 * There's no easy way to check if a partition contains invalid characters 
 before creating it and seeing if it fails.
 Most importantly, when a dynamic partition contains an invalid character, the 
 directory for this partition is created, and the data is moved into it, but 
 the partition fails to be created leaving an orphan directory.



[jira] [Resolved] (HIVE-4040) fix ptf negative tests

2013-02-20 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4040.


Resolution: Fixed

Committed to branch. Thanks, Prajakta!

 fix ptf negative tests
 --

 Key: HIVE-4040
 URL: https://issues.apache.org/jira/browse/HIVE-4040
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Prajakta Kalmegh
Priority: Minor
 Attachments: HIVE-4040.1.patch.txt


 Fix queries in negative tests to match language changes. 



[jira] [Created] (HIVE-4043) Parallel Hive Queries: Sporadic Errors of form: Error in metadata: javax.jdo.JDOFatalDataStoreException: IO Error: Connection reset

2013-02-20 Thread Andrew Tindle (JIRA)
Andrew Tindle created HIVE-4043:
---

 Summary: Parallel Hive Queries: Sporadic Errors of form: Error in 
metadata: javax.jdo.JDOFatalDataStoreException: IO Error: Connection reset
 Key: HIVE-4043
 URL: https://issues.apache.org/jira/browse/HIVE-4043
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.9.0
 Environment: O/S: RHEL 6.3
Metastore: Oracle 11gR2
Reporter: Andrew Tindle


I have a program that spawns Hive queries/processes, up to a maximum of 5, in 
parallel. When the number of running queries drops below that maximum, i.e. a 
process has ended, another Hive query/process is initiated.

Sometimes, this program works, i.e. all 34 queries process successfully.

However, on other occasions, I get sporadic instances of the following error 
for some of the queries:

FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: IO Error: 
Connection reset
NestedThrowables:
java.sql.SQLRecoverableException: IO Error: Connection reset
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

Can anyone help identify and resolve why this occurs? It looks to me as if 
there is some kind of race condition/collision with the Hive Metastore, which 
is hosted in an Oracle DB on the same node as the Hadoop infrastructure 
(single node).
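
For readers trying to picture the driver program described above, here is a minimal sketch of the "at most 5 in flight, start another when one ends" pattern. It is an assumption about the general shape, not the reporter's actual program: it is written in Python, `FakeRunner` stands in for launching a real Hive process, and it does not talk to Hive or Oracle or reproduce the error.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 5   # the cap mentioned in the report
NUM_QUERIES = 34   # the number of queries mentioned in the report


def run_all(queries, run_query, max_parallel=MAX_PARALLEL):
    """Run every query, keeping at most max_parallel in flight.

    The executor starts a queued query as soon as a running one finishes.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_query, queries))


class FakeRunner:
    """Stand-in for launching a Hive process; records peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self, query):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # pretend the query takes a moment
        with self._lock:
            self.active -= 1
        return "done: " + query
```

A call such as `run_all(["q%d" % i for i in range(NUM_QUERIES)], FakeRunner())` exercises the pattern; in a real driver the callable would launch the Hive CLI instead.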




[jira] [Created] (HIVE-4044) Add URL type

2013-02-20 Thread Samuel Yuan (JIRA)
Samuel Yuan created HIVE-4044:
-

 Summary: Add URL type
 Key: HIVE-4044
 URL: https://issues.apache.org/jira/browse/HIVE-4044
 Project: Hive
  Issue Type: Improvement
Reporter: Samuel Yuan
Assignee: Samuel Yuan


Having a separate type for URLs would enable improvements in storage efficiency 
based on breaking up a URL into its components. The new type will be named 
URL and made a non-reserved keyword (see HIVE-701).
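
To illustrate what "breaking up a URL into its components" means, the sketch below splits a URL with Python's standard library. It only shows the decomposition; how the proposed Hive type would actually store the parts is not described here, and the function name `url_components` is an assumption made for this example.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Split a URL into the parts a component-wise encoding could store."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,      # None when the URL has no explicit port
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
```

Many URLs in one column tend to share a scheme and host, which is where storing the components separately might save space.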



[jira] [Updated] (HIVE-4005) Column truncation

2013-02-20 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-4005:


Attachment: HIVE-4005.3.patch.txt

 Column truncation
 -

 Key: HIVE-4005
 URL: https://issues.apache.org/jira/browse/HIVE-4005
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4005.1.patch.txt, HIVE-4005.2.patch.txt, 
 HIVE-4005.3.patch.txt


 Column truncation allows users to remove data for columns that are no longer 
 useful.
 This is done by removing the data for the column and setting the length of 
 the column data and related lengths to 0 in the RC file header.
 RC file was fixed to recognize columns with lengths of zero as empty; they 
 are treated as if the column doesn't exist in the data, and a null is returned 
 for every value of that column in every row. This is the same thing that 
 happens when more columns are selected than exist in the file.
 A new command was added to the CLI
 TRUNCATE TABLE ... PARTITION ... COLUMNS ...
 This launches a map only job where each mapper rewrites a single file without 
 the unnecessary column data and the adjusted headers. It does not 
 uncompress/deserialize the data so it is much faster than rewriting the data 
 with NULLs.
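
The behaviour described above can be modelled with a toy example. The sketch below is NOT the RCFile format: it is a deliberately simplified, hypothetical in-memory layout (a header of per-column lengths plus one list per column), written in Python only to show the "zero length means the reader returns nulls" idea.

```python
def make_file(rows, num_columns):
    """Toy columnar file: one list of values per column plus a length header."""
    columns = [[row[i] for row in rows] for i in range(num_columns)]
    header = [len(col) for col in columns]
    return {"header": header, "columns": columns, "num_rows": len(rows)}


def truncate_columns(data_file, column_indexes):
    """Drop the data of the given columns and set their header length to 0.

    Other columns are carried over as-is, without decoding their values.
    """
    header = list(data_file["header"])
    columns = list(data_file["columns"])
    for i in column_indexes:
        columns[i] = []
        header[i] = 0
    return {"header": header, "columns": columns,
            "num_rows": data_file["num_rows"]}


def read_column(data_file, index):
    """Return None for every row when the column is missing or has length 0."""
    if index >= len(data_file["header"]) or data_file["header"][index] == 0:
        return [None] * data_file["num_rows"]
    return list(data_file["columns"][index])
```

Reading a truncated column and reading a column index beyond the end of the file take the same branch, mirroring the statement that both cases behave alike.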



[jira] [Commented] (HIVE-4005) Column truncation

2013-02-20 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582413#comment-13582413
 ] 

Kevin Wilfong commented on HIVE-4005:
-

Updated

 Column truncation
 -

 Key: HIVE-4005
 URL: https://issues.apache.org/jira/browse/HIVE-4005
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4005.1.patch.txt, HIVE-4005.2.patch.txt, 
 HIVE-4005.3.patch.txt


 Column truncation allows users to remove data for columns that are no longer 
 useful.
 This is done by removing the data for the column and setting the length of 
 the column data and related lengths to 0 in the RC file header.
 RC file was fixed to recognize columns with lengths of zero as empty; they 
 are treated as if the column doesn't exist in the data, and a null is returned 
 for every value of that column in every row. This is the same thing that 
 happens when more columns are selected than exist in the file.
 A new command was added to the CLI
 TRUNCATE TABLE ... PARTITION ... COLUMNS ...
 This launches a map only job where each mapper rewrites a single file without 
 the unnecessary column data and the adjusted headers. It does not 
 uncompress/deserialize the data so it is much faster than rewriting the data 
 with NULLs.



[jira] [Updated] (HIVE-4005) Column truncation

2013-02-20 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-4005:


Status: Patch Available  (was: Open)

 Column truncation
 -

 Key: HIVE-4005
 URL: https://issues.apache.org/jira/browse/HIVE-4005
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4005.1.patch.txt, HIVE-4005.2.patch.txt, 
 HIVE-4005.3.patch.txt


 Column truncation allows users to remove data for columns that are no longer 
 useful.
 This is done by removing the data for the column and setting the length of 
 the column data and related lengths to 0 in the RC file header.
 RC file was fixed to recognize columns with lengths of zero as empty; they 
 are treated as if the column doesn't exist in the data, and a null is returned 
 for every value of that column in every row. This is the same thing that 
 happens when more columns are selected than exist in the file.
 A new command was added to the CLI
 TRUNCATE TABLE ... PARTITION ... COLUMNS ...
 This launches a map only job where each mapper rewrites a single file without 
 the unnecessary column data and the adjusted headers. It does not 
 uncompress/deserialize the data so it is much faster than rewriting the data 
 with NULLs.



[jira] [Commented] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.

2013-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582423#comment-13582423
 ] 

Hudson commented on HIVE-4039:
--

Integrated in Hive-trunk-hadoop2 #130 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/130/])
HIVE-4039 Hive compiler sometimes fails in semantic analysis / optimisation 
stage when boolean
variable appears in WHERE clause. (Jean Xu via namit) (Revision 1448135)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448135
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/test_boolean_whereclause.q
* /hive/trunk/ql/src/test/results/clientpositive/test_boolean_whereclause.q.out


 Hive compiler sometimes fails in semantic analysis / optimisation stage when 
 boolean variable appears in WHERE clause.
 --

 Key: HIVE-4039
 URL: https://issues.apache.org/jira/browse/HIVE-4039
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jean Xu
Assignee: Jean Xu
Priority: Minor
 Attachments: HIVE_4039.1.patch.txt


 Hive compiler fails with a NullPointerException in semantic analysis / 
 optimisation stage when a boolean variable appears in the WHERE clause in 
 some cases. A minimal query to generate this error is here:
 SELECT 1
 FROM (
 SELECT TRUE AS flag
 FROM dim_one_row:measurementsystems
 ) a
 WHERE flag;
 On the other hand, the following query is perfectly fine:
 SELECT 1
 FROM (
 SELECT TRUE AS flag
 FROM dim_one_row:measurementsystems
 ) a
 WHERE flag=TRUE;



[jira] [Commented] (HIVE-4027) Thrift alter_table api doesnt validate column type

2013-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582424#comment-13582424
 ] 

Hudson commented on HIVE-4027:
--

Integrated in Hive-trunk-hadoop2 #130 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/130/])
HIVE-4027 Thrift alter_table api doesnt validate column type
(Gang Tim Liu via namit) (Revision 1448138)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448138
Files : 
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java


 Thrift alter_table api doesnt validate column type
 --

 Key: HIVE-4027
 URL: https://issues.apache.org/jira/browse/HIVE-4027
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Fix For: 0.11.0

 Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3


 Thrift alter_table api doesn't validate the column type, so an invalid column 
 type can sneak in.



[jira] [Commented] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails

2013-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582425#comment-13582425
 ] 

Hudson commented on HIVE-4004:
--

Integrated in Hive-trunk-hadoop2 #130 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/130/])
HIVE-4004 Incorrect status for AddPartition metastore event if RawStore 
commit fails
(Dilip Joseph via namit) (Revision 1448101)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448101
Files : 
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListenerOnlyOnCommit.java


 Incorrect status for AddPartition metastore event if RawStore commit fails
 --

 Key: HIVE-4004
 URL: https://issues.apache.org/jira/browse/HIVE-4004
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.10.0
Reporter: Dilip Joseph
Assignee: Dilip Joseph
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-4004.1.patch.txt


 For ADD PARTITION operations, the AddPartitionEvent does not care if the 
 RawStore commit succeeded or not.  This means that an AddPartitionEvent with 
 status=true is fired even if the actual ADD PARTITION operation failed.  
 This will confuse any AddPartitionEvent listeners.
 Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the 
 status of the RawStore commit.  Only AddPartitionEvent has this problem.
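
The intended behaviour, where the event status mirrors the commit outcome, can be sketched as follows. This is an illustrative Python model only: `FakeRawStore`, `add_partition`, and the dictionary-shaped event are assumptions invented for the example and do not reflect the actual HiveMetaStore code or the attached patch.

```python
class FakeRawStore:
    """Hypothetical stand-in for the metastore's backing store."""

    def __init__(self, commit_succeeds):
        self.commit_succeeds = commit_succeeds

    def commit(self):
        return self.commit_succeeds


def add_partition(store, partition, listeners):
    """Add a partition and notify listeners with the real commit outcome."""
    success = False
    try:
        success = store.commit()
    finally:
        # The event is fired either way, but its status mirrors the commit.
        event = {"type": "AddPartitionEvent", "partition": partition,
                 "status": success}
        for listener in listeners:
            listener(event)
    return success
```

The point of the sketch is the ordering: the status is computed from the commit result before the event is built, so a listener never sees status=true for a failed add.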



Hive-trunk-hadoop2 - Build # 130 - Still Failing

2013-02-20 Thread Apache Jenkins Server
Changes for Build #98

Changes for Build #99
[kevinwilfong] HIVE-3940. Track columns accessed in each table in a query. 
(Samuel Yuan via kevinwilfong)


Changes for Build #100
[namit] HIVE-3778 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
(Gang Tim Liu via namit)


Changes for Build #101

Changes for Build #102

Changes for Build #103

Changes for Build #104
[hashutosh] HIVE-3977 : Hive 0.10 postgres schema script is broken (Johnny 
Zhang via Ashutosh Chauhan)

[hashutosh] HIVE-3932 : Hive release tarballs don't contain PostgreSQL 
metastore scripts (Mark Grover via Ashutosh Chauhan)


Changes for Build #105
[hashutosh] HIVE-3918 : Normalize more CRLF line endings (Mark Grover via 
Ashutosh Chauhan)

[namit] HIVE-3917 Support noscan operation for analyze command
(Gang Tim Liu via namit)


Changes for Build #106
[namit] HIVE-3937 Hive Profiler
(Pamela Vagata via namit)

[hashutosh] HIVE-3571 : add a way to run a small unit quickly (Navis via 
Ashutosh Chauhan)

[hashutosh] HIVE-3956 : TestMetaStoreAuthorization always uses the same port 
(Navis via Ashutosh Chauhan)


Changes for Build #107

Changes for Build #108

Changes for Build #109

Changes for Build #110
[namit] HIVE-2839 Filters on outer join with mapjoin hint is not applied 
correctly
(Navis via namit)


Changes for Build #111

Changes for Build #112
[namit] HIVE-3998 Oracle metastore update script will fail when upgrading from 
0.9.0 to
0.10.0 (Jarek and Mark via namit)

[namit] HIVE-3999 Mysql metastore upgrade script will end up with different 
schema than
the full schema load (Jarek and Mark via namit)


Changes for Build #113

Changes for Build #114
[namit] HIVE-3995 PostgreSQL upgrade scripts are not valid
(Jarek and Mark via namit)


Changes for Build #115

Changes for Build #116
[namit] HIVE-4001 Add o.a.h.h.serde.Constants for backward compatibility
(Navis via namit)


Changes for Build #117

Changes for Build #118

Changes for Build #119

Changes for Build #120
[kevinwilfong] HIVE-3252. Add environment context to metastore Thrift calls. 
(Samuel Yuan via kevinwilfong)


Changes for Build #121

Changes for Build #122

Changes for Build #123

Changes for Build #124

Changes for Build #125

Changes for Build #126
[hashutosh] HIVE-4000 Hive client goes into infinite loop at 100% cpu (Owen 
Omalley via Ashutosh Chauhan)


Changes for Build #127
[namit] HIVE-4021 PostgreSQL upgrade scripts are creating column with incorrect 
name
(Jarek Jarcec Cecho via namit)

[hashutosh] HIVE-4033 : NPE at runtime while selecting virtual column after 
joining three tables on different keys (Ashutosh Chauhan)

[namit] HIVE-4029 Hive Profiler dies with NPE
(Brock Noland via namit)


Changes for Build #128
[namit] HIVE-4023 Improve Error Logging in MetaStore
(Bhushan Mandhani via namit)

[namit] HIVE-3403 user should not specify mapjoin to perform sort-merge 
bucketed join
(Namit Jain via Ashutosh)

[namit] HIVE-4024 Derby metastore update script will fail when upgrading from 
0.9.0
to 0.10.0 (Jarek Jarcec Cecho via namit)


Changes for Build #129

Changes for Build #130
[namit] HIVE-4027 Thrift alter_table api doesnt validate column type
(Gang Tim Liu via namit)

[namit] HIVE-4039 Hive compiler sometimes fails in semantic analysis / 
optimisation stage when boolean
variable appears in WHERE clause. (Jean Xu via namit)

[namit] HIVE-4004 Incorrect status for AddPartition metastore event if RawStore 
commit fails
(Dilip Joseph via namit)




34 tests failed.
FAILED:  
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_1

Error Message:
Unexpected exception
See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get 
more logs.

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception
See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get 
more logs.
at junit.framework.Assert.fail(Assert.java:50)
at 
org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:5855)
at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_1(TestCliDriver.java:3476)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at 

[jira] [Created] (HIVE-4045) Modify PreDropPartitionEvent to pass Table parameter

2013-02-20 Thread Li Yang (JIRA)
Li Yang created HIVE-4045:
-

 Summary: Modify PreDropPartitionEvent to pass Table parameter
 Key: HIVE-4045
 URL: https://issues.apache.org/jira/browse/HIVE-4045
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Li Yang
Priority: Minor


A MetaStorePreEventListener, which implements onEvent(PreEventContext context), 
sometimes needs to access Table properties when handling a 
PreDropPartitionEvent.
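
As a rough picture of why a listener would want the table, the sketch below models an event that carries it. Everything here is hypothetical and written in Python for brevity: the `Table` and `PreDropPartitionEvent` classes are simplified stand-ins for the Hive classes of the same names, and the `protected` table property is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    parameters: dict = field(default_factory=dict)


@dataclass
class PreDropPartitionEvent:
    partition: str
    table: Table  # the extra parameter the issue asks for


def on_event(event):
    """Hypothetical listener: veto drops on tables marked as protected."""
    if event.table.parameters.get("protected") == "true":
        raise PermissionError("cannot drop partition of " + event.table.name)
    return True
```

Without the table on the event, a listener like this would need a separate lookup to reach the table properties.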



Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #71

2013-02-20 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/71/

--
[...truncated 62804 lines...]
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2013-02-20 13:52:43,989 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] Execution completed successfully
[junit] Mapred Local Task Succeeded . Convert the Join into MapJoin
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/71/artifact/hive/build/service/localscratchdir/hive_2013-02-20_13-52-39_182_5672754359019075109/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/71/artifact/hive/build/service/tmp/hive_job_log_jenkins_201302201352_1636206819.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt
[junit] Copying file: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] Table default.testhivedrivertable stats: [num_partitions: 0, 
num_files: 1, num_rows: 0, total_size: 5812, raw_data_size: 0]
[junit] POSTHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/71/artifact/hive/build/service/localscratchdir/hive_2013-02-20_13-52-47_166_6920110678730188468/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/71/artifact/hive/build/service/localscratchdir/hive_2013-02-20_13-52-47_166_6920110678730188468/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/71/artifact/hive/build/service/tmp/hive_job_log_jenkins_201302201352_119204245.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] 

Re: Review Request: HIVE-3951: Allow Decimal type columns in Regex Serde

2013-02-20 Thread Jarek Cecho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9173/#review16799
---

Ship it!


Looks good to me (I'm not a committer).

- Jarek Cecho


On Jan. 31, 2013, 8:02 a.m., Mark Grover wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/9173/
 ---
 
 (Updated Jan. 31, 2013, 8:02 a.m.)
 
 
 Review request for hive.
 
 
 Description
 ---
 
 Add support in RegexSerDe for the newly added Decimal type
 
 
 This addresses bug HIVE-3951.
 https://issues.apache.org/jira/browse/HIVE-3951
 
 
 Diffs
 -
 
   ql/src/test/queries/clientpositive/serde_regex.q c3254ca 
   ql/src/test/results/clientpositive/serde_regex.q.out a933538 
   serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java ae7693a 
 
 Diff: https://reviews.apache.org/r/9173/diff/
 
 
 Testing
 ---
 
 Added a client positive test
 
 
 Thanks,
 
 Mark Grover
 




[jira] [Commented] (HIVE-3951) Allow Decimal type columns in Regex Serde

2013-02-20 Thread Jarek Jarcec Cecho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582587#comment-13582587
 ] 

Jarek Jarcec Cecho commented on HIVE-3951:
--

+1 (non-binding)

 Allow Decimal type columns in Regex Serde
 -

 Key: HIVE-3951
 URL: https://issues.apache.org/jira/browse/HIVE-3951
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Affects Versions: 0.10.0
Reporter: Mark Grover
Assignee: Mark Grover
 Fix For: 0.11.0

 Attachments: HIVE-3951.1.patch


 Decimal type in Hive was recently added by HIVE-2693. We should allow users 
 to create tables with decimal type columns when using Regex Serde. 
 HIVE-3004 did something similar for other primitive types.



[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join

2013-02-20 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-3996:
-

Attachment: HIVE-3996_3.patch

Added a test case that demonstrates the issue when combining map-joins. It is 
an almost exact replica of the join32.q test with the sizes altered, but the 
current code generates the same plan as for join32.q even when the sum of the 
table sizes exceeds the size configured by noConditionalTask.size.

 Correctly enforce the memory limit on the multi-table map-join
 --

 Key: HIVE-3996
 URL: https://issues.apache.org/jira/browse/HIVE-3996
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-3996_2.patch, HIVE-3996_3.patch, HIVE-3996.patch


 Currently with HIVE-3784, the joins are converted to map-joins based on 
 checks of the table size against the config variable: 
 hive.auto.convert.join.noconditionaltask.size. 
 However, the current implementation will also merge multiple mapjoin 
 operators into a single task regardless of whether the sum of the table sizes 
 will exceed the configured value.
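
 To make the gap concrete, here is a minimal Python sketch (not Hive code) of
 the difference between checking each table on its own and checking the
 combined size before merging map-joins into one task. The function names, the
 sample sizes, and the use of `<=` are assumptions made for illustration; only
 the config variable name comes from the description above.

```python
def each_fits(small_table_sizes, threshold_bytes):
    """Per-table check: every table is individually under the limit."""
    return all(size <= threshold_bytes for size in small_table_sizes)


def can_merge_mapjoins(small_table_sizes, threshold_bytes):
    """Combined check: all tables together must fit under the limit."""
    return sum(small_table_sizes) <= threshold_bytes


# Hypothetical sizes in bytes. The limit stands in for
# hive.auto.convert.join.noconditionaltask.size.
sizes = [6_000_000, 7_000_000]
limit = 10_000_000

print(each_fits(sizes, limit))           # each table passes on its own
print(can_merge_mapjoins(sizes, limit))  # the merged task would exceed the limit
```

 In this sketch the two tables pass the per-table check but fail the combined
 one, which is the case the issue says should not be merged into a single task.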



[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join

2013-02-20 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-3996:
-

Status: Patch Available  (was: Open)

 Correctly enforce the memory limit on the multi-table map-join
 --

 Key: HIVE-3996
 URL: https://issues.apache.org/jira/browse/HIVE-3996
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-3996_2.patch, HIVE-3996_3.patch, HIVE-3996.patch


 Currently with HIVE-3784, the joins are converted to map-joins based on 
 checks of the table size against the config variable: 
 hive.auto.convert.join.noconditionaltask.size. 
 However, the current implementation will also merge multiple mapjoin 
 operators into a single task regardless of whether the sum of the table sizes 
 will exceed the configured value.



[jira] [Created] (HIVE-4046) Column masking

2013-02-20 Thread Samuel Yuan (JIRA)
Samuel Yuan created HIVE-4046:
-

 Summary: Column masking
 Key: HIVE-4046
 URL: https://issues.apache.org/jira/browse/HIVE-4046
 Project: Hive
  Issue Type: New Feature
  Components: CLI, Metastore, Query Processor
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan


Sometimes data in a table needs to be kept around but made inaccessible. Right 
now it is possible to offline a table or a partition, but not a specific column 
of a partition. Also, accessing an offlined table results in an error. With 
this change, it will be possible to mask a column at the partition level, 
causing all further queries to that column to return null.



Re: Review Requests

2013-02-20 Thread kulkarni.swar...@gmail.com
Would someone have a chance to take a quick look at these review
requests [1][2]?

[1] https://reviews.apache.org/r/9275/
[2] https://reviews.apache.org/r/9276/

Thanks,


On Tue, Feb 5, 2013 at 10:00 AM, kulkarni.swar...@gmail.com 
kulkarni.swar...@gmail.com wrote:

 Thanks Mark. Appreciate that. I'll take a look.


 On Mon, Feb 4, 2013 at 10:23 PM, Mark Grover 
 grover.markgro...@gmail.com wrote:

 Swarnim,
 I left some comments on reviewboard.

 On Mon, Feb 4, 2013 at 8:00 AM, kulkarni.swar...@gmail.com 
 kulkarni.swar...@gmail.com wrote:

  Hello,
 
  I opened up two reviews for small issues, HIVE-3553[1] and HIVE-3725[2].
  If you get a chance to review them and provide feedback, I would really
  appreciate it.
 
  Thanks,
 
  [1] https://reviews.apache.org/r/9275/
  [2] https://reviews.apache.org/r/9276/
 
  --
  Swarnim
 




 --
 Swarnim




-- 
Swarnim


[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-20 Thread Joey Echeverria (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582617#comment-13582617
 ] 

Joey Echeverria commented on HIVE-3528:
---

Hey Michael,

As a workaround, did you try casting the null to the type of the column that 
you're inserting into? It's not ideal, but might be a workable interim solution.

-Joey

 Avro SerDe doesn't handle serializing Nullable types that require access to a 
 Schema
 

 Key: HIVE-3528
 URL: https://issues.apache.org/jira/browse/HIVE-3528
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: avro
 Fix For: 0.11.0

 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt


 Deserialization properly handles hiding Nullable Avro types, including 
 complex types like record, map, array, etc. However, when Serialization 
 attempts to write out these types it erroneously makes use of the UNION 
 schema that contains NULL and the other type.
 This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
 Bytes.
 Here's a [review board of unit tests that express the 
 problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
 case that it's only when the schema is needed.



[jira] [Updated] (HIVE-3911) udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is disabled.

2013-02-20 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-3911:
---

Attachment: HIVE-3911_branch10.patch

Attaching HIVE-3911_branch10.patch. This should make it consistent. I have just 
removed the queries whose results differ and cause this test to fail.

 udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is 
 disabled.
 -

 Key: HIVE-3911
 URL: https://issues.apache.org/jira/browse/HIVE-3911
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Thiruvel Thirumoolan
 Fix For: 0.11.0

 Attachments: HIVE-3911_branch10.patch, HIVE-3911.patch


 I am running Hive10 unit tests against Hadoop 0.23.5 and 
 udaf_percentile_approx.q fails with a different value when map-side aggr is 
 disabled and only when 3rd argument to this UDAF is 100. Matches expected 
 output when map-side aggr is enabled for the same arguments.
 This test passes when hadoop.version is 1.1.1 and fails when it's 0.23.x or 
 2.0.0-alpha or 2.0.2-alpha.
 [junit] 20c20
 [junit] < 254.083331
 [junit] ---
 [junit] > 252.77
 [junit] 47c47
 [junit] < 254.083331
 [junit] ---
 [junit] > 252.77
 [junit] 74c74
 [junit] < [23.358,254.083331,477.0625,489.54667]
 [junit] ---
 [junit] > [24.07,252.77,476.9,487.82]
 [junit] 101c101
 [junit] < [23.358,254.083331,477.0625,489.54667]
 [junit] ---
 [junit] > [24.07,252.77,476.9,487.82]



[jira] [Updated] (HIVE-3911) udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is disabled.

2013-02-20 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-3911:
---

Fix Version/s: 0.10.1
 Assignee: Thiruvel Thirumoolan

 udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is 
 disabled.
 -

 Key: HIVE-3911
 URL: https://issues.apache.org/jira/browse/HIVE-3911
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 0.11.0, 0.10.1

 Attachments: HIVE-3911_branch10.patch, HIVE-3911.patch


 I am running Hive10 unit tests against Hadoop 0.23.5 and 
 udaf_percentile_approx.q fails with a different value when map-side aggr is 
 disabled and only when 3rd argument to this UDAF is 100. Matches expected 
 output when map-side aggr is enabled for the same arguments.
 This test passes when hadoop.version is 1.1.1 and fails when it's 0.23.x or 
 2.0.0-alpha or 2.0.2-alpha.
 [junit] 20c20
 [junit] < 254.083331
 [junit] ---
 [junit] > 252.77
 [junit] 47c47
 [junit] < 254.083331
 [junit] ---
 [junit] > 252.77
 [junit] 74c74
 [junit] < [23.358,254.083331,477.0625,489.54667]
 [junit] ---
 [junit] > [24.07,252.77,476.9,487.82]
 [junit] 101c101
 [junit] < [23.358,254.083331,477.0625,489.54667]
 [junit] ---
 [junit] > [24.07,252.77,476.9,487.82]



[jira] [Commented] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582637#comment-13582637
 ] 

Gang Tim Liu commented on HIVE-3741:


https://reviews.facebook.net/D8715

 Driver.validateConfVariables() should perform more validations
 --

 Key: HIVE-3741
 URL: https://issues.apache.org/jira/browse/HIVE-3741
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu

 Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.



[jira] [Updated] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3741:
---

Attachment: HIVE-3741.patch.1

 Driver.validateConfVariables() should perform more validations
 --

 Key: HIVE-3741
 URL: https://issues.apache.org/jira/browse/HIVE-3741
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3741.patch.1


 Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.



[jira] [Work started] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3741 started by Gang Tim Liu.

 Driver.validateConfVariables() should perform more validations
 --

 Key: HIVE-3741
 URL: https://issues.apache.org/jira/browse/HIVE-3741
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3741.patch.1


 Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.



[jira] [Updated] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3741:
---

Status: Patch Available  (was: In Progress)

patch is available for review.

 Driver.validateConfVariables() should perform more validations
 --

 Key: HIVE-3741
 URL: https://issues.apache.org/jira/browse/HIVE-3741
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3741.patch.1


 Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.



[jira] [Updated] (HIVE-4046) Column masking

2013-02-20 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-4046:
-

Component/s: Security
 Authorization

 Column masking
 --

 Key: HIVE-4046
 URL: https://issues.apache.org/jira/browse/HIVE-4046
 Project: Hive
  Issue Type: New Feature
  Components: Authorization, CLI, Metastore, Query Processor, Security
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan

 Sometimes data in a table needs to be kept around but made inaccessible. 
 Right now it is possible to offline a table or a partition, but not a 
 specific column of a partition. Also, accessing an offlined table results in 
 an error. With this change, it will be possible to mask a column at the 
 partition level, causing all further queries to that column to return null.



[jira] [Commented] (HIVE-4046) Column masking

2013-02-20 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582648#comment-13582648
 ] 

Carl Steinbach commented on HIVE-4046:
--

I think it's possible to accomplish most of this functionality using views in 
combination with authorization.

I'm also concerned that with the proposed behavior users will have trouble 
differentiating between the case where they aren't allowed to read a column and 
the other case where they do have permission to read the column, but all of the 
values are actually NULL.
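
The ambiguity can be shown with a small Python sketch, assuming masking is
modeled as replacing every value in the column with None. The function
`read_column`, the `masked_columns` parameter, and the sample data are
hypothetical and are not part of Hive.

```python
def read_column(partition, column, masked_columns):
    """Return the column's values, or all None if the column is masked."""
    values = partition[column]
    if column in masked_columns:
        return [None] * len(values)
    return values


# Case 1: the column holds data but is masked for this partition.
masked = read_column({"ssn": ["123", "456"]}, "ssn", masked_columns={"ssn"})

# Case 2: the column is readable, but every stored value really is NULL.
truly_null = read_column({"ssn": [None, None]}, "ssn", masked_columns=set())

print(masked == truly_null)  # the reader sees the same result in both cases
```

Under this model the query result alone does not tell the reader which case
applies, so some separate signal would be needed to distinguish them.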

 Column masking
 --

 Key: HIVE-4046
 URL: https://issues.apache.org/jira/browse/HIVE-4046
 Project: Hive
  Issue Type: New Feature
  Components: Authorization, CLI, Metastore, Query Processor, Security
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan

 Sometimes data in a table needs to be kept around but made inaccessible. 
 Right now it is possible to offline a table or a partition, but not a 
 specific column of a partition. Also, accessing an offlined table results in 
 an error. With this change, it will be possible to mask a column at the 
 partition level, causing all further queries to that column to return null.



[jira] [Updated] (HIVE-3720) Expand and standardize authorization in Hive

2013-02-20 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-3720:
-

Component/s: Security

 Expand and standardize authorization in Hive
 

 Key: HIVE-3720
 URL: https://issues.apache.org/jira/browse/HIVE-3720
 Project: Hive
  Issue Type: Improvement
  Components: Authorization, Security
Affects Versions: 0.9.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: Hive_Authorization_Functionality.pdf


 The existing implementation of authorization in Hive is not complete. 
 Additionally the existing implementation has security holes. This JIRA is an 
 umbrella JIRA  for a) extending authorization to all SQL operations and 
 direct metadata operations, and b) standardizing the authorization model and 
 its semantics to mirror that of MySQL as closely as possible.



[jira] [Commented] (HIVE-4022) Structs and struct fields cannot be NULL in INSERT statements

2013-02-20 Thread Michael Malak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582662#comment-13582662
 ] 

Michael Malak commented on HIVE-4022:
-

Note that there is a workaround for the case of setting STRUCT fields to NULL, 
but not for setting the whole STRUCT to a NULL.

The following workaround does work:

INSERT INTO TABLE oc SELECT named_struct('a', cast(null as int), 'b', cast(null 
as int)) FROM tc;

But there is no equivalent workaround for setting the whole STRUCT to NULL, 
because casting to a STRUCT is not supported, as noted in the first comment of 
https://issues.apache.org/jira/browse/HIVE-1287

 Structs and struct fields cannot be NULL in INSERT statements
 -

 Key: HIVE-4022
 URL: https://issues.apache.org/jira/browse/HIVE-4022
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Michael Malak

 Originally thought to be Avro-specific, and first noted with respect to 
 HIVE-3528 Avro SerDe doesn't handle serializing Nullable types that require 
 access to a Schema, it turns out even native Hive tables cannot store NULL 
 in a STRUCT field or for the entire STRUCT itself, at least when the NULL is 
 specified directly in the INSERT statement.
 Again, this affects both Avro-backed tables and native Hive tables.
 ***For native Hive tables:
 The following:
 echo 1,2 > twovalues.csv
 hive
 CREATE TABLE tc (x INT, y INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 LOAD DATA LOCAL INPATH 'twovalues.csv' INTO TABLE tc;
 CREATE TABLE oc (z STRUCT<a: int, b: int>);
 INSERT INTO TABLE oc SELECT null FROM tc;
 produces the error
 FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target 
 table because column number/types are different 'oc': Cannot convert column 0 
 from void to struct<a:int,b:int>.
 The following:
 INSERT INTO TABLE oc SELECT named_struct('a', null, 'b', null) FROM tc;
 produces the error:
 FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target 
 table because column number/types are different 'oc': Cannot convert column 0 
 from struct<a:void,b:void> to struct<a:int,b:int>.
 ***For Avro:
 In HIVE-3528, there is in fact a null-struct test case in line 14 of
 https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt
 The test script at
 https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q
 does indeed work.  But in that test, the query gets all of its data from a 
 test table verbatim:
 INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;
 If instead we stick in a hard-coded null for the struct directly into the 
 query, it fails:
 INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, 
 bigint1, boolean1, float1, double1, list1, map1, null, enum1, nullableint, 
 bytes1, fixed1 FROM test_serializer;
 with the following error:
 FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target 
 table because column number/types are different 'as_avro': Cannot convert 
 column 10 from void to struct<sint:int,sboolean:boolean,sstring:string>.
 Note, though, that substituting a hard-coded null for string1 (and restoring 
 struct1 into the query) does work:
 INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, 
 bigint1, boolean1, float1, double1, list1, map1, struct1, enum1, nullableint, 
 bytes1, fixed1 FROM test_serializer;



[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-20 Thread Michael Malak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582664#comment-13582664
 ] 

Michael Malak commented on HIVE-3528:
-

As noted in the first comment from 
https://issues.apache.org/jira/browse/HIVE-1287, casting to a STRUCT is not 
currently supported.

However, I did just now try casting individual fields of a STRUCT and that 
indeed does work.

I just now added details to the JIRA that I created last week.
https://issues.apache.org/jira/browse/HIVE-4022


 Avro SerDe doesn't handle serializing Nullable types that require access to a 
 Schema
 

 Key: HIVE-3528
 URL: https://issues.apache.org/jira/browse/HIVE-3528
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: avro
 Fix For: 0.11.0

 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt


 Deserialization properly handles hiding Nullable Avro types, including 
 complex types like record, map, array, etc. However, when Serialization 
 attempts to write out these types it erroneously makes use of the UNION 
 schema that contains NULL and the other type.
 This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
 Bytes.
 Here's a [review board of unit tests that express the 
 problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
 case that it's only when the schema is needed.



[jira] [Commented] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.

2013-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582676#comment-13582676
 ] 

Hudson commented on HIVE-4039:
--

Integrated in Hive-trunk-h0.21 #1978 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1978/])
HIVE-4039 Hive compiler sometimes fails in semantic analysis / optimisation 
stage when boolean
variable appears in WHERE clause. (Jean Xu via namit) (Revision 1448135)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448135
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/test_boolean_whereclause.q
* /hive/trunk/ql/src/test/results/clientpositive/test_boolean_whereclause.q.out


 Hive compiler sometimes fails in semantic analysis / optimisation stage when 
 boolean variable appears in WHERE clause.
 --

 Key: HIVE-4039
 URL: https://issues.apache.org/jira/browse/HIVE-4039
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jean Xu
Assignee: Jean Xu
Priority: Minor
 Attachments: HIVE_4039.1.patch.txt


 Hive compiler fails with a NullPointerException in semantic analysis / 
 optimisation stage when a boolean variable appears in the WHERE clause in 
 some cases. A minimal query to generate this error is here:
 SELECT 1
 FROM (
 SELECT TRUE AS flag
 FROM dim_one_row:measurementsystems
 ) a
 WHERE flag;
 On the other hand, the following query is perfectly fine:
 SELECT 1
 FROM (
 SELECT TRUE AS flag
 FROM dim_one_row:measurementsystems
 ) a
 WHERE flag=TRUE;



[jira] [Commented] (HIVE-4027) Thrift alter_table api doesnt validate column type

2013-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582677#comment-13582677
 ] 

Hudson commented on HIVE-4027:
--

Integrated in Hive-trunk-h0.21 #1978 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1978/])
HIVE-4027 Thrift alter_table api doesnt validate column type
(Gang Tim Liu via namit) (Revision 1448138)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448138
Files : 
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java


 Thrift alter_table api doesnt validate column type
 --

 Key: HIVE-4027
 URL: https://issues.apache.org/jira/browse/HIVE-4027
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Fix For: 0.11.0

 Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3


 The Thrift alter_table API doesn't validate column types, so an invalid column 
 type can sneak in.



[jira] [Commented] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails

2013-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582678#comment-13582678
 ] 

Hudson commented on HIVE-4004:
--

Integrated in Hive-trunk-h0.21 #1978 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1978/])
HIVE-4004 Incorrect status for AddPartition metastore event if RawStore 
commit fails
(Dilip Joseph via namit) (Revision 1448101)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448101
Files : 
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListenerOnlyOnCommit.java


 Incorrect status for AddPartition metastore event if RawStore commit fails
 --

 Key: HIVE-4004
 URL: https://issues.apache.org/jira/browse/HIVE-4004
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.10.0
Reporter: Dilip Joseph
Assignee: Dilip Joseph
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-4004.1.patch.txt


 For ADD PARTITION operations, the AddPartitionEvent does not care if the 
 RawStore commit succeeded or not.  This means that an AddPartitionEvent with 
 status=true is fired even if the actual ADD PARTITION operation failed.
 This will confuse any AddPartitionEvent listeners.
 Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the 
 status of the RawStore commit.  Only AddPartitionEvent has this problem.
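
 The intended behavior can be sketched in Python as follows. This is not the
 metastore code: `Event`, `add_partition`, and the `commit` callable are
 hypothetical stand-ins, and the sketch only assumes that the event status
 should reflect whether the commit succeeded.

```python
class Event:
    """Minimal stand-in for a metastore listener event."""

    def __init__(self, name, status):
        self.name = name
        self.status = status


def add_partition(commit, listeners):
    """Attempt the commit, then notify listeners with the actual outcome."""
    success = False
    try:
        success = commit()  # hypothetical stand-in for the RawStore commit
    finally:
        # The status is taken from the commit result, not hard-coded to True.
        event = Event("AddPartitionEvent", success)
        for listener in listeners:
            listener(event)
    return success


seen = []
add_partition(lambda: False, [seen.append])
print(seen[0].status)  # reflects the failed commit
```

 Firing the event in the `finally` block means listeners are still notified
 when the commit raises, and in that case the status stays False.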



Hive-trunk-h0.21 - Build # 1978 - Fixed

2013-02-20 Thread Apache Jenkins Server
Changes for Build #1975
[namit] HIVE-4021 PostgreSQL upgrade scripts are creating column with incorrect 
name
(Jarek Jarcec Cecho via namit)

[hashutosh] HIVE-4033 : NPE at runtime while selecting virtual column after 
joining three tables on different keys (Ashutosh Chauhan)

[namit] HIVE-4029 Hive Profiler dies with NPE
(Brock Noland via namit)


Changes for Build #1976
[namit] HIVE-4023 Improve Error Logging in MetaStore
(Bhushan Mandhani via namit)

[namit] HIVE-3403 user should not specify mapjoin to perform sort-merge 
bucketed join
(Namit Jain via Ashutosh)

[namit] HIVE-4024 Derby metastore update script will fail when upgrading from 
0.9.0
to 0.10.0 (Jarek Jarcec Cecho via namit)


Changes for Build #1977

Changes for Build #1978
[namit] HIVE-4027 Thrift alter_table api doesnt validate column type
(Gang Tim Liu via namit)

[namit] HIVE-4039 Hive compiler sometimes fails in semantic analysis / 
optimisation stage when boolean
variable appears in WHERE clause. (Jezn Xu via namit)

[namit] HIVE-4004 Incorrect status for AddPartition metastore event if RawStore 
commit fails
(Dilip Joseph via namit)




All tests passed

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1978)

Status: Fixed

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1978/ to 
view the results.

[jira] [Work started] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3710 started by Gang Tim Liu.

 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
 FileSinkOperator
 --

 Key: HIVE-3710
 URL: https://issues.apache.org/jira/browse/HIVE-3710
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu

 It should be part of the plan instead.



[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3710:
---

Attachment: HIVE-3710.patch.1

 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
 FileSinkOperator
 --

 Key: HIVE-3710
 URL: https://issues.apache.org/jira/browse/HIVE-3710
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3710.patch.1


 It should be part of the plan instead.



[jira] [Commented] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582706#comment-13582706
 ] 

Gang Tim Liu commented on HIVE-3710:


https://reviews.facebook.net/D8721

 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
 FileSinkOperator
 --

 Key: HIVE-3710
 URL: https://issues.apache.org/jira/browse/HIVE-3710
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3710.patch.1


 It should be part of the plan instead.



[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3710:
---

Status: Patch Available  (was: In Progress)

patch is available.

 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
 FileSinkOperator
 --

 Key: HIVE-3710
 URL: https://issues.apache.org/jira/browse/HIVE-3710
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3710.patch.1


 It should be part of the plan instead.



[jira] [Updated] (HIVE-3992) Hive RCFile::sync(long) does a sub-sequence linear search for sync blocks

2013-02-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-3992:
--

Assignee: Gopal V
Release Note: Rely on previous sync-points when syncing within the same 
RCFile and avoid unnecessary I/O
  Status: Patch Available  (was: Open)

Patch optimizes for rcfile splits when they are being merged in a 
CombineFileSplit instance.

 Hive RCFile::sync(long) does a sub-sequence linear search for sync blocks
 -

 Key: HIVE-3992
 URL: https://issues.apache.org/jira/browse/HIVE-3992
 Project: Hive
  Issue Type: Bug
 Environment: Ubuntu x86_64/java-1.6/hadoop-2.0.3
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-3992.patch, select-join-limit.html


 The following function does some bad I/O
 {code}
 public synchronized void sync(long position) throws IOException {
   ...
   try {
     seek(position + 4); // skip escape
     in.readFully(syncCheck);
     int syncLen = sync.length;
     for (int i = 0; in.getPos() < end; i++) {
       int j = 0;
       for (; j < syncLen; j++) {
         if (sync[j] != syncCheck[(i + j) % syncLen]) {
           break;
         }
       }
       if (j == syncLen) {
         in.seek(in.getPos() - SYNC_SIZE); // position before sync
         return;
       }
       syncCheck[i % syncLen] = in.readByte();
     }
   }
   ...
 }
 {code}
 This causes a rather large number of readByte() calls, which are passed on to a 
 ByteBuffer via a single-byte array.
 This results in a rather large amount of CPU being burnt in the linear 
 search for the sync pattern in the input RCFile (up to 92% for a skewed 
 example: a trivial map-join + limit 100).
 This behaviour should ideally be avoided, or at least replaced by a rolling 
 hash for efficient comparison, since the pattern has a known width of 16 bytes.
 Attached the stack trace from a Yourkit profile.
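As a rough illustration of the cost being described, the sketch below searches for a 16-byte marker in a byte array that has already been read in bulk, so no per-byte stream call is needed. This is an assumption-laden sketch, not the attached patch (whose release note says it relies on previous sync points) and not the rolling hash suggested above; `SyncMarkerSearchSketch` and `indexOf` are hypothetical names.

```java
final class SyncMarkerSearchSketch {

    /**
     * Returns the index of the first occurrence of marker in data at or
     * after position from, or -1 if it does not occur. The data array is
     * assumed to have been filled with one bulk read.
     */
    static int indexOf(byte[] data, byte[] marker, int from) {
        int last = data.length - marker.length;
        for (int i = Math.max(from, 0); i <= last; i++) {
            int j = 0;
            while (j < marker.length && data[i + j] == marker[j]) {
                j++;
            }
            if (j == marker.length) {
                return i;
            }
        }
        return -1;
    }

    /** Builds a zero-filled buffer with the marker copied in at offset. */
    static byte[] bufferWithMarkerAt(int size, byte[] marker, int offset) {
        byte[] data = new byte[size];
        System.arraycopy(marker, 0, data, offset, marker.length);
        return data;
    }

    /** A 16-byte demo marker holding the values 1..16. */
    static byte[] demoMarker() {
        byte[] marker = new byte[16];
        for (int i = 0; i < marker.length; i++) {
            marker[i] = (byte) (i + 1);
        }
        return marker;
    }

    public static void main(String[] args) {
        byte[] marker = demoMarker();
        byte[] data = bufferWithMarkerAt(64, marker, 40);
        System.out.println(indexOf(data, marker, 0)); // prints 40
    }
}
```

The comparison itself is still a naive scan; the saving in this sketch comes only from reading a block once and scanning it in memory.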



[jira] [Commented] (HIVE-4046) Column masking

2013-02-20 Thread Justin Boseant (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582759#comment-13582759
 ] 

Justin Boseant commented on HIVE-4046:
--

The problem with using authorization is that querying one of these columns is 
going to result in an error / failed query.  The requested functionality 
requires that the query succeed and that the data be masked.

 Column masking
 --

 Key: HIVE-4046
 URL: https://issues.apache.org/jira/browse/HIVE-4046
 Project: Hive
  Issue Type: New Feature
  Components: Authorization, CLI, Metastore, Query Processor, Security
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan

 Sometimes data in a table needs to be kept around but made inaccessible. 
 Right now it is possible to offline a table or a partition, but not a 
 specific column of a partition. Also, accessing an offlined table results in 
 an error. With this change, it will be possible to mask a column at the 
 partition level, causing all further queries to that column to return null.



[jira] [Commented] (HIVE-4046) Column masking

2013-02-20 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582772#comment-13582772
 ] 

Carl Steinbach commented on HIVE-4046:
--

Here's what I meant:

{code}
CREATE TABLE emp (
  name STRING,
  title STRING,
  salary INT
);

CREATE VIEW emp_masked AS
  SELECT name, title, NULL
  FROM emp;
{code}

Then use authorization to restrict access to the underlying emp table.

Regardless of which approach is used, I think it would be good to write up a 
proposal explaining the functional and implementation details before writing 
any code.

 Column masking
 --

 Key: HIVE-4046
 URL: https://issues.apache.org/jira/browse/HIVE-4046
 Project: Hive
  Issue Type: New Feature
  Components: Authorization, CLI, Metastore, Query Processor, Security
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan

 Sometimes data in a table needs to be kept around but made inaccessible. 
 Right now it is possible to offline a table or a partition, but not a 
 specific column of a partition. Also, accessing an offlined table results in 
 an error. With this change, it will be possible to mask a column at the 
 partition level, causing all further queries to that column to return null.



[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-3710:
-

Status: Open  (was: Patch Available)

bq. It should be part of the plan instead.

Why should it be part of the plan? Is this patch intended to resolve incorrect 
behavior, or is it a performance optimization, or ...?

Please add a test case.


 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
 FileSinkOperator
 --

 Key: HIVE-3710
 URL: https://issues.apache.org/jira/browse/HIVE-3710
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3710.patch.1


 It should be part of the plan instead.



[jira] [Commented] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582797#comment-13582797
 ] 

Gang Tim Liu commented on HIVE-3710:


It's a follow-up to HIVE-3706. It falls under performance optimization, 
although it is not as big as HIVE-3706.

 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
 FileSinkOperator
 --

 Key: HIVE-3710
 URL: https://issues.apache.org/jira/browse/HIVE-3710
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3710.patch.1


 It should be part of the plan instead.



[jira] [Commented] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582804#comment-13582804
 ] 

Gang Tim Liu commented on HIVE-3710:


It's not a new feature; it moves code from the run-time path to the compilation 
path in order to improve performance.
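A minimal sketch of the idea, under stated assumptions: the configuration flag is read once when the plan is built and stored in a descriptor, and the per-row path only consults that stored field. `SinkDesc`, `compile`, `process` and the key `sketch.collect.rawdatasize` are hypothetical and are not taken from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

final class PlanTimeFlagSketch {

    /** Hypothetical plan descriptor: the flag is resolved once, at compile time. */
    static final class SinkDesc {
        final boolean collectRawDataSize;
        SinkDesc(boolean collectRawDataSize) {
            this.collectRawDataSize = collectRawDataSize;
        }
    }

    /** Compile-time step: reads the (hypothetical) configuration key once. */
    static SinkDesc compile(Map<String, String> conf) {
        String value = conf.get("sketch.collect.rawdatasize");
        return new SinkDesc(value == null || Boolean.parseBoolean(value));
    }

    /** Run-time step: uses only the descriptor, never the configuration. */
    static long process(SinkDesc desc, int[] rowSizes) {
        long rawDataSize = 0;
        for (int size : rowSizes) {
            if (desc.collectRawDataSize) {
                rawDataSize += size;
            }
        }
        return rawDataSize;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<String, String>();
        conf.put("sketch.collect.rawdatasize", "true");
        System.out.println(process(compile(conf), new int[] {10, 20, 30})); // prints 60
    }
}
```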

I thought the existing statistics-related test cases already give good coverage.

Please let me know your thoughts and if it makes sense. I will act accordingly.

thanks a lot

 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
 FileSinkOperator
 --

 Key: HIVE-3710
 URL: https://issues.apache.org/jira/browse/HIVE-3710
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3710.patch.1


 It should be part of the plan instead.



[jira] [Commented] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582809#comment-13582809
 ] 

Gang Tim Liu commented on HIVE-3710:


For example, stats0.q ... stats18.q are existing stats-related test cases.

thanks a lot

 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
 FileSinkOperator
 --

 Key: HIVE-3710
 URL: https://issues.apache.org/jira/browse/HIVE-3710
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3710.patch.1


 It should be part of the plan instead.



[jira] [Updated] (HIVE-4002) Fetch task aggregation for simple group by query

2013-02-20 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4002:
--

Attachment: HIVE-4002.D8739.1.patch

navis requested code review of HIVE-4002 [jira] Fetch task aggregation for 
simple group by query.

Reviewers: JIRA

HIVE-4002 Fetch task aggregation for simple group by query

Aggregation queries with no group-by clause (for example, select count(*) from 
src) execute the final aggregation in a single reduce task. But that work is too 
small even for a single reducer, because most UDAFs generate just a single row 
for map aggregation. If the final fetch task can aggregate the outputs from the 
map tasks, shuffling time can be removed.

This optimization transforms an operator tree such as

TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK

into

TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)

With the patch, the time taken for the auto_join_filters.q test dropped from 10 
min to 6 min.
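The reason this can work for a query such as select count(*) is that each map task emits only one partial row, so merging them is cheap enough to do in the fetch task. A minimal sketch of that final merge, assuming the partial counts have already been read from the map outputs (`FetchAggregationSketch` and `mergeCounts` are hypothetical names, not code from the patch):

```java
import java.util.Arrays;
import java.util.List;

final class FetchAggregationSketch {

    /** Merges the per-map-task partial counts into the final count. */
    static long mergeCounts(List<Long> partialCounts) {
        long total = 0;
        for (long count : partialCounts) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // One partial count per map task; no reducer or shuffle is involved.
        List<Long> partials = Arrays.asList(120L, 80L, 300L);
        System.out.println(mergeCounts(partials)); // prints 500
    }
}
```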

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D8739

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchAggregation.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
  ql/src/test/queries/clientpositive/fetch_aggregation.q
  ql/src/test/results/clientpositive/fetch_aggregation.q.out
  ql/src/test/results/compiler/plan/groupby1.q.xml
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml
  ql/src/test/results/compiler/plan/groupby5.q.xml
  serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/21291/

To: JIRA, navis


 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch


 Aggregation queries with no group-by clause (for example, select count(*) 
 from src) execute the final aggregation in a single reduce task. But that work 
 is too small even for a single reducer, because most UDAFs generate just a 
 single row for map aggregation. If the final fetch task can aggregate the 
 outputs from the map tasks, shuffling time can be removed.
 This optimization transforms an operator tree such as
 TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
 into 
 TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
 With the patch, the time taken for the auto_join_filters.q test dropped from 
 10 min to 6 min.



[jira] [Updated] (HIVE-4002) Fetch task aggregation for simple group by query

2013-02-20 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4002:


Status: Patch Available  (was: Open)

 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch


 Aggregation queries with no group-by clause (for example, select count(*) 
 from src) execute the final aggregation in a single reduce task. But that work 
 is too small even for a single reducer, because most UDAFs generate just a 
 single row for map aggregation. If the final fetch task can aggregate the 
 outputs from the map tasks, shuffling time can be removed.
 This optimization transforms an operator tree such as
 TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
 into 
 TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
 With the patch, the time taken for the auto_join_filters.q test dropped from 
 10 min to 6 min.



[jira] [Commented] (HIVE-948) more query plan optimization rules

2013-02-20 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582880#comment-13582880
 ] 

Navis commented on HIVE-948:


Ah, sorry. I'll update that.

bq. Why this needs to be last optimizer?
It does not update the info for the SEL, including colExprMap, etc. The 
following optimizers, like GlobalLimitOptimizer or SimpleFetchOptimizer, do not 
modify the operator tree. (They possibly update the info, but I was even 
thinking of removing all of it in a CleanupProcessor, making the plan file smaller.)

bq. Also, parent should always have child's schema, isnt it?
I thought SEL(no-compute) does not have a schema because it just inherits that 
of its parent. I'll check it again.

bq. Shouldn't parent be selectStar either when child is select-star or parent 
itself is select-star.
I've skipped those situations before applying it, like this (in the missing 
file), because I'm not sure about it.
{code}
if (pSEL.getConf().isSelStarNoCompute()) {
  // SEL(no-compute)-SEL. never seen this condition, and removing parent is not
  // safe in current graph walker
  return null;
}
{code}

 more query plan optimization rules 
 ---

 Key: HIVE-948
 URL: https://issues.apache.org/jira/browse/HIVE-948
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Navis
 Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, 
 HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch


 Many query plans are not optimal in that they contain redundant operators. 
 Some examples are unnecessary select operators (select followed by select, 
 select output being the same as input etc.). Even though these operators are 
 not very expensive, they could account for around 10% of CPU time in some 
 simple queries. It seems they are low-hanging fruits that we should pick 
 first. 
 BTW, it seems these optimization rules should be added at the last stage of 
 the physical optimization phase since some redundant operators are added to 
 facilitate physical plan generation. 



[jira] [Updated] (HIVE-948) more query plan optimization rules

2013-02-20 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-948:
-

Attachment: HIVE-948.D8463.4.patch

navis updated the revision HIVE-948 [jira] more query plan optimization rules.

  Added missing class, sorry

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D8463

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D8463?vs=27807&id=28257#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java

To: JIRA, ashutoshc, navis


 more query plan optimization rules 
 ---

 Key: HIVE-948
 URL: https://issues.apache.org/jira/browse/HIVE-948
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Navis
 Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, 
 HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.4.patch


 Many query plans are not optimal in that they contain redundant operators. 
 Some examples are unnecessary select operators (select followed by select, 
 select output being the same as input etc.). Even though these operators are 
 not very expensive, they could account for around 10% of CPU time in some 
 simple queries. It seems they are low-hanging fruits that we should pick 
 first. 
 BTW, it seems these optimization rules should be added at the last stage of 
 the physical optimization phase since some redundant operators are added to 
 facilitate physical plan generation. 



[jira] [Commented] (HIVE-4025) Add reflect UDF for member method invocation of column

2013-02-20 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582886#comment-13582886
 ] 

Phabricator commented on HIVE-4025:
---

navis has commented on the revision HIVE-4025 [jira] Add reflect UDF for 
member method invocation of column.

INLINE COMMENTS
  ql/src/test/results/clientpositive/udf_reflect2.q.out:312 I'll update that.

  bq. The last columns seem to be wrong:
  It's the right result for the Timestamp class.

  getYear()
   * Returns a value that is the result of subtracting 1900 from the
   * year that contains or begins with the instant in time represented
   * by this <code>Date</code> object, as interpreted in the local
   * time zone.

  getMonth()
   * Returns a number representing the month that contains or begins
   * with the instant in time represented by this <tt>Date</tt> object.
   * The value returned is between <code>0</code> and <code>11</code>,
   * with the value <code>0</code> representing January.

  getDay()
   * Returns the day of the week represented by this date. The
   * returned value (<tt>0</tt> = Sunday, <tt>1</tt> = Monday,
   * <tt>2</tt> = Tuesday, <tt>3</tt> = Wednesday, <tt>4</tt> =
   * Thursday, <tt>5</tt> = Friday, <tt>6</tt> = Saturday)
   * represents the day of the week that contains or begins with
   * the instant in time represented by this <tt>Date</tt> object,
   * as interpreted in the local time zone.
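  The quoted Javadoc can be checked with a small standalone program. This is only an illustration of the (deprecated) java.util.Date accessors that java.sql.Timestamp inherits; it does not go through Hive, and `TimestampAccessorSketch` is a hypothetical name.

```java
import java.sql.Timestamp;

final class TimestampAccessorSketch {

    /** Returns {getYear(), getMonth(), getDay()} for the given timestamp. */
    @SuppressWarnings("deprecation")
    static int[] yearMonthDay(Timestamp ts) {
        return new int[] { ts.getYear(), ts.getMonth(), ts.getDay() };
    }

    public static void main(String[] args) {
        // 2013-02-20 was a Wednesday; valueOf() parses in the local time zone.
        Timestamp ts = Timestamp.valueOf("2013-02-20 10:00:00");
        int[] ymd = yearMonthDay(ts);
        System.out.println(ymd[0]); // prints 113 (2013 - 1900)
        System.out.println(ymd[1]); // prints 1   (February, zero-based)
        System.out.println(ymd[2]); // prints 3   (Wednesday, day of week)
    }
}
```

  Note that getDay() is the day of the week, not the day of the month; getDate() returns the latter.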

REVISION DETAIL
  https://reviews.facebook.net/D8601

To: JIRA, navis
Cc: njain, brock


 Add reflect UDF for member method invocation of column
 --

 Key: HIVE-4025
 URL: https://issues.apache.org/jira/browse/HIVE-4025
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-4025.D8601.1.patch


 There are many useful non-static methods on the types of primitive values, 
 but the current reflect UDF cannot invoke them. For example,
 select reflect2(value, "replace", "val", "VALUE") from src;
 replaces the 'val' part of the value column with 'VALUE'.
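 In plain Java, invoking a member method by name on a value looks roughly like the sketch below. It is an illustration of the general reflection technique under stated assumptions, not the code in the attached patch; `MemberInvokeSketch` and `invokeMember` are hypothetical names, and the overload matching is deliberately simplistic (reference-typed parameters only).

```java
import java.lang.reflect.Method;

final class MemberInvokeSketch {

    /** Invokes the first public method with a matching name and argument types. */
    static Object invokeMember(Object target, String methodName, Object... args)
            throws Exception {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(methodName)
                    && argsMatch(m.getParameterTypes(), args)) {
                return m.invoke(target, args);
            }
        }
        throw new NoSuchMethodException(methodName);
    }

    /** True when every argument is an instance of the declared parameter type. */
    static boolean argsMatch(Class<?>[] types, Object[] args) {
        if (types.length != args.length) {
            return false;
        }
        for (int i = 0; i < types.length; i++) {
            // Primitive parameter types never match here, so for String the
            // replace(CharSequence, CharSequence) overload is selected.
            if (!types[i].isInstance(args[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(invokeMember("val_238", "replace", "val", "VALUE"));
        // prints VALUE_238
    }
}
```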



[jira] [Assigned] (HIVE-4045) Modify PreDropPartitionEvent to pass Table parameter

2013-02-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-4045:


Assignee: Li Yang

 Modify PreDropPartitionEvent to pass Table parameter
 

 Key: HIVE-4045
 URL: https://issues.apache.org/jira/browse/HIVE-4045
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Li Yang
Assignee: Li Yang
Priority: Minor

 MetaStorePreEventListener which implements onEvent(PreEventContext context) 
 sometimes needs to access Table properties when PreDropPartitionEvent is 
 listened to.



[jira] [Commented] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582897#comment-13582897
 ] 

Namit Jain commented on HIVE-3741:
--

+1

 Driver.validateConfVariables() should perform more validations
 --

 Key: HIVE-3741
 URL: https://issues.apache.org/jira/browse/HIVE-3741
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3741.patch.1


 Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.



[jira] [Updated] (HIVE-4016) Remove init(fname) from TestParse.vm for each test

2013-02-20 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4016:
--

Attachment: HIVE-4016.D8547.2.patch

navis updated the revision HIVE-4016 [jira] Remove init(fname) from 
TestParse.vm for each test.

  Addressed comments (removed dummy incrementors and updated result plans)

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D8547

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D8547?vs=27657&id=28263#toc

AFFECTED FILES
  ql/src/test/results/compiler/plan/case_sensitivity.q.xml
  ql/src/test/results/compiler/plan/cast1.q.xml
  ql/src/test/results/compiler/plan/groupby1.q.xml
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml
  ql/src/test/results/compiler/plan/groupby4.q.xml
  ql/src/test/results/compiler/plan/groupby5.q.xml
  ql/src/test/results/compiler/plan/groupby6.q.xml
  ql/src/test/results/compiler/plan/input1.q.xml
  ql/src/test/results/compiler/plan/input2.q.xml
  ql/src/test/results/compiler/plan/input20.q.xml
  ql/src/test/results/compiler/plan/input3.q.xml
  ql/src/test/results/compiler/plan/input4.q.xml
  ql/src/test/results/compiler/plan/input5.q.xml
  ql/src/test/results/compiler/plan/input6.q.xml
  ql/src/test/results/compiler/plan/input7.q.xml
  ql/src/test/results/compiler/plan/input8.q.xml
  ql/src/test/results/compiler/plan/input9.q.xml
  ql/src/test/results/compiler/plan/input_part1.q.xml
  ql/src/test/results/compiler/plan/input_testsequencefile.q.xml
  ql/src/test/results/compiler/plan/input_testxpath.q.xml
  ql/src/test/results/compiler/plan/input_testxpath2.q.xml
  ql/src/test/results/compiler/plan/join1.q.xml
  ql/src/test/results/compiler/plan/join2.q.xml
  ql/src/test/results/compiler/plan/join3.q.xml
  ql/src/test/results/compiler/plan/join4.q.xml
  ql/src/test/results/compiler/plan/join5.q.xml
  ql/src/test/results/compiler/plan/join6.q.xml
  ql/src/test/results/compiler/plan/join7.q.xml
  ql/src/test/results/compiler/plan/join8.q.xml
  ql/src/test/results/compiler/plan/sample1.q.xml
  ql/src/test/results/compiler/plan/sample2.q.xml
  ql/src/test/results/compiler/plan/sample3.q.xml
  ql/src/test/results/compiler/plan/sample4.q.xml
  ql/src/test/results/compiler/plan/sample5.q.xml
  ql/src/test/results/compiler/plan/sample6.q.xml
  ql/src/test/results/compiler/plan/sample7.q.xml
  ql/src/test/results/compiler/plan/subq.q.xml
  ql/src/test/results/compiler/plan/udf1.q.xml
  ql/src/test/results/compiler/plan/udf4.q.xml
  ql/src/test/results/compiler/plan/udf6.q.xml
  ql/src/test/results/compiler/plan/udf_case.q.xml
  ql/src/test/results/compiler/plan/udf_when.q.xml
  ql/src/test/results/compiler/plan/union.q.xml
  ql/src/test/templates/TestParse.vm

To: JIRA, ashutoshc, navis


 Remove init(fname) from TestParse.vm for each test
 --

 Key: HIVE-4016
 URL: https://issues.apache.org/jira/browse/HIVE-4016
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-4016.D8547.1.patch, HIVE-4016.D8547.2.patch


 TestParse does not change any configuration or data, which means calling the 
 init() method before each test is not necessary. After removing it, test time 
 dropped from 260 sec to 16 sec.



[jira] [Updated] (HIVE-2843) UDAF to convert an aggregation to a map

2013-02-20 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2843:
--

Attachment: HIVE-2843.D8745.1.patch

navis requested code review of HIVE-2843 [jira] UDAF to convert an aggregation 
to a map.

Reviewers: JIRA

HIVE-2843 UDAF to convert an aggregation to a map

I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. 
The source code is available on GitHub at https://github.com/wdavidw/hive-udf 
in two Java classes: UDAFToMap and UDAFToOrderedMap. The first function 
converts an aggregation into a map and internally uses a Java `HashMap`. The 
second function extends the first one. It converts an aggregation into an 
ordered map and internally uses a Java `TreeMap`. They both extend the 
`AbstractGenericUDAFResolver` class.

Also, I have covered the motivations and usages of those UDAF in a blog post at 
http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/

The full patch is available with tests as well.
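
The practical difference between the two backing maps can be sketched in plain Java. This is only an illustration under stated assumptions: `MapAggregation` and its methods are hypothetical names, the Hive UDAF API is not used, and the rule that a later row overwrites an earlier one with the same key is assumed here rather than taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration; not the Hive UDAF API.
final class MapAggregation {
    // Collects (key, value) rows into a HashMap: iteration order is unspecified.
    static Map<String, Integer> toMap(String[] keys, int[] values) {
        Map<String, Integer> out = new HashMap<>();
        fill(out, keys, values);
        return out;
    }

    // Same aggregation into a TreeMap: iteration follows the keys' natural order.
    static Map<String, Integer> toOrderedMap(String[] keys, int[] values) {
        Map<String, Integer> out = new TreeMap<>();
        fill(out, keys, values);
        return out;
    }

    // Assumption: a later row with the same key overwrites the earlier one.
    private static void fill(Map<String, Integer> out, String[] keys, int[] values) {
        for (int i = 0; i < keys.length; i++) {
            out.put(keys[i], values[i]);
        }
    }
}
```

Both methods return the same entries; only `toOrderedMap` guarantees that iterating the result visits keys in sorted order.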

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D8745

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFImplodeToMap.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFImplodeToOrderedMap.java
  ql/src/test/queries/clientpositive/implode_to_map.q
  ql/src/test/queries/clientpositive/implode_to_ordered_map.q
  ql/src/test/results/clientpositive/implode_to_map.q.out
  ql/src/test/results/clientpositive/implode_to_ordered_map.q.out

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/21309/

To: JIRA, navis


 UDAF to convert an aggregation to a map
 ---

 Key: HIVE-2843
 URL: https://issues.apache.org/jira/browse/HIVE-2843
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 0.9.0, 0.10.0
Reporter: David Worms
Priority: Minor
  Labels: features, udf
 Attachments: HIVE-2843.1.patch.txt, HIVE-2843.D8745.1.patch


 I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. 
 The source code is available on GitHub at https://github.com/wdavidw/hive-udf 
 in two Java classes: UDAFToMap and UDAFToOrderedMap. The first function 
 converts an aggregation into a map and internally uses a Java `HashMap`. 
 The second function extends the first one. It converts an aggregation into an 
 ordered map and internally uses a Java `TreeMap`. They both extend the 
 `AbstractGenericUDAFResolver` class.
 Also, I have covered the motivations and usages of those UDAFs in a blog post 
 at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/
 The full patch is available with tests as well.



[jira] [Commented] (HIVE-2843) UDAF to convert an aggregation to a map

2013-02-20 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582924#comment-13582924
 ] 

Navis commented on HIVE-2843:
-

Made a Phabricator entry for quick review. I've used a similar UDAF to 
implement a pivot feature, and it was very useful.

 UDAF to convert an aggregation to a map
 ---

 Key: HIVE-2843
 URL: https://issues.apache.org/jira/browse/HIVE-2843
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 0.9.0, 0.10.0
Reporter: David Worms
Priority: Minor
  Labels: features, udf
 Attachments: HIVE-2843.1.patch.txt, HIVE-2843.D8745.1.patch


 I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. 
 The source code is available on GitHub at https://github.com/wdavidw/hive-udf 
 in two Java classes: UDAFToMap and UDAFToOrderedMap. The first function 
 converts an aggregation into a map and internally uses a Java `HashMap`. 
 The second function extends the first one. It converts an aggregation into an 
 ordered map and internally uses a Java `TreeMap`. They both extend the 
 `AbstractGenericUDAFResolver` class.
 Also, I have covered the motivations and usages of those UDAFs in a blog post 
 at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/
 The full patch is available with tests as well.



[jira] [Commented] (HIVE-3968) Enhance logging in TableAccessInfo

2013-02-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582929#comment-13582929
 ] 

Namit Jain commented on HIVE-3968:
--

+1

 Enhance logging in TableAccessInfo
 --

 Key: HIVE-3968
 URL: https://issues.apache.org/jira/browse/HIVE-3968
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, 
 HIVE-3968.3.patch.txt


 Based on what is currently available in the TableAccessInfo we can infer when 
 it would be a good idea to add bucketing/sorting metadata for tables.  
 However, we can't easily tell if we're already getting the benefits of 
 bucketing/sorting.
 This information can be improved by
 a) storing the input table/partition objects so that we can tell if the 
 tables/partitions are already bucketed/sorted
 b) running the TableAccessAnalyzer after the logical optimizer, so that we 
 can tell from the operators whether or not we are already getting benefits 
 (bucketed/sort merge map joins or map group bys)



[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join

2013-02-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3996:
-

Status: Open  (was: Patch Available)

comments

 Correctly enforce the memory limit on the multi-table map-join
 --

 Key: HIVE-3996
 URL: https://issues.apache.org/jira/browse/HIVE-3996
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-3996_2.patch, HIVE-3996_3.patch, HIVE-3996.patch


 Currently with HIVE-3784, the joins are converted to map-joins based on 
 checks of the table size against the config variable: 
 hive.auto.convert.join.noconditionaltask.size. 
 However, the current implementation will also merge multiple mapjoin 
 operators into a single task regardless of whether the sum of the table sizes 
 will exceed the configured value.
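
The gap described above can be sketched as two different checks. This is a hypothetical illustration, not Hive code: `MapJoinSizeCheck`, `eachFits` and `sumFits` are made-up names, the threshold stands in for hive.auto.convert.join.noconditionaltask.size, and treating "greater than" as the failing condition is an assumption.

```java
// Hypothetical sketch; none of these names exist in Hive.
final class MapJoinSizeCheck {
    // Per-table check: every small table individually fits under the threshold.
    static boolean eachFits(long[] tableSizes, long threshold) {
        for (long size : tableSizes) {
            if (size > threshold) {
                return false;
            }
        }
        return true;
    }

    // Combined check: tables merged into one task must fit under the threshold together.
    static boolean sumFits(long[] tableSizes, long threshold) {
        long total = 0;
        for (long size : tableSizes) {
            total += size;
            if (total > threshold) {
                return false;
            }
        }
        return true;
    }
}
```

With two tables of size 6 and a threshold of 10, `eachFits` accepts the plan while `sumFits` rejects it, which is the situation the issue describes.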



[jira] [Commented] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener

2013-02-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582934#comment-13582934
 ] 

Namit Jain commented on HIVE-3970:
--

+1

 Clean up/fix PartitionNameWhitelistPreEventListener
 ---

 Key: HIVE-3970
 URL: https://issues.apache.org/jira/browse/HIVE-3970
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt, 
 HIVE-3970.3.patch.txt


 There are a number of issues and things which can be cleaned up related to 
 PartitionNameWhitelistPreEventListener.
 * It's an event listener, but it really doesn't need to be: given that the 
 regex whitelist is configurable, it could just be a utility method.
 * It's not run when a partition is renamed, so partitions with invalid 
 characters can be created in this way.
 * There's no easy way to check if a partition contains invalid characters 
 before creating it and seeing if it fails.
 Most importantly, when a dynamic partition contains an invalid character, the 
 directory for this partition is created, and the data is moved into it, but 
 the partition fails to be created leaving an orphan directory.
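
A utility-method form of the whitelist check might look like the sketch below. `PartitionNameCheck` and `allValid` are hypothetical names and the pattern used in the usage note is only an example; nothing here is taken from the attached patches.

```java
import java.util.regex.Pattern;

// Hypothetical utility; not a Hive class.
final class PartitionNameCheck {
    // True when every partition value matches the whitelist pattern in full.
    static boolean allValid(Pattern whitelist, String... partitionValues) {
        for (String value : partitionValues) {
            if (!whitelist.matcher(value).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

A caller could run `allValid(Pattern.compile("[A-Za-z0-9_-]+"), values)` before creating or renaming a partition, and before any directory is created for a dynamic partition, so a bad name is rejected without leaving files behind.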



[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3710:
---

Attachment: HIVE-3710.patch.2

 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
 FileSinkOperator
 --

 Key: HIVE-3710
 URL: https://issues.apache.org/jira/browse/HIVE-3710
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3710.patch.1, HIVE-3710.patch.2


 It should be part of the plan instead.



[jira] [Commented] (HIVE-2843) UDAF to convert an aggregation to a map

2013-02-20 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582947#comment-13582947
 ] 

Phabricator commented on HIVE-2843:
---

njain has commented on the revision HIVE-2843 [jira] UDAF to convert an 
aggregation to a map.

INLINE COMMENTS
  ql/src/test/queries/clientpositive/implode_to_map.q:2 The code changes look good.

  Some minor comments:

  1. Can you add describe implode_to_map and desc extended in the test ?
  2. Have you run all the tests ? I think you need to update show_functions.q.out
  ql/src/test/queries/clientpositive/implode_to_map.q:24 can you add some comments here - what is the implode_to_map returning ?

  Add a test where the 2nd arg to implode_to_map is a primitive type
  ql/src/test/queries/clientpositive/implode_to_ordered_map.q:25 same as above.

REVISION DETAIL
  https://reviews.facebook.net/D8745

To: JIRA, navis
Cc: njain


 UDAF to convert an aggregation to a map
 ---

 Key: HIVE-2843
 URL: https://issues.apache.org/jira/browse/HIVE-2843
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 0.9.0, 0.10.0
Reporter: David Worms
Priority: Minor
  Labels: features, udf
 Attachments: HIVE-2843.1.patch.txt, HIVE-2843.D8745.1.patch


 I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. 
 The source code is available on GitHub at https://github.com/wdavidw/hive-udf 
 in two Java classes: UDAFToMap and UDAFToOrderedMap. The first function 
 converts an aggregation into a map and internally uses a Java `HashMap`. 
 The second function extends the first one. It converts an aggregation into an 
 ordered map and internally uses a Java `TreeMap`. They both extend the 
 `AbstractGenericUDAFResolver` class.
 Also, I have covered the motivations and usages of those UDAFs in a blog post 
 at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/
 The full patch is available with tests as well.



[jira] [Updated] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3741:
-

   Resolution: Fixed
Fix Version/s: 0.11.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks Tim

 Driver.validateConfVariables() should perform more validations
 --

 Key: HIVE-3741
 URL: https://issues.apache.org/jira/browse/HIVE-3741
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Fix For: 0.11.0

 Attachments: HIVE-3741.patch.1


 Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.



[jira] [Commented] (HIVE-3741) Driver.validateConfVariables() should perform more validations

2013-02-20 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582952#comment-13582952
 ] 

Gang Tim Liu commented on HIVE-3741:


Namit, thank you very much. Tim

 Driver.validateConfVariables() should perform more validations
 --

 Key: HIVE-3741
 URL: https://issues.apache.org/jira/browse/HIVE-3741
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Fix For: 0.11.0

 Attachments: HIVE-3741.patch.1


 Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.



[jira] [Updated] (HIVE-4005) Column truncation

2013-02-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4005:
-

Status: Open  (was: Patch Available)

comments

 Column truncation
 -

 Key: HIVE-4005
 URL: https://issues.apache.org/jira/browse/HIVE-4005
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4005.1.patch.txt, HIVE-4005.2.patch.txt, 
 HIVE-4005.3.patch.txt


 Column truncation allows users to remove data for columns that are no longer 
 useful.
 This is done by removing the data for the column and setting the length of 
 the column data and related lengths to 0 in the RC file header.
 RC file was fixed to recognize columns with lengths of zero as empty. Such 
 columns are treated as if they don't exist in the data: a null is returned 
 for every value of that column in every row. This is the same thing that 
 happens when more columns are selected than exist in the file.
 A new command was added to the CLI
 TRUNCATE TABLE ... PARTITION ... COLUMNS ...
 This launches a map-only job where each mapper rewrites a single file without 
 the unnecessary column data and with the adjusted headers. It does not 
 uncompress/deserialize the data so it is much faster than rewriting the data 
 with NULLs.
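
The read-side behaviour described above can be modelled with a toy sketch. `ColumnStore` is a hypothetical in-memory stand-in, not the RCFile reader, and its fields only mimic the idea of per-column lengths recorded in a header.

```java
// Hypothetical model; not the RCFile format or reader.
final class ColumnStore {
    private final int[] lengths;   // per-column data length, as a header might record it
    private final String[] data;   // per-column payload

    ColumnStore(int[] lengths, String[] data) {
        this.lengths = lengths.clone();
        this.data = data.clone();
    }

    // Returns null when the column is absent from the file or its length is zero.
    String read(int column) {
        if (column >= lengths.length || lengths[column] == 0) {
            return null;
        }
        return data[column];
    }

    // Truncation drops the payload and zeroes the length; other columns are untouched.
    void truncate(int column) {
        lengths[column] = 0;
        data[column] = null;
    }
}
```

After `truncate(1)`, `read(1)` returns null exactly as `read` does for a column index the file never had, while the remaining columns read back unchanged.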



[jira] [Commented] (HIVE-4016) Remove init(fname) from TestParse.vm for each test

2013-02-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582970#comment-13582970
 ] 

Ashutosh Chauhan commented on HIVE-4016:


+1 Running tests.

 Remove init(fname) from TestParse.vm for each test
 --

 Key: HIVE-4016
 URL: https://issues.apache.org/jira/browse/HIVE-4016
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-4016.D8547.1.patch, HIVE-4016.D8547.2.patch


 TestParse does not change any configuration or data, which means calling the 
 init() method before each test is not necessary. After removing it, test time 
 dropped from 260sec to 16sec.



[jira] [Updated] (HIVE-3968) Enhance logging in TableAccessInfo

2013-02-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3968:
-

Status: Open  (was: Patch Available)

The tests table_access_keys_stats.q and table_access_keys_stats2.q are failing


 Enhance logging in TableAccessInfo
 --

 Key: HIVE-3968
 URL: https://issues.apache.org/jira/browse/HIVE-3968
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, 
 HIVE-3968.3.patch.txt


 Based on what is currently available in the TableAccessInfo we can infer when 
 it would be a good idea to add bucketing/sorting metadata for tables.  
 However, we can't easily tell if we're already getting the benefits of 
 bucketing/sorting.
 This information can be improved by
 a) storing the input table/partition objects so that we can tell if the 
 tables/partitions are already bucketed/sorted
 b) running the TableAccessAnalyzer after the logical optimizer, so that we 
 can tell from the operators whether or not we are already getting benefits 
 (bucketed/sort merge map joins or map group bys)



[jira] [Commented] (HIVE-948) more query plan optimization rules

2013-02-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582973#comment-13582973
 ] 

Ashutosh Chauhan commented on HIVE-948:
---

Makes sense. Navis, once you update the patch (there are a few more .q files 
which were added in trunk since you last updated the patch), I will get it in. 

 more query plan optimization rules 
 ---

 Key: HIVE-948
 URL: https://issues.apache.org/jira/browse/HIVE-948
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Navis
 Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, 
 HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.4.patch


 Many query plans are not optimal in that they contain redundant operators. 
 Some examples are unnecessary select operators (select followed by select, 
 select output being the same as input etc.). Even though these operators are 
 not very expensive, they could account for around 10% of CPU time in some 
 simple queries. It seems they are low-hanging fruit that we should pick 
 first. 
 BTW, it seems these optimization rules should be added at the last stage of 
 the physical optimization phase since some redundant operators are added to 
 facilitate physical plan generation. 
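
One of the redundancies mentioned above, a select whose output is the same as its input, can be sketched on a toy operator chain. `Op` and `SelectPruner` are hypothetical names and the string-based operator model is an assumption made for illustration; Hive's operator tree and optimizer rules are not represented here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy operator: a kind such as "SCAN" or "SELECT" plus its output columns.
final class Op {
    final String kind;
    final List<String> columns;

    Op(String kind, List<String> columns) {
        this.kind = kind;
        this.columns = columns;
    }
}

// Hypothetical rule; not part of Hive.
final class SelectPruner {
    // Drops every SELECT whose output columns equal those of the operator feeding it.
    static List<Op> dropIdentitySelects(List<Op> chain) {
        List<Op> kept = new ArrayList<>();
        for (Op op : chain) {
            boolean identity = !kept.isEmpty()
                    && op.kind.equals("SELECT")
                    && op.columns.equals(kept.get(kept.size() - 1).columns);
            if (!identity) {
                kept.add(op);
            }
        }
        return kept;
    }
}
```

Because each operator is compared with the last one kept, a run of identical selects collapses to a single operator, while a select that really narrows the column list survives.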



[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3710:
---

Attachment: HIVE-3710.patch.3

 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
 FileSinkOperator
 --

 Key: HIVE-3710
 URL: https://issues.apache.org/jira/browse/HIVE-3710
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3710.patch.1, HIVE-3710.patch.2, HIVE-3710.patch.3


 It should be part of the plan instead.



[jira] [Work started] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3710 started by Gang Tim Liu.

 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
 FileSinkOperator
 --

 Key: HIVE-3710
 URL: https://issues.apache.org/jira/browse/HIVE-3710
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3710.patch.1, HIVE-3710.patch.2, HIVE-3710.patch.3


 It should be part of the plan instead.



[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator

2013-02-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3710:
---

Status: Patch Available  (was: In Progress)

Add a new test case.

Existing stats-related test cases cover the case of 
hive.stats.collect.rawdatasize as true.

The new test case compares the config on and off in order to ensure HIVE-3710 keeps 
existing logic intact.

Patch is available, both as an attachment and on Phabricator.

 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in 
 FileSinkOperator
 --

 Key: HIVE-3710
 URL: https://issues.apache.org/jira/browse/HIVE-3710
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu
 Attachments: HIVE-3710.patch.1, HIVE-3710.patch.2, HIVE-3710.patch.3


 It should be part of the plan instead.
