[jira] [Commented] (HIVE-9020) When dropping external tables, Hive should not verify whether user has access to the data.

2020-07-29 Thread Carl Steinbach (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167492#comment-17167492
 ] 

Carl Steinbach commented on HIVE-9020:
--

Hi [~Thogek], unfortunately this patch was not committed. 

> When dropping external tables, Hive should not verify whether user has access 
> to the data. 
> ---
>
> Key: HIVE-9020
> URL: https://issues.apache.org/jira/browse/HIVE-9020
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Anant Nag
>Priority: Major
> Attachments: dropExternal.patch
>
>
> When dropping tables, Hive verifies whether the user has access to the data 
> on HDFS and fails if the user doesn't have access. This makes sense for 
> internal tables, since their data has to be deleted on drop, but for 
> external tables Hive should not check for data access.
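For illustration, a minimal HiveQL sketch of the scenario described above (the 
table name and location are hypothetical): dropping an external table removes 
only the metastore entry, so no write access to the HDFS location should be 
needed.

{noformat}
-- Hypothetical example; the user only has read access to /data/shared/events.
CREATE EXTERNAL TABLE events (id INT, payload STRING)
LOCATION '/data/shared/events';

-- Only metadata is removed; the files under /data/shared/events stay in place.
DROP TABLE events;
{noformat}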



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-20225) SerDe to support Teradata Binary Format

2018-08-29 Thread Carl Steinbach (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596839#comment-16596839
 ] 

Carl Steinbach commented on HIVE-20225:
---

+1

> SerDe to support Teradata Binary Format
> ---
>
> Key: HIVE-20225
> URL: https://issues.apache.org/jira/browse/HIVE-20225
> Project: Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Reporter: Lu Li
>Assignee: Lu Li
>Priority: Major
> Attachments: HIVE-20225.1.patch, HIVE-20225.10.patch, 
> HIVE-20225.11.patch, HIVE-20225.12.patch, HIVE-20225.13.patch, 
> HIVE-20225.14-branch-2.patch, HIVE-20225.15.patch, HIVE-20225.2.patch, 
> HIVE-20225.3.patch, HIVE-20225.4.patch, HIVE-20225.5-branch-2.patch, 
> HIVE-20225.6.patch, HIVE-20225.7.patch, HIVE-20225.8.patch, HIVE-20225.9.patch
>
>
> When using TPT/BTEQ to export/import data from Teradata, Teradata will 
> generate/require binary files based on the schema.
> A customized SerDe is needed to read these files directly from Hive, or to 
> write files that can be loaded back into Teradata.
> {code:java}
> CREATE EXTERNAL TABLE `TABLE1`(
> ...)
> PARTITIONED BY (
> ...)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.contrib.serde2.TeradataBinarySerde'
> STORED AS INPUTFORMAT
>  
> 'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileInputFormat'
> OUTPUTFORMAT
>  
> 'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileOutputFormat'
> LOCATION ...;
> SELECT * FROM `TABLE1`;{code}
> Problem Statement:
> Right now the fastest way to export/import data from Teradata is TPT. 
> However, Hive cannot directly consume or generate this binary format because 
> it doesn't have a SerDe for these files.
> Result:
> With this SerDe, Hive can transparently read and generate the exported 
> Teradata Binary Format files.
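As a companion sketch to the DDL above (the partition spec, column list, and 
staging table are placeholders, mirroring the elided parts of the example): 
once the table is declared with the Teradata binary SerDe, producing files for 
TPT import is an ordinary INSERT.

{code:sql}
-- Hypothetical example; replace the placeholders with the real schema.
INSERT OVERWRITE TABLE `TABLE1` PARTITION (...)
SELECT ... FROM staging_table;
{code}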



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20225) SerDe to support Teradata Binary Format

2018-08-06 Thread Carl Steinbach (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-20225:
--
Status: Open  (was: Patch Available)

Hi [~luli], please resubmit this patch after addressing the comments I left on 
RB. Thanks!

> SerDe to support Teradata Binary Format
> ---
>
> Key: HIVE-20225
> URL: https://issues.apache.org/jira/browse/HIVE-20225
> Project: Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Reporter: Lu Li
>Assignee: Lu Li
>Priority: Major
> Attachments: HIVE-20225.1.patch, HIVE-20225.2.patch, 
> HIVE-20225.3.patch, HIVE-20225.4.patch, HIVE-20225.5-branch-2.patch, 
> HIVE-20225.6.patch, HIVE-20225.7.patch
>
>
> When using TPT/BTEQ to export/import data from Teradata, Teradata will 
> generate/require binary files based on the schema.
> A customized SerDe is needed to read these files directly from Hive, or to 
> write files that can be loaded back into Teradata.
> {code:java}
> CREATE EXTERNAL TABLE `TABLE1`(
> ...)
> PARTITIONED BY (
> ...)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.contrib.serde2.TeradataBinarySerde'
> STORED AS INPUTFORMAT
>  
> 'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileInputFormat'
> OUTPUTFORMAT
>  
> 'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileOutputFormat'
> LOCATION ...;
> SELECT * FROM `TABLE1`;{code}
> Problem Statement:
> Right now the fastest way to export/import data from Teradata is TPT. 
> However, Hive cannot directly consume or generate this binary format because 
> it doesn't have a SerDe for these files.
> Result:
> With this SerDe, Hive can transparently read and generate the exported 
> Teradata Binary Format files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20225) SerDe to support Teradata Binary Format

2018-07-29 Thread Carl Steinbach (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-20225:
--
Status: Open  (was: Patch Available)

Hi [~luli], automated testing caught new checkstyle, findbugs, and missing 
license header problems with the patch (see results above). Please fix these 
issues and then resubmit the patch for review. Thanks!

> SerDe to support Teradata Binary Format
> ---
>
> Key: HIVE-20225
> URL: https://issues.apache.org/jira/browse/HIVE-20225
> Project: Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Reporter: Lu Li
>Assignee: Lu Li
>Priority: Major
> Attachments: HIVE-20225.1.patch
>
>
> When using TPT/BTEQ to export/import data from Teradata, Teradata will 
> generate/require binary files based on the schema.
> A customized SerDe is needed to read these files directly from Hive, or to 
> write files that can be loaded back into Teradata.
> {code:java}
> CREATE EXTERNAL TABLE `TABLE1`(
> ...)
> PARTITIONED BY (
> ...)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.contrib.serde2.TeradataBinarySerde'
> STORED AS INPUTFORMAT
>  
> 'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileInputFormat'
> OUTPUTFORMAT
>  
> 'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileOutputFormat'
> LOCATION ...;
> SELECT * FROM `TABLE1`;{code}
> Problem Statement:
> Right now the fastest way to export/import data from Teradata is TPT. 
> However, Hive cannot directly consume or generate this binary format because 
> it doesn't have a SerDe for these files.
> Result:
> With this SerDe, Hive can transparently read and generate the exported 
> Teradata Binary Format files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2018-02-05 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-15353:
--
Status: Open  (was: Patch Available)

[~erwaman], some files have moved around due to recent metastore changes, so 
this patch no longer applies. Can you please rebase and repost? Thanks.

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 1.1.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>Priority: Major
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
>  * create_table
>  * alter_table
>  * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.
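A minimal sketch of a client call that hits this path, assuming the 
Thrift-generated metastore API classes; the table name, location, and any 
fields left unset are illustrative and not taken from the attached patches.

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
import org.apache.hadoop.hive.metastore.api.Table;

HiveConf conf = new HiveConf();
HiveMetaStoreClient client = new HiveMetaStoreClient(conf);

StorageDescriptor sd = new StorageDescriptor();
sd.setCols(null);                  // cols deliberately left null
sd.setLocation("/tmp/npe_table");  // hypothetical location

Table table = new Table();
table.setDbName("default");
table.setTableName("npe_table");   // hypothetical table name
table.setSd(sd);

client.createTable(table);         // create_table hits the NPE before the fix
{code}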



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2018-01-31 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-15353:
--
Comment: was deleted

(was: +1)

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>Priority: Major
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
>  * create_table
>  * alter_table
>  * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2018-01-31 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347582#comment-16347582
 ] 

Carl Steinbach commented on HIVE-15353:
---

+1

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>Priority: Major
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
>  * create_table
>  * alter_table
>  * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HIVE-12746) when dropping external hive tables,hive metastore should not check the hdfs path write permission

2017-11-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-12746.
---
Resolution: Duplicate

Resolving this as a duplicate of HIVE-9020.

> when dropping external hive tables,hive metastore should not check the hdfs 
> path write permission
> -
>
> Key: HIVE-12746
> URL: https://issues.apache.org/jira/browse/HIVE-12746
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1
> Environment: hive1.2.1 hadoop2.6
>Reporter: wangfeng
>Priority: Critical
>  Labels: hdfspermission, metastore
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> 1. user1 has read-only permission on the HDFS path '/user/www/seller_shop_info';
> 2. user1 creates an external table seller_shop_info on that HDFS path;
> 3. user1 drops the external table seller_shop_info.
> Then the problem occurs:
> hive> drop table seller_shop_info;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Table metadata 
> not deleted since hdfs://argo/user/www/seller_shop_info is not writable by 
> user1)
> Because dropping an external table does not delete the HDFS path, the Hive 
> metastore should not check the HDFS write permission.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-9020) When dropping external tables, Hive should not verify whether user has access to the data.

2017-11-14 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-9020:
-
Status: Open  (was: Patch Available)

Changes have been requested so I'm setting the status to open.

> When dropping external tables, Hive should not verify whether user has access 
> to the data. 
> ---
>
> Key: HIVE-9020
> URL: https://issues.apache.org/jira/browse/HIVE-9020
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Anant Nag
> Attachments: dropExternal.patch
>
>
> When dropping tables, Hive verifies whether the user has access to the data 
> on HDFS and fails if the user doesn't have access. This makes sense for 
> internal tables, since their data has to be deleted on drop, but for 
> external tables Hive should not check for data access.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17530) ClassCastException when converting uniontype

2017-09-18 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-17530:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Anthony!

> ClassCastException when converting uniontype
> 
>
> Key: HIVE-17530
> URL: https://issues.apache.org/jira/browse/HIVE-17530
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Fix For: 3.0.0
>
> Attachments: HIVE-17530.1.patch, HIVE-17530.2.patch
>
>
> To repro:
> {noformat}
> SET hive.exec.schema.evolution = false;
> CREATE TABLE avro_orc_partitioned_uniontype (a uniontype) 
> PARTITIONED BY (b int) STORED AS ORC;
> INSERT INTO avro_orc_partitioned_uniontype PARTITION (b=1) SELECT 
> create_union(1, true, value) FROM src LIMIT 5;
> ALTER TABLE avro_orc_partitioned_uniontype SET FILEFORMAT AVRO;
> SELECT * FROM avro_orc_partitioned_uniontype;
> {noformat}
> The exception you get is:
> {code}
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: java.util.ArrayList cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.UnionObject
> {code}
> The issue is that StandardUnionObjectInspector was creating and returning an 
> ArrayList rather than a UnionObject.
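For reference, a hedged illustration of what the conversion should hand back, 
using the standard serde2 object inspector classes; the tag and value below 
are made up for the example and are not from the patch.

{code:java}
import org.apache.hadoop.hive.serde2.objectinspector.StandardUnionObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.UnionObject;

// The converted value must be a UnionObject (tag + value), not the raw
// ArrayList of union alternatives, so that downstream casts succeed.
byte tag = 0;                 // hypothetical: first branch of the uniontype
Object value = Boolean.TRUE;  // hypothetical converted value for that branch
UnionObject union = new StandardUnionObjectInspector.StandardUnion(tag, value);
{code}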



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17530) ClassCastException when converting uniontype

2017-09-14 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166941#comment-16166941
 ] 

Carl Steinbach commented on HIVE-17530:
---

Patch looks good. +1. Will commit if tests pass.

> ClassCastException when converting uniontype
> 
>
> Key: HIVE-17530
> URL: https://issues.apache.org/jira/browse/HIVE-17530
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-17530.1.patch
>
>
> To repro:
> {noformat}
> SET hive.exec.schema.evolution = false;
> CREATE TABLE avro_orc_partitioned_uniontype (a uniontype) 
> PARTITIONED BY (b int) STORED AS ORC;
> INSERT INTO avro_orc_partitioned_uniontype PARTITION (b=1) SELECT 
> create_union(1, true, value) FROM src LIMIT 5;
> ALTER TABLE avro_orc_partitioned_uniontype SET FILEFORMAT AVRO;
> SELECT * FROM avro_orc_partitioned_uniontype;
> {noformat}
> The exception you get is:
> {code}
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: java.util.ArrayList cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.UnionObject
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-17394:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Anthony!

> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for 
> every row
> -
>
> Key: HIVE-17394
> URL: https://issues.apache.org/jira/browse/HIVE-17394
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Ratandeep Ratti
>Assignee: Anthony Hsu
> Fix For: 3.0.0
>
> Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png, 
> HIVE-17394.1.patch
>
>
> The following methods in {{AvroDeserializer}} keep regenerating {{TypeInfo}} 
> objects for every nullable field in a row.
> This is happening in the following methods:
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema) throws AvroSerdeException {
> // elided
> line 312:  return worker(datum, fileSchema, newRecordSchema,
> SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
> recordSchema)
> // elided
> line 357: return worker(datum, currentFileSchema, schema,
>   SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad in terms of performance. I'm not sure why we didn't use 
> the TypeInfo we already have instead of regenerating it for each nullable 
> field. If you look at the {{worker}} method, which calls 
> {{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable 
> field's column is already determined.
> Moreover, the cache in the {{SchemaToTypeInfo}} class does not help in the 
> nullable Avro record case, as checking whether an Avro record schema object 
> already exists in the cache requires traversing all the fields in the record 
> schema.
> I've attached a profiling snapshot which shows that most of the time is being 
> spent in the cache.
> One way of fixing this, IMO, might be to make use of the column TypeInfo that 
> is already passed into the worker method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163704#comment-16163704
 ] 

Carl Steinbach commented on HIVE-17394:
---

The four test failures were already present in previous builds, so this looks 
like a clean run.

> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for 
> every row
> -
>
> Key: HIVE-17394
> URL: https://issues.apache.org/jira/browse/HIVE-17394
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Ratandeep Ratti
>Assignee: Anthony Hsu
> Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png, 
> HIVE-17394.1.patch
>
>
> The following methods in {{AvroDeserializer}} keep regenerating {{TypeInfo}} 
> objects for every nullable field in a row.
> This is happening in the following methods:
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema) throws AvroSerdeException {
> // elided
> line 312:  return worker(datum, fileSchema, newRecordSchema,
> SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
> recordSchema)
> // elided
> line 357: return worker(datum, currentFileSchema, schema,
>   SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad in terms of performance. I'm not sure why we didn't use 
> the TypeInfo we already have instead of regenerating it for each nullable 
> field. If you look at the {{worker}} method, which calls 
> {{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable 
> field's column is already determined.
> Moreover, the cache in the {{SchemaToTypeInfo}} class does not help in the 
> nullable Avro record case, as checking whether an Avro record schema object 
> already exists in the cache requires traversing all the fields in the record 
> schema.
> I've attached a profiling snapshot which shows that most of the time is being 
> spent in the cache.
> One way of fixing this, IMO, might be to make use of the column TypeInfo that 
> is already passed into the worker method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-17394:
--
Component/s: Serializers/Deserializers

> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for 
> every row
> -
>
> Key: HIVE-17394
> URL: https://issues.apache.org/jira/browse/HIVE-17394
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Ratandeep Ratti
>Assignee: Anthony Hsu
> Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png, 
> HIVE-17394.1.patch
>
>
> The following methods in {{AvroDeserializer}} keep regenerating {{TypeInfo}} 
> objects for every nullable field in a row.
> This is happening in the following methods:
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema) throws AvroSerdeException {
> // elided
> line 312:  return worker(datum, fileSchema, newRecordSchema,
> SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
> recordSchema)
> // elided
> line 357: return worker(datum, currentFileSchema, schema,
>   SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad in terms of performance. I'm not sure why we didn't use 
> the TypeInfo we already have instead of regenerating it for each nullable 
> field. If you look at the {{worker}} method, which calls 
> {{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable 
> field's column is already determined.
> Moreover, the cache in the {{SchemaToTypeInfo}} class does not help in the 
> nullable Avro record case, as checking whether an Avro record schema object 
> already exists in the cache requires traversing all the fields in the record 
> schema.
> I've attached a profiling snapshot which shows that most of the time is being 
> spent in the cache.
> One way of fixing this, IMO, might be to make use of the column TypeInfo that 
> is already passed into the worker method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-12 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163121#comment-16163121
 ] 

Carl Steinbach commented on HIVE-17394:
---

Nice catch!

+1. Will commit if tests pass.

> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for 
> every row
> -
>
> Key: HIVE-17394
> URL: https://issues.apache.org/jira/browse/HIVE-17394
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Ratandeep Ratti
>Assignee: Anthony Hsu
> Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png, 
> HIVE-17394.1.patch
>
>
> The following methods in {{AvroDeserializer}} keep regenerating {{TypeInfo}} 
> objects for every nullable field in a row.
> This is happening in the following methods:
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema) throws AvroSerdeException {
> // elided
> line 312:  return worker(datum, fileSchema, newRecordSchema,
> SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
> recordSchema)
> // elided
> line 357: return worker(datum, currentFileSchema, schema,
>   SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad in terms of performance. I'm not sure why we didn't use 
> the TypeInfo we already have instead of regenerating it for each nullable 
> field. If you look at the {{worker}} method, which calls 
> {{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable 
> field's column is already determined.
> Moreover, the cache in the {{SchemaToTypeInfo}} class does not help in the 
> nullable Avro record case, as checking whether an Avro record schema object 
> already exists in the cache requires traversing all the fields in the record 
> schema.
> I've attached a profiling snapshot which shows that most of the time is being 
> spent in the cache.
> One way of fixing this, IMO, might be to make use of the column TypeInfo that 
> is already passed into the worker method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14261) Support set/unset partition parameters

2017-07-31 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108150#comment-16108150
 ] 

Carl Steinbach commented on HIVE-14261:
---

[~ashutoshc] will adding a server-side property that disables this 
functionality satisfy your concerns?

> Support set/unset partition parameters
> --
>
> Key: HIVE-14261
> URL: https://issues.apache.org/jira/browse/HIVE-14261
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14261.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16831) Add unit tests for NPE fixes in HIVE-12054

2017-06-13 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-16831:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Sunitha!

> Add unit tests for NPE fixes in HIVE-12054
> --
>
> Key: HIVE-16831
> URL: https://issues.apache.org/jira/browse/HIVE-16831
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Fix For: 3.0.0
>
> Attachments: HIVE-16831.1.patch, HIVE-16831.2.patch
>
>
> HIVE-12054 fixed NPE issues related to ObjectInspector which get triggered 
> when an empty ORC table/partition is read.
> This work adds tests that trigger that path.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16831) Add unit tests for NPE fixes in HIVE-12054

2017-06-13 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16048539#comment-16048539
 ] 

Carl Steinbach commented on HIVE-16831:
---

+1

> Add unit tests for NPE fixes in HIVE-12054
> --
>
> Key: HIVE-16831
> URL: https://issues.apache.org/jira/browse/HIVE-16831
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16831.1.patch, HIVE-16831.2.patch
>
>
> HIVE-12054 fixed NPE issues related to ObjectInspector which get triggered 
> when an empty ORC table/partition is read.
> This work adds tests that trigger that path.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16844) Fix Connection leak in ObjectStore when new Conf object is used

2017-06-13 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-16844:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Sunitha!

> Fix Connection leak in ObjectStore when new Conf object is used
> ---
>
> Key: HIVE-16844
> URL: https://issues.apache.org/jira/browse/HIVE-16844
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Fix For: 3.0.0
>
> Attachments: HIVE-16844.1.patch
>
>
> The code path in ObjectStore.java currently leaks BoneCP (or Hikari) 
> connection pools when a new configuration object is passed in. The code needs 
> to ensure that the persistence-factory is closed before it is nullified.
> The relevant code is 
> [here|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L290].
>  Note that pmf is set to null, but the underlying connection pool is not 
> closed.
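A minimal sketch of the fix direction described above, not the committed 
patch: close the old PersistenceManagerFactory before dropping the reference, 
so that DataNucleus releases the underlying BoneCP/Hikari pool.

{code:java}
import javax.jdo.PersistenceManagerFactory;

// Inside ObjectStore, when a new Configuration forces re-initialization:
if (pmf != null) {
  pmf.close();  // shut down the factory so its connection pool is released
}
pmf = null;     // only now drop the reference
{code}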



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16844) Fix Connection leak in ObjectStore when new Conf object is used

2017-06-13 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16048533#comment-16048533
 ] 

Carl Steinbach commented on HIVE-16844:
---

+1


> Fix Connection leak in ObjectStore when new Conf object is used
> ---
>
> Key: HIVE-16844
> URL: https://issues.apache.org/jira/browse/HIVE-16844
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16844.1.patch
>
>
> The code path in ObjectStore.java currently leaks BoneCP (or Hikari) 
> connection pools when a new configuration object is passed in. The code needs 
> to ensure that the persistence-factory is closed before it is nullified.
> The relevant code is 
> [here|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L290].
>  Note that pmf is set to null, but the underlying connection pool is not 
> closed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15229) 'like any' and 'like all' operators in hive

2017-05-03 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-15229:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Simanchal!

> 'like any' and 'like all' operators in hive
> ---
>
> Key: HIVE-15229
> URL: https://issues.apache.org/jira/browse/HIVE-15229
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-15229.1.patch, HIVE-15229.2.patch, 
> HIVE-15229.3.patch, HIVE-15229.4.patch, HIVE-15229.5.patch, HIVE-15229.6.patch
>
>
> In Teradata, the 'like any' and 'like all' operators are mostly used when 
> matching a text field against a number of patterns.
> 'like any' and 'like all' are equivalent to multiple like conditions, as in 
> the examples below.
> {noformat}
> --like any
> select col1 from table1 where col2 like any ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like condition 
> select col1 from table1 where col2 like '%accountant%' or col2 like 
> '%accounting%' or col2 like '%retail%' or col2 like '%bank%' or col2 like 
> '%insurance%' ;
> --like all
> select col1 from table1 where col2 like all ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like operator 
> select col1 from table1 where col2 like '%accountant%' and col2 like 
> '%accounting%' and col2 like '%retail%' and col2 like '%bank%' and col2 like 
> '%insurance%' ;
> {noformat}
> Problem statement:
> Nowadays many data warehouse projects are being migrated from Teradata to 
> Hive, and data engineers and business analysts regularly look for these two 
> operators.
> If we introduce these two operators in Hive, many scripts can be migrated 
> smoothly instead of converting these operators to multiple like operators.
> Result:
> 1. 'LIKE ANY' returns true if a text (column value) matches any pattern.
> 2. 'LIKE ALL' returns true if a text (column value) matches all patterns.
> 3. 'LIKE ANY' and 'LIKE ALL' return NULL not only if the expression on the 
> left-hand side is NULL, but also if one of the patterns in the list is NULL.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15229) 'like any' and 'like all' operators in hive

2017-04-13 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968019#comment-15968019
 ] 

Carl Steinbach commented on HIVE-15229:
---

[~simanchal], the change to HiveParser no longer applies cleanly. Can you 
please refresh the patch? Thanks.

> 'like any' and 'like all' operators in hive
> ---
>
> Key: HIVE-15229
> URL: https://issues.apache.org/jira/browse/HIVE-15229
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>Priority: Minor
> Attachments: HIVE-15229.1.patch, HIVE-15229.2.patch, 
> HIVE-15229.3.patch, HIVE-15229.4.patch, HIVE-15229.5.patch
>
>
> In Teradata, the 'like any' and 'like all' operators are mostly used when 
> matching a text field against a number of patterns.
> 'like any' and 'like all' are equivalent to multiple like conditions, as in 
> the examples below.
> {noformat}
> --like any
> select col1 from table1 where col2 like any ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like condition 
> select col1 from table1 where col2 like '%accountant%' or col2 like 
> '%accounting%' or col2 like '%retail%' or col2 like '%bank%' or col2 like 
> '%insurance%' ;
> --like all
> select col1 from table1 where col2 like all ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like operator 
> select col1 from table1 where col2 like '%accountant%' and col2 like 
> '%accounting%' and col2 like '%retail%' and col2 like '%bank%' and col2 like 
> '%insurance%' ;
> {noformat}
> Problem statement:
> Nowadays many data warehouse projects are being migrated from Teradata to 
> Hive, and data engineers and business analysts regularly look for these two 
> operators.
> If we introduce these two operators in Hive, many scripts can be migrated 
> smoothly instead of converting these operators to multiple like operators.
> Result:
> 1. 'LIKE ANY' returns true if a text (column value) matches any pattern.
> 2. 'LIKE ALL' returns true if a text (column value) matches all patterns.
> 3. 'LIKE ANY' and 'LIKE ALL' return NULL not only if the expression on the 
> left-hand side is NULL, but also if one of the patterns in the list is NULL.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15229) 'like any' and 'like all' operators in hive

2017-04-13 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968009#comment-15968009
 ] 

Carl Steinbach commented on HIVE-15229:
---

+1

> 'like any' and 'like all' operators in hive
> ---
>
> Key: HIVE-15229
> URL: https://issues.apache.org/jira/browse/HIVE-15229
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>Priority: Minor
> Attachments: HIVE-15229.1.patch, HIVE-15229.2.patch, 
> HIVE-15229.3.patch, HIVE-15229.4.patch, HIVE-15229.5.patch
>
>
> In Teradata, the 'like any' and 'like all' operators are mostly used when 
> matching a text field against a number of patterns.
> 'like any' and 'like all' are equivalent to multiple like conditions, as in 
> the examples below.
> {noformat}
> --like any
> select col1 from table1 where col2 like any ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like condition 
> select col1 from table1 where col2 like '%accountant%' or col2 like 
> '%accounting%' or col2 like '%retail%' or col2 like '%bank%' or col2 like 
> '%insurance%' ;
> --like all
> select col1 from table1 where col2 like all ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like operator 
> select col1 from table1 where col2 like '%accountant%' and col2 like 
> '%accounting%' and col2 like '%retail%' and col2 like '%bank%' and col2 like 
> '%insurance%' ;
> {noformat}
> Problem statement:
> Nowadays many data warehouse projects are being migrated from Teradata to 
> Hive, and data engineers and business analysts regularly look for these two 
> operators.
> If we introduce these two operators in Hive, many scripts can be migrated 
> smoothly instead of converting these operators to multiple like operators.
> Result:
> 1. 'LIKE ANY' returns true if a text (column value) matches any pattern.
> 2. 'LIKE ALL' returns true if a text (column value) matches all patterns.
> 3. 'LIKE ANY' and 'LIKE ALL' return NULL not only if the expression on the 
> left-hand side is NULL, but also if one of the patterns in the list is NULL.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16393) Fix visibility of CodahaleReporter interface

2017-04-09 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-16393:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Sunitha!

> Fix visibility of CodahaleReporter interface
> 
>
> Key: HIVE-16393
> URL: https://issues.apache.org/jira/browse/HIVE-16393
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Fix For: 3.0.0
>
> Attachments: HIVE-16393.1.patch
>
>
> The CodahaleReporter interface, introduced in HIVE-16206, has package-private 
> visibility. This prevents external libraries from extending it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16393) Fix visibility of CodahaleReporter interface

2017-04-05 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958191#comment-15958191
 ] 

Carl Steinbach commented on HIVE-16393:
---

+1. Will commit if tests pass.

> Fix visibility of CodahaleReporter interface
> 
>
> Key: HIVE-16393
> URL: https://issues.apache.org/jira/browse/HIVE-16393
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16393.1.patch
>
>
> The CodahaleReporter interface, introduced in HIVE-16206, has package-private 
> visibility. This prevents external libraries from extending it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16393) Fix visibility of CodahaleReporter interface

2017-04-05 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958189#comment-15958189
 ] 

Carl Steinbach commented on HIVE-16393:
---

RB: https://reviews.apache.org/r/58227/


> Fix visibility of CodahaleReporter interface
> 
>
> Key: HIVE-16393
> URL: https://issues.apache.org/jira/browse/HIVE-16393
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16393.1.patch
>
>
> The CodahaleReporter interface, introduced in HIVE-16206, has package-private 
> visibility. This prevents external libraries from extending it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16206) Make Codahale metrics reporters pluggable

2017-04-03 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-16206:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Sunitha!

> Make Codahale metrics reporters pluggable
> -
>
> Key: HIVE-16206
> URL: https://issues.apache.org/jira/browse/HIVE-16206
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.2
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Fix For: 3.0.0
>
> Attachments: HIVE-16206.2.patch, HIVE-16206.3.patch, 
> HIVE-16206.4.patch, HIVE-16206.5.patch, HIVE-16206.6.patch, 
> HIVE-16206.7.patch, HIVE-16206.patch
>
>
> Hive metrics code currently allows pluggable metrics handlers, i.e. handlers 
> that take care of providing interfaces for metrics collection as well as 
> reporting; one of these 'handlers' is CodahaleMetrics. Codahale can work with 
> different reporters - the currently supported ones are Console, JMX, JSON 
> file and hadoop2 sink. However, adding a new reporter involves changing that 
> class. We would like to make this configuration-driven, just the way 
> MetricsFactory handles configurable Metrics classes.
> Scope of work:
> - Provide a new configuration option, HIVE_CODAHALE_REPORTER_CLASSES, that 
> enumerates classes (like HIVE_METRICS_CLASS and unlike HIVE_METRICS_REPORTER).
> - Move JsonFileReporter into its own class.
> - Update CodahaleMetrics.java to read the new config option (and, if the new 
> option is not present, look for the old option and instantiate accordingly), 
> i.e. make the code backward compatible.
> - Update and add new tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16206) Make Codahale metrics reporters pluggable

2017-03-30 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949727#comment-15949727
 ] 

Carl Steinbach commented on HIVE-16206:
---

+1.

> Make Codahale metrics reporters pluggable
> -
>
> Key: HIVE-16206
> URL: https://issues.apache.org/jira/browse/HIVE-16206
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.2
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16206.2.patch, HIVE-16206.3.patch, 
> HIVE-16206.4.patch, HIVE-16206.5.patch, HIVE-16206.6.patch, HIVE-16206.patch
>
>
> Hive metrics code currently allows pluggable metrics handlers, i.e. handlers 
> that take care of providing interfaces for metrics collection as well as 
> reporting; one of these 'handlers' is CodahaleMetrics. Codahale can work with 
> different reporters - the currently supported ones are Console, JMX, JSON 
> file and hadoop2 sink. However, adding a new reporter involves changing that 
> class. We would like to make this configuration-driven, just the way 
> MetricsFactory handles configurable Metrics classes.
> Scope of work:
> - Provide a new configuration option, HIVE_CODAHALE_REPORTER_CLASSES, that 
> enumerates classes (like HIVE_METRICS_CLASS and unlike HIVE_METRICS_REPORTER).
> - Move JsonFileReporter into its own class.
> - Update CodahaleMetrics.java to read the new config option (and, if the new 
> option is not present, look for the old option and instantiate accordingly), 
> i.e. make the code backward compatible.
> - Update and add new tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16206) Make Codahale metrics reporters pluggable

2017-03-21 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-16206:
--
Status: Open  (was: Patch Available)

Hi [~sbeeram], I left some more feedback on RB. Please take a look. Thanks.

> Make Codahale metrics reporters pluggable
> -
>
> Key: HIVE-16206
> URL: https://issues.apache.org/jira/browse/HIVE-16206
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.2
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16206.2.patch, HIVE-16206.patch
>
>
> Hive metrics code currently allows pluggable metrics handlers, i.e. handlers 
> that take care of providing interfaces for metrics collection as well as 
> reporting; one of these 'handlers' is CodahaleMetrics. Codahale can work with 
> different reporters - the currently supported ones are Console, JMX, JSON 
> file and hadoop2 sink. However, adding a new reporter involves changing that 
> class. We would like to make this configuration-driven, just the way 
> MetricsFactory handles configurable Metrics classes.
> Scope of work:
> - Provide a new configuration option, HIVE_CODAHALE_REPORTER_CLASSES, that 
> enumerates classes (like HIVE_METRICS_CLASS and unlike HIVE_METRICS_REPORTER).
> - Move JsonFileReporter into its own class.
> - Update CodahaleMetrics.java to read the new config option (and, if the new 
> option is not present, look for the old option and instantiate accordingly), 
> i.e. make the code backward compatible.
> - Update and add new tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16206) Make Codahale metrics reporters pluggable

2017-03-17 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-16206:
--
Status: Open  (was: Patch Available)

Please address the feedback on RB and then repost the patch. Thanks.

> Make Codahale metrics reporters pluggable
> -
>
> Key: HIVE-16206
> URL: https://issues.apache.org/jira/browse/HIVE-16206
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.2
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16206.patch
>
>
> Hive metrics code currently allows pluggable metrics handlers, i.e. handlers 
> that take care of providing interfaces for metrics collection as well as 
> reporting; one of these 'handlers' is CodahaleMetrics. Codahale can work with 
> different reporters - the currently supported ones are Console, JMX, JSON 
> file and hadoop2 sink. However, adding a new reporter involves changing that 
> class. We would like to make this configuration-driven, just the way 
> MetricsFactory handles configurable Metrics classes.
> Scope of work:
> - Provide a new configuration option, HIVE_CODAHALE_REPORTER_CLASSES, that 
> enumerates classes (like HIVE_METRICS_CLASS and unlike HIVE_METRICS_REPORTER).
> - Move JsonFileReporter into its own class.
> - Update CodahaleMetrics.java to read the new config option (and, if the new 
> option is not present, look for the old option and instantiate accordingly), 
> i.e. make the code backward compatible.
> - Update and add new tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16206) Make Codahale metrics reporters pluggable

2017-03-17 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930772#comment-15930772
 ] 

Carl Steinbach commented on HIVE-16206:
---

Hi [~sbeeram], I think the pre-commit patch testing job may have skipped this 
ticket (I don't see it listed in the queue 
[here|https://builds.apache.org/job/PreCommit-HIVE-Build/]). Can you please 
attach another copy of the patch following the directions 
[here|https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing]?
 Thanks.

> Make Codahale metrics reporters pluggable
> -
>
> Key: HIVE-16206
> URL: https://issues.apache.org/jira/browse/HIVE-16206
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.2
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16206.patch
>
>
> Hive metrics code currently allows pluggable metrics handlers - ie, handlers 
> that take care of providing interfaces for metrics collection as well as a 
> reporting; one of the 'handlers' is CodahaleMetrics. Codahale can work with 
> different reporters - currently supported ones are Console, JMX, JSON file 
> and hadoop2 sink. However, adding a new reporter involves changing that 
> class. We would like to make this conf driven just the way MetricsFactory 
> handles configurable Metrics classes.
> Scope of work:
> - Provide a new configuration option, HIVE_CODAHALE_REPORTER_CLASSES that 
> enumerates classes (like HIVE_METRICS_CLASS and unlike HIVE_METRICS_REPORTER).
> - Move JsonFileReporter into its own class.
> - Update CodahaleMetrics.java to read new config option and if the new option 
> is not present, look for the old option and instantiate accordingly) - ie, 
> make the code backward compatible.
> - Update and add new tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16242) Run BeeLine tests parallel

2017-03-17 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930517#comment-15930517
 ] 

Carl Steinbach commented on HIVE-16242:
---

The original HiveServer2 patch included support for this. I believe it was 
removed at a later point (not sure why). You can probably resurrect most of the 
code by looking at the Git history.

> Run BeeLine tests parallel
> --
>
> Key: HIVE-16242
> URL: https://issues.apache.org/jira/browse/HIVE-16242
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>
> Provide the ability for BeeLine tests to run in parallel against the MiniHS2 
> cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14764) Enabling "hive.metastore.metrics.enabled" throws OOM in HiveMetastore

2017-02-13 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-14764:
--
Labels: JMX Metrics Monitoring  (was: )

> Enabling "hive.metastore.metrics.enabled" throws OOM in HiveMetastore
> -
>
> Key: HIVE-14764
> URL: https://issues.apache.org/jira/browse/HIVE-14764
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Carl Steinbach
>Priority: Minor
>  Labels: JMX, Metrics, Monitoring
> Fix For: 2.2.0, 2.1.1
>
> Attachments: hd_1.png, hd_2.png, HIVE-14764.1.patch
>
>
> After running some queries with metrics enabled, metastore starts throwing 
> the following messages.
> {noformat}
> Caused by: java.sql.SQLException: java.lang.OutOfMemoryError: GC overhead 
> limit exceeded
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:929)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:433)
> at 
> com.mysql.jdbc.PreparedStatement.getInstance(PreparedStatement.java:877)
> at 
> com.mysql.jdbc.ConnectionImpl.clientPrepareStatement(ConnectionImpl.java:1489)
> at 
> com.mysql.jdbc.ConnectionImpl.prepareStatement(ConnectionImpl.java:4343)
> at 
> com.mysql.jdbc.ConnectionImpl.prepareStatement(ConnectionImpl.java:4242)
> at 
> com.jolbox.bonecp.ConnectionHandle.prepareStatement(ConnectionHandle.java:1024)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:350)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:295)
> at 
> org.datanucleus.store.rdbms.scostore.JoinListStore.listIterator(JoinListStore.java:761)
> ... 36 more
> Nested Throwables StackTrace:
> java.sql.SQLException: java.lang.OutOfMemoryError: GC overhead limit exceeded
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:929)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:433)
> at 
> com.mysql.jdbc.PreparedStatement.getInstance(PreparedStatement.java:877)
> at 
> com.mysql.jdbc.ConnectionImpl.clientPrepareStatement(ConnectionImpl.java:1489)
> at 
> com.mysql.jdbc.ConnectionImpl.prepareStatement(ConnectionImpl.java:4343)
> at 
> com.mysql.jdbc.ConnectionImpl.prepareStatement(ConnectionImpl.java:4242)
> at 
> com.jolbox.bonecp.ConnectionHandle.prepareStatement(ConnectionHandle.java:1024)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:350)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:295)
> at 
> org.datanucleus.store.rdbms.scostore.JoinListStore.listIterator(JoinListStore.java:761)
> at 
> org.datanucleus.store.rdbms.scostore.AbstractListStore.listIterator(AbstractListStore.java:93)
> at 
> org.datanucleus.store.rdbms.scostore.AbstractListStore.iterator(AbstractListStore.java:83)
> at 
> org.datanucleus.store.types.wrappers.backed.List.loadFromStore(List.java:264)
> at 
> org.datanucleus.store.types.wrappers.backed.List.iterator(List.java:492)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToFieldSchemas(ObjectStore.java:1199)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1266)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1281)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToTable(ObjectStore.java:1138)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.ensureGetTable(ObjectStore.java:2651)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatistics(ObjectStore.java:6141)
> {noformat}
> HiveMetastore uses start/end functions to open and close a scope in 
> MetricsFactory. In some places in HiveMetastore the function names do not 
> match, causing a gradual memory leak in the metastore when metrics are enabled.
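A hedged illustration of the mismatch described above, assuming the 
startStoredScope/endStoredScope pair on the common Metrics interface; the 
scope names are hypothetical and not taken from the patch.

{code:java}
// A scope opened under one name but ended under another is never removed,
// so open scopes accumulate and the metastore eventually runs out of memory.
Metrics metrics = MetricsFactory.getInstance();
metrics.startStoredScope("api_get_table");
// ... handler body ...
metrics.endStoredScope("api_getTable");  // name mismatch: this scope leaks
{code}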



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-14764) Enabling "hive.metastore.metrics.enabled" throws OOM in HiveMetastore

2017-02-13 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reassigned HIVE-14764:
-

Assignee: Carl Steinbach  (was: Rajesh Balamohan)

> Enabling "hive.metastore.metrics.enabled" throws OOM in HiveMetastore
> -
>
> Key: HIVE-14764
> URL: https://issues.apache.org/jira/browse/HIVE-14764
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Carl Steinbach
>Priority: Minor
>  Labels: JMX, Metrics, Monitoring
> Fix For: 2.2.0, 2.1.1
>
> Attachments: hd_1.png, hd_2.png, HIVE-14764.1.patch
>
>
> After running some queries with metrics enabled, metastore starts throwing 
> the following messages.
> {noformat}
> Caused by: java.sql.SQLException: java.lang.OutOfMemoryError: GC overhead 
> limit exceeded
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:929)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:433)
> at 
> com.mysql.jdbc.PreparedStatement.getInstance(PreparedStatement.java:877)
> at 
> com.mysql.jdbc.ConnectionImpl.clientPrepareStatement(ConnectionImpl.java:1489)
> at 
> com.mysql.jdbc.ConnectionImpl.prepareStatement(ConnectionImpl.java:4343)
> at 
> com.mysql.jdbc.ConnectionImpl.prepareStatement(ConnectionImpl.java:4242)
> at 
> com.jolbox.bonecp.ConnectionHandle.prepareStatement(ConnectionHandle.java:1024)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:350)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:295)
> at 
> org.datanucleus.store.rdbms.scostore.JoinListStore.listIterator(JoinListStore.java:761)
> ... 36 more
> Nested Throwables StackTrace:
> java.sql.SQLException: java.lang.OutOfMemoryError: GC overhead limit exceeded
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:929)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:433)
> at 
> com.mysql.jdbc.PreparedStatement.getInstance(PreparedStatement.java:877)
> at 
> com.mysql.jdbc.ConnectionImpl.clientPrepareStatement(ConnectionImpl.java:1489)
> at 
> com.mysql.jdbc.ConnectionImpl.prepareStatement(ConnectionImpl.java:4343)
> at 
> com.mysql.jdbc.ConnectionImpl.prepareStatement(ConnectionImpl.java:4242)
> at 
> com.jolbox.bonecp.ConnectionHandle.prepareStatement(ConnectionHandle.java:1024)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:350)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:295)
> at 
> org.datanucleus.store.rdbms.scostore.JoinListStore.listIterator(JoinListStore.java:761)
> at 
> org.datanucleus.store.rdbms.scostore.AbstractListStore.listIterator(AbstractListStore.java:93)
> at 
> org.datanucleus.store.rdbms.scostore.AbstractListStore.iterator(AbstractListStore.java:83)
> at 
> org.datanucleus.store.types.wrappers.backed.List.loadFromStore(List.java:264)
> at 
> org.datanucleus.store.types.wrappers.backed.List.iterator(List.java:492)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToFieldSchemas(ObjectStore.java:1199)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1266)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1281)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToTable(ObjectStore.java:1138)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.ensureGetTable(ObjectStore.java:2651)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatistics(ObjectStore.java:6141)
> {noformat}
> HiveMetastore uses start/end functions for starting/ending the scope in 
> MetricsFactory. In some places in HiveMetastore the function names do not 
> match, causing a gradual memory leak in the metastore when metrics are enabled.
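
For illustration only, a minimal self-contained sketch of the leak pattern described above. This is not the real Hive Metrics/MetricsFactory API; the class, method, and scope names are hypothetical. The point is simply that per-scope state keyed by name is never released when the start and end calls use different names.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (NOT the Hive MetricsFactory API): per-scope state is keyed
// by name, so a start/end pair whose names differ never releases the state that
// was accumulated under the key used by the start call.
public class ScopeLeakSketch {
  private static final Map<String, List<Long>> OPEN_SCOPES = new ConcurrentHashMap<>();

  static void startScope(String name) {
    // record a timestamp under this scope name
    OPEN_SCOPES.computeIfAbsent(name, k -> new ArrayList<>()).add(System.nanoTime());
  }

  static void endScope(String name) {
    // only clears state recorded under exactly this name
    OPEN_SCOPES.remove(name);
  }

  public static void main(String[] args) {
    for (int i = 0; i < 100_000; i++) {
      startScope("api_get_partitions"); // names chosen for illustration only
      endScope("get_partitions");       // mismatched name: the list above keeps growing
    }
    // With metrics enabled this mirrors the gradual heap growth reported above.
    System.out.println("entries leaked: " + OPEN_SCOPES.get("api_get_partitions").size());
  }
}
{code}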



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-3827) LATERAL VIEW with UNION ALL produces incorrect results

2016-10-19 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-3827:
-
Labels: Correctness CorrectnessBug  (was: CorrectnessBug)

> LATERAL VIEW with UNION ALL produces incorrect results
> --
>
> Key: HIVE-3827
> URL: https://issues.apache.org/jira/browse/HIVE-3827
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0, 1.2.1
> Environment: hive0.9.0 hadoop 0.20.205
>Reporter: cyril liao
>  Labels: Correctness, CorrectnessBug
>
> LATERAL VIEW loses data when combined with UNION ALL.
> query NO.1:
> SELECT
> 1 as from_pid,
> 1 as to_pid,
> cid as from_path,
> (CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
> 0 as status
> FROM
> (SELECT union_map(c_map) AS c_map
> FROM
> (SELECT collect_map(id,parent_id)AS c_map
> FROM
> wl_channels
> GROUP BY id,parent_id
> )tmp
> )tmp2
> LATERAL VIEW recursion_concat(c_map) a AS cid, pid
> this query returns about 1 rows ,and their status is 0.
> query NO.2:
> select
> a.from_pid as from_pid,
> a.to_pid as to_pid, 
> a.from_path as from_path,
> a.to_path as to_path,
> a.status as status
> from wl_dc_channels a
> where a.status <> 0
> this query returns about 100 rows ,and their status is 1 or 2.
> query NO.3:
> select
> from_pid,
> to_pid,
> from_path,
> to_path,
> status
> from
> (
> SELECT
> 1 as from_pid,
> 1 as to_pid,
> cid as from_path,
> (CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
> 0 as status
> FROM
> (SELECT union_map(c_map) AS c_map
> FROM
> (SELECT collect_map(id,parent_id)AS c_map
> FROM
> wl_channels
> GROUP BY id,parent_id
> )tmp
> )tmp2
> LATERAL VIEW recursion_concat(c_map) a AS cid, pid
> union all
> select
> a.from_pid as from_pid,
> a.to_pid as to_pid, 
> a.from_path as from_path,
> a.to_path as to_path,
> a.status as status
> from wl_dc_channels a
> where a.status <> 0
> ) unin_tbl
> this query has the same result as query NO.2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12412) Multi insert queries fail to run properly in hive 1.1.x or later.

2016-10-19 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-12412:
--
Labels: Correctness CorrectnessBug  (was: )

> Multi insert queries fail to run properly in hive 1.1.x or later.
> -
>
> Key: HIVE-12412
> URL: https://issues.apache.org/jira/browse/HIVE-12412
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.0
>Reporter: John P. Petrakis
>  Labels: Correctness, CorrectnessBug
>
> We use multi insert queries to take data in one table and manipulate it by 
> inserting it into a results table.  Queries are of this form:
> from (select * from data_table lateral view explode(data_table.f2) f2 as 
> explode_f2) as explode_data_table  
>insert overwrite table results_table partition (q_id='C.P1',rl='1') 
>select 
>array(cast(if(explode_data_table.f1 is null or 
> explode_data_table.f1='', 'UNKNOWN',explode_data_table.f1) as 
> String),cast(explode_f2.s1 as String)) as dimensions, 
>ARRAY(CAST(sum(explode_f2.d1) as Double)) as metrics, 
>null as rownm 
>where (explode_data_table.date_id between 20151016 and 20151016)
>group by 
>if(explode_data_table.f1 is null or explode_data_table.f1='', 
> 'UNKNOWN',explode_data_table.f1),
>explode_f2.s1 
>INSERT OVERWRITE TABLE results_table PARTITION (q_id='C.P2',rl='0') 
>SELECT ARRAY(CAST('Total' as String),CAST('Total' as String)) AS 
> dimensions, 
>ARRAY(CAST(sum(explode_f2.d1) as Double)) AS metrics, 
>null AS rownm 
>WHERE (explode_data_table.date_id BETWEEN 20151016 AND 20151016) 
>INSERT OVERWRITE TABLE results_table PARTITION (q_id='C.P5',rl='0') 
>SELECT 
>ARRAY(CAST('Total' as String)) AS dimensions, 
>ARRAY(CAST(sum(explode_f2.d1) as Double)) AS metrics, 
>null AS rownm 
>WHERE (explode_data_table.date_id BETWEEN 20151016 AND 20151016)
> This query is meant to total a given field of a struct that is potentially a 
> list of structs.  For our test data set, which consists of a single row, the 
> summation yields "Null",  with messages in the hive log of the nature:
> Missing fields! Expected 2 fields but only got 1! Ignoring similar problems.
> or "Extra fields detected..."
> For significantly more data, this query will eventually cause a run time 
> error while processing a column (caused by array index out of bounds 
> exception in one of the lazy binary classes such as LazyBinaryString or 
> LazyBinaryStruct).
> Using the query above from the hive command line, the following data was used:
> (note there are tabs in the data below)
> string one   one:1.0:1.00:10.0,eon:1.0:1.00:100.0
> string two   two:2.0:2.00:20.0,otw:2.0:2.00:20.0,wott:2.0:2.00:20.0
> string thr   three:3.0:3.00:30.0
> string fou   four:4.0:4.00:40.0
> There are two fields, a string, (eg. 'string one') and a list of structs.  
> The following is used to create the table:
> create table if not exists t1 (
>  f1 string, 
>   f2 
> array>
>  )
>   partitioned by (clid string, date_id string) 
>   row format delimited fields 
>  terminated by '09' 
>  collection items terminated by ',' 
>  map keys terminated by ':'
>  lines terminated by '10' 
>  location '/user/hive/warehouse/t1';
> And the following is used to load the data:
> load data local inpath '/path/to/data/file/cplx_test.data2' OVERWRITE  into 
> table t1  partition(client_id='987654321',date_id='20151016');
> The resulting table should yield the following:
> ["string fou","four"] [4.0]   nullC.P11   
> ["string one","eon"]  [1.0]   nullC.P11   
> ["string one","one"]  [1.0]   nullC.P11   
> ["string thr","three"][3.0]   nullC.P11   
> ["string two","otw"]  [2.0]   nullC.P11   
> ["string two","two"]  [2.0]   nullC.P11   
> ["string two","wott"] [2.0]   nullC.P11   
> ["Total","Total"] [15.0]  nullC.P20   
> ["Total"] [15.0]  nullC.P50   
> However what we get is:
> Hive Runtime Error while processing row 
> {"_col2":2.5306499719322744E-258,"_col3":""} (ultimately due to an array 
> index out of bounds exception)
> If we reduce the above data to a SINGLE row, then we don't get an exception 
> but the total fields come out as NULL.
> The ONLY way this query would work is 
> 1) if I added a group by (date_id) or even group by ('') as the last line in 
> the query... or removed 

[jira] [Updated] (HIVE-3827) LATERAL VIEW with UNION ALL produces incorrect results

2016-10-18 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-3827:
-
Affects Version/s: 1.2.1

> LATERAL VIEW with UNION ALL produces incorrect results
> --
>
> Key: HIVE-3827
> URL: https://issues.apache.org/jira/browse/HIVE-3827
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0, 1.2.1
> Environment: hive0.9.0 hadoop 0.20.205
>Reporter: cyril liao
>
> LATERAL VIEW loses data when combined with UNION ALL.
> query NO.1:
> SELECT
> 1 as from_pid,
> 1 as to_pid,
> cid as from_path,
> (CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
> 0 as status
> FROM
> (SELECT union_map(c_map) AS c_map
> FROM
> (SELECT collect_map(id,parent_id)AS c_map
> FROM
> wl_channels
> GROUP BY id,parent_id
> )tmp
> )tmp2
> LATERAL VIEW recursion_concat(c_map) a AS cid, pid
> this query returns about 1 rows ,and their status is 0.
> query NO.2:
> select
> a.from_pid as from_pid,
> a.to_pid as to_pid, 
> a.from_path as from_path,
> a.to_path as to_path,
> a.status as status
> from wl_dc_channels a
> where a.status <> 0
> this query returns about 100 rows ,and their status is 1 or 2.
> query NO.3:
> select
> from_pid,
> to_pid,
> from_path,
> to_path,
> status
> from
> (
> SELECT
> 1 as from_pid,
> 1 as to_pid,
> cid as from_path,
> (CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
> 0 as status
> FROM
> (SELECT union_map(c_map) AS c_map
> FROM
> (SELECT collect_map(id,parent_id)AS c_map
> FROM
> wl_channels
> GROUP BY id,parent_id
> )tmp
> )tmp2
> LATERAL VIEW recursion_concat(c_map) a AS cid, pid
> union all
> select
> a.from_pid as from_pid,
> a.to_pid as to_pid, 
> a.from_path as from_path,
> a.to_path as to_path,
> a.status as status
> from wl_dc_channels a
> where a.status <> 0
> ) unin_tbl
> this query has the same result as query NO.2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-3827) LATERAL VIEW with UNION ALL produces incorrect results

2016-10-18 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-3827:
-
Summary: LATERAL VIEW with UNION ALL produces incorrect results  (was: 
LATERAL VIEW doesn't work with union all statement)

> LATERAL VIEW with UNION ALL produces incorrect results
> --
>
> Key: HIVE-3827
> URL: https://issues.apache.org/jira/browse/HIVE-3827
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0
> Environment: hive0.9.0 hadoop 0.20.205
>Reporter: cyril liao
>
> LATERAL VIEW loses data when combined with UNION ALL.
> query NO.1:
> SELECT
> 1 as from_pid,
> 1 as to_pid,
> cid as from_path,
> (CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
> 0 as status
> FROM
> (SELECT union_map(c_map) AS c_map
> FROM
> (SELECT collect_map(id,parent_id)AS c_map
> FROM
> wl_channels
> GROUP BY id,parent_id
> )tmp
> )tmp2
> LATERAL VIEW recursion_concat(c_map) a AS cid, pid
> this query returns about 1 rows ,and their status is 0.
> query NO.2:
> select
> a.from_pid as from_pid,
> a.to_pid as to_pid, 
> a.from_path as from_path,
> a.to_path as to_path,
> a.status as status
> from wl_dc_channels a
> where a.status <> 0
> this query returns about 100 rows ,and their status is 1 or 2.
> query NO.3:
> select
> from_pid,
> to_pid,
> from_path,
> to_path,
> status
> from
> (
> SELECT
> 1 as from_pid,
> 1 as to_pid,
> cid as from_path,
> (CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
> 0 as status
> FROM
> (SELECT union_map(c_map) AS c_map
> FROM
> (SELECT collect_map(id,parent_id)AS c_map
> FROM
> wl_channels
> GROUP BY id,parent_id
> )tmp
> )tmp2
> LATERAL VIEW recursion_concat(c_map) a AS cid, pid
> union all
> select
> a.from_pid as from_pid,
> a.to_pid as to_pid, 
> a.from_path as from_path,
> a.to_path as to_path,
> a.status as status
> from wl_dc_channels a
> where a.status <> 0
> ) unin_tbl
> this query has the same result as query NO.2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13046) DependencyResolver should not lowercase the dependency URI's authority

2016-10-17 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-13046:
--
Fix Version/s: 2.2.0

> DependencyResolver should not lowercase the dependency URI's authority
> --
>
> Key: HIVE-13046
> URL: https://issues.apache.org/jira/browse/HIVE-13046
> Project: Hive
>  Issue Type: Bug
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Fix For: 2.2.0
>
> Attachments: HIVE-13046.1.patch, HIVE-13046.2.patch
>
>
> When using {{ADD JAR ivy://...}} to add a jar version {{1.2.3-SNAPSHOT}}, 
> Hive will lowercase it to {{1.2.3-snapshot}} due to:
> {code:title=DependencyResolver.java#84}
> String[] authorityTokens = authority.toLowerCase().split(":");
> {code}
> We should not call {{.toLowerCase()}} on the authority.
> RB: https://reviews.apache.org/r/43513
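
As a rough sketch of the fix direction (not the actual DependencyResolver patch), the authority of an {{ivy://}} URI can be split without lowercasing it, so a {{-SNAPSHOT}} version keeps its case. The organization, module, and version below are made up for illustration.

{code:java}
import java.net.URI;
import java.net.URISyntaxException;

// Sketch only: parse the org:module:version authority of an ivy:// URI without
// lowercasing it, so "1.2.3-SNAPSHOT" survives intact. The real code in
// DependencyResolver.java may differ in its details.
public class IvyAuthoritySketch {
  public static void main(String[] args) throws URISyntaxException {
    URI uri = new URI("ivy://org.example:my-lib:1.2.3-SNAPSHOT");
    String authority = uri.getAuthority();
    // Split the original string instead of authority.toLowerCase()
    String[] tokens = authority.split(":");
    String org = tokens[0], module = tokens[1], version = tokens[2];
    System.out.println(org + " / " + module + " / " + version); // keeps -SNAPSHOT casing
  }
}
{code}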



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13046) DependencyResolver should not lowercase the dependency URI's authority

2016-10-17 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-13046:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master.

> DependencyResolver should not lowercase the dependency URI's authority
> --
>
> Key: HIVE-13046
> URL: https://issues.apache.org/jira/browse/HIVE-13046
> Project: Hive
>  Issue Type: Bug
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Fix For: 2.2.0
>
> Attachments: HIVE-13046.1.patch, HIVE-13046.2.patch
>
>
> When using {{ADD JAR ivy://...}} to add a jar version {{1.2.3-SNAPSHOT}}, 
> Hive will lowercase it to {{1.2.3-snapshot}} due to:
> {code:title=DependencyResolver.java#84}
> String[] authorityTokens = authority.toLowerCase().split(":");
> {code}
> We should not call {{.toLowerCase()}} on the authority.
> RB: https://reviews.apache.org/r/43513



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13046) DependencyResolver should not lowercase the dependency URI's authority

2016-10-11 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566862#comment-15566862
 ] 

Carl Steinbach commented on HIVE-13046:
---

+1. Will commit.

> DependencyResolver should not lowercase the dependency URI's authority
> --
>
> Key: HIVE-13046
> URL: https://issues.apache.org/jira/browse/HIVE-13046
> Project: Hive
>  Issue Type: Bug
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-13046.1.patch
>
>
> When using {{ADD JAR ivy://...}} to add a jar version {{1.2.3-SNAPSHOT}}, 
> Hive will lowercase it to {{1.2.3-snapshot}} due to:
> {code:title=DependencyResolver.java#84}
> String[] authorityTokens = authority.toLowerCase().split(":");
> {code}
> We should not call {{.toLowerCase()}} on the authority.
> RB: https://reviews.apache.org/r/43513



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-09-07 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-14159:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master:

{noformat}
% g log -1 --stat
commit 6e76ee3aef2210b2a1efa20d92ac997800cfcb75
Author: Carl Steinbach 
Date:   Wed Sep 7 11:28:35 2016 -0700

HIVE-14159 : sorting of tuple array using multiple field[s] (Simanchal Das 
via Carl Steinbach)

 itests/src/test/resources/testconfiguration.properties |   1 +
 itests/src/test/resources/testconfiguration.properties.orig|   8 +-
 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java   |   1 +
 .../org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java  | 202 ++
 .../apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java  | 228 
 ql/src/test/queries/clientnegative/udf_sort_array_by_wrong1.q  |   2 +
 ql/src/test/queries/clientnegative/udf_sort_array_by_wrong2.q  |   2 +
 ql/src/test/queries/clientnegative/udf_sort_array_by_wrong3.q  |  16 ++
 ql/src/test/queries/clientpositive/udf_sort_array_by.q | 136 
 ql/src/test/results/beelinepositive/show_functions.q.out   |   1 +
 ql/src/test/results/clientnegative/udf_sort_array_by_wrong1.q.out  |   1 +
 ql/src/test/results/clientnegative/udf_sort_array_by_wrong2.q.out  |   1 +
 ql/src/test/results/clientnegative/udf_sort_array_by_wrong3.q.out  |  37 
 ql/src/test/results/clientpositive/show_functions.q.out|   1 +
 ql/src/test/results/clientpositive/udf_sort_array_by.q.out | 401 +++
 15 files changed, 1036 insertions(+), 2 deletions(-)
{noformat}

> sorting of tuple array using multiple field[s]
> --
>
> Key: HIVE-14159
> URL: https://issues.apache.org/jira/browse/HIVE-14159
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>  Labels: patch
> Fix For: 2.2.0
>
> Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch, 
> HIVE-14159.3.patch, HIVE-14159.4.patch
>
>
> Problem Statement:
> When we are working with complex data structures like Avro, we often 
> encounter an array that contains multiple tuples, where each tuple has a 
> struct schema.
> Suppose the struct schema is like the one below:
> {noformat}
> {
>   "name": "employee",
>   "type": [{
>   "type": "record",
>   "name": "Employee",
>   "namespace": "com.company.Employee",
>   "fields": [{
>   "name": "empId",
>   "type": "int"
>   }, {
>   "name": "empName",
>   "type": "string"
>   }, {
>   "name": "age",
>   "type": "int"
>   }, {
>   "name": "salary",
>   "type": "double"
>   }]
>   }]
> }
> {noformat}
> Then, while running our Hive query, the complex array looks like an array of 
> Employee objects.
> {noformat}
> Example: 
>   //(array>)
>   
> Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> {noformat}
> When implementing day-to-day business use cases, we encounter problems like 
> sorting a tuple array by specific field[s] such as empId, name, or salary in 
> ASC or DESC order.
> Proposal:
> I have developed a UDF 'sort_array_by' which sorts a tuple array by one or 
> more fields in ASC or DESC order as specified by the user; the default is 
> ascending order.
> {noformat}
> Example:
>   1.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
>   output: 
> array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
>   
>   2.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
>   3.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age,"ASC");
>   output: 
> 

[jira] [Commented] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-08-30 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449646#comment-15449646
 ] 

Carl Steinbach commented on HIVE-14159:
---

+1

[~simanchal], can you please attach a fresh copy of the patch to trigger 
another test run? Thanks.

> sorting of tuple array using multiple field[s]
> --
>
> Key: HIVE-14159
> URL: https://issues.apache.org/jira/browse/HIVE-14159
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>  Labels: patch
> Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch, 
> HIVE-14159.3.patch, HIVE-14159.4.patch
>
>
> Problem Statement:
> When we are working with complex data structures like Avro, we often 
> encounter an array that contains multiple tuples, where each tuple has a 
> struct schema.
> Suppose the struct schema is like the one below:
> {noformat}
> {
>   "name": "employee",
>   "type": [{
>   "type": "record",
>   "name": "Employee",
>   "namespace": "com.company.Employee",
>   "fields": [{
>   "name": "empId",
>   "type": "int"
>   }, {
>   "name": "empName",
>   "type": "string"
>   }, {
>   "name": "age",
>   "type": "int"
>   }, {
>   "name": "salary",
>   "type": "double"
>   }]
>   }]
> }
> {noformat}
> Then, while running our Hive query, the complex array looks like an array of 
> Employee objects.
> {noformat}
> Example: 
>   //(array>)
>   
> Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> {noformat}
> When implementing day-to-day business use cases, we encounter problems like 
> sorting a tuple array by specific field[s] such as empId, name, or salary in 
> ASC or DESC order.
> Proposal:
> I have developed a UDF 'sort_array_by' which sorts a tuple array by one or 
> more fields in ASC or DESC order as specified by the user; the default is 
> ascending order.
> {noformat}
> Example:
>   1.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
>   output: 
> array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
>   
>   2.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
>   3.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age,"ASC");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> {noformat}
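
The multi-field ordering the UDF provides is essentially a chained comparator. The following standalone Java sketch (plain objects on Java 16+, not the ObjectInspector-based code in GenericUDFSortArrayByField) shows the same idea on the Employee example above.

{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Standalone sketch of sorting "tuples" by several fields in order, mirroring
// what sort_array_by does inside Hive; the real UDF operates on ObjectInspectors.
public class SortByFieldsSketch {
  record Employee(int empId, String empName, int age, double salary) {}

  public static void main(String[] args) {
    List<Employee> employees = Arrays.asList(
        new Employee(100, "Foo", 20, 20990),
        new Employee(500, "Boo", 30, 50990),
        new Employee(700, "Harry", 25, 40990),
        new Employee(100, "Tom", 35, 70990));

    // Sort by empName first, then by salary, ascending (the UDF's default order).
    employees.sort(Comparator.comparing(Employee::empName)
                             .thenComparingDouble(Employee::salary));

    employees.forEach(System.out::println);
  }
}
{code}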



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-07-07 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365699#comment-15365699
 ] 

Carl Steinbach commented on HIVE-14159:
---

Hi [~simanchal], I left some comments on RB. Also, it looks like there is a 
test failure in show_functions. Please take a look.

> sorting of tuple array using multiple field[s]
> --
>
> Key: HIVE-14159
> URL: https://issues.apache.org/jira/browse/HIVE-14159
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>  Labels: patch
> Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch
>
>
> Problem Statement:
> When we are working with complex data structures like Avro, we often 
> encounter an array that contains multiple tuples, where each tuple has a 
> struct schema.
> Suppose the struct schema is like the one below:
> {noformat}
> {
>   "name": "employee",
>   "type": [{
>   "type": "record",
>   "name": "Employee",
>   "namespace": "com.company.Employee",
>   "fields": [{
>   "name": "empId",
>   "type": "int"
>   }, {
>   "name": "empName",
>   "type": "string"
>   }, {
>   "name": "age",
>   "type": "int"
>   }, {
>   "name": "salary",
>   "type": "double"
>   }]
>   }]
> }
> {noformat}
> Then, while running our Hive query, the complex array looks like an array of 
> Employee objects.
> {noformat}
> Example: 
>   //(array>)
>   
> Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> {noformat}
> When implementing day-to-day business use cases, we encounter problems like 
> sorting a tuple array by specific field[s] such as empId, name, or salary.
> Proposal:
> I have developed a UDF 'sort_array_field' which sorts a tuple array by one 
> or more fields in natural order.
> {noformat}
> Example:
>   1.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary");
>   output: 
> array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
>   
>   2.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
>   3.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age);
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-07-07 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-14159:
--
Status: Open  (was: Patch Available)

> sorting of tuple array using multiple field[s]
> --
>
> Key: HIVE-14159
> URL: https://issues.apache.org/jira/browse/HIVE-14159
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>  Labels: patch
> Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch
>
>
> Problem Statement:
> When we are working with complex data structures like Avro, we often 
> encounter an array that contains multiple tuples, where each tuple has a 
> struct schema.
> Suppose the struct schema is like the one below:
> {noformat}
> {
>   "name": "employee",
>   "type": [{
>   "type": "record",
>   "name": "Employee",
>   "namespace": "com.company.Employee",
>   "fields": [{
>   "name": "empId",
>   "type": "int"
>   }, {
>   "name": "empName",
>   "type": "string"
>   }, {
>   "name": "age",
>   "type": "int"
>   }, {
>   "name": "salary",
>   "type": "double"
>   }]
>   }]
> }
> {noformat}
> Then, while running our Hive query, the complex array looks like an array of 
> Employee objects.
> {noformat}
> Example: 
>   //(array>)
>   
> Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> {noformat}
> When implementing day-to-day business use cases, we encounter problems like 
> sorting a tuple array by specific field[s] such as empId, name, or salary.
> Proposal:
> I have developed a UDF 'sort_array_field' which sorts a tuple array by one 
> or more fields in natural order.
> {noformat}
> Example:
>   1.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary");
>   output: 
> array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
>   
>   2.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
>   3.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age);
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11402) HS2 - disallow parallel query execution within a single Session

2016-06-28 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352488#comment-15352488
 ] 

Carl Steinbach commented on HIVE-11402:
---

HiveSessionImpl is starting to look a lot like SessionState. I suppose that was 
inevitable :(

> HS2 - disallow parallel query execution within a single Session
> ---
>
> Key: HIVE-11402
> URL: https://issues.apache.org/jira/browse/HIVE-11402
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11402.01.patch, HIVE-11402.patch
>
>
> HiveServer2 currently allows concurrent queries to be run in a single 
> session. However, every HS2 session has  an associated SessionState object, 
> and the use of SessionState in many places assumes that only one thread is 
> using it, ie it is not thread safe.
> There are many places where SessionState thread safety needs to be 
> addressed, and until then we should serialize all query execution for a 
> single HS2 session. -This problem can become more visible with HIVE-4239 now 
> allowing parallel query compilation.-
> Note that running queries in parallel in a single session is not 
> straightforward with JDBC; you need to spawn another thread since the 
> Statement.execute calls are blocking. I believe ODBC has a non-blocking query 
> execution API, and Hue is another well-known application that shares sessions 
> for all queries that a user runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13363) Add hive.metastore.token.signature property to HiveConf

2016-05-11 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-13363:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.1.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Anthony!

> Add hive.metastore.token.signature property to HiveConf
> ---
>
> Key: HIVE-13363
> URL: https://issues.apache.org/jira/browse/HIVE-13363
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Fix For: 2.1.0
>
> Attachments: HIVE-13363.1.patch, HIVE-13363.2.patch
>
>
> I noticed that the {{hive.metastore.token.signature}} property is not defined 
> in HiveConf.java, but hardcoded everywhere it's used in the Hive codebase.
> [HIVE-2963] fixes this but was never committed due to being resolved as a 
> duplicate ticket.
> We should add {{hive.metastore.token.signature}} to HiveConf.java to 
> centralize its definition and make the property more discoverable (it's 
> useful to set it when talking to multiple metastores).
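
As a hedged illustration of why a centralized definition helps: client code that talks to two metastores has to set a different token signature per connection so the right delegation token is selected. The metastore URIs and signature values below are made up; only the property names come from this ticket.

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

// Sketch only: two HiveConf instances, one per metastore, each with its own
// token signature. The HiveConf.ConfVars constant added by the patch is not
// shown here; plain string keys are used instead.
public class TokenSignatureSketch {
  public static void main(String[] args) {
    HiveConf confA = new HiveConf();
    confA.set("hive.metastore.uris", "thrift://metastore-a.example.com:9083");
    confA.set("hive.metastore.token.signature", "metastore-a-token");

    HiveConf confB = new HiveConf();
    confB.set("hive.metastore.uris", "thrift://metastore-b.example.com:9083");
    confB.set("hive.metastore.token.signature", "metastore-b-token");

    System.out.println(confA.get("hive.metastore.token.signature"));
    System.out.println(confB.get("hive.metastore.token.signature"));
  }
}
{code}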



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13363) Add hive.metastore.token.signature property to HiveConf

2016-05-04 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271349#comment-15271349
 ] 

Carl Steinbach commented on HIVE-13363:
---

+1

> Add hive.metastore.token.signature property to HiveConf
> ---
>
> Key: HIVE-13363
> URL: https://issues.apache.org/jira/browse/HIVE-13363
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-13363.1.patch, HIVE-13363.2.patch
>
>
> I noticed that the {{hive.metastore.token.signature}} property is not defined 
> in HiveConf.java, but hardcoded everywhere it's used in the Hive codebase.
> [HIVE-2963] fixes this but was never committed due to being resolved as a 
> duplicate ticket.
> We should add {{hive.metastore.token.signature}} to HiveConf.java to 
> centralize its definition and make the property more discoverable (it's 
> useful to set it when talking to multiple metastores).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13415) Decouple Sessions from thrift binary transport

2016-04-05 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227123#comment-15227123
 ] 

Carl Steinbach commented on HIVE-13415:
---

Making this behavior configurable on a per-session basis sounds like the right 
approach to me.

[~prongs], thanks for working on this. I really appreciate it.

> Decouple Sessions from thrift binary transport
> --
>
> Key: HIVE-13415
> URL: https://issues.apache.org/jira/browse/HIVE-13415
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-13415.01.patch
>
>
> Current behaviour is:
> * Open a thrift binary transport
> * create a session
> * close the transport
> Then the session gets closed. Consequently, all the operations running in the 
> session also get killed.
> Whereas, if you open an HTTP transport, and close, the enclosing sessions are 
> not closed. 
> This seems like a bad design, having transport and sessions tightly coupled. 
> I'd like to fix this. 
> The issue that introduced it is 
> [HIVE-9601|https://github.com/apache/hive/commit/48bea00c48853459af64b4ca9bfdc3e821c4ed82]
>  Relevant discussions at 
> [here|https://issues.apache.org/jira/browse/HIVE-11485?focusedCommentId=15223546=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15223546],
>  
> [here|https://issues.apache.org/jira/browse/HIVE-11485?focusedCommentId=15223827=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15223827]
>  and mentioned links on those comments. 
> Another thing that seems like a slightly bad design is this line of code in 
> ThriftBinaryCLIService:
> {noformat}
> server.setServerEventHandler(serverEventHandler);
> {noformat}
> Whereas serverEventHandler is defined by the base class, with no users except 
> one sub-class(ThriftBinaryCLIService), violating the separation of concerns. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12277) Hive macro results on macro_duplicate.q different after adding ORDER BY

2016-04-04 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-12277:
--
Labels: CorrectnessBug  (was: )

> Hive macro results on macro_duplicate.q different after adding ORDER BY
> ---
>
> Key: HIVE-12277
> URL: https://issues.apache.org/jira/browse/HIVE-12277
> Project: Hive
>  Issue Type: Bug
>  Components: Macros
>Affects Versions: 1.2.0
>Reporter: Jason Dere
>Assignee: Pengcheng Xiong
>  Labels: CorrectnessBug
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12277.01.patch
>
>
> Added an order-by to the query in macro_duplicate.q:
> {noformat}
> -select math_square(a), math_square(b),factorial(a), factorial(b), 
> math_add(a), math_add(b),int(c) from macro_testing;
> \ No newline at end of file
> +select math_square(a), math_square(b),factorial(a), factorial(b), 
> math_add(a), math_add(b),int(c) from macro_testing order by int(c);
> {noformat}
> And the results from math_add() changed unexpectedly:
> {noformat}
> -1  4   1   2   2   4   3
> -16 25  24  120 8   10  6
> +1  4   1   2   1   4   3
> +16 25  24  120 16  25  6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7189) Hive does not store column names in ORC

2016-04-03 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-7189:
-
Component/s: ORC

> Hive does not store column names in ORC
> ---
>
> Key: HIVE-7189
> URL: https://issues.apache.org/jira/browse/HIVE-7189
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, ORC
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Chris Drome
>
> We uncovered the following discrepancy between writing ORC files through Pig 
> and Hive:
> ORCFile header contains the name of the columns. Storing through Pig 
> (ORCStorage or HCatStorer), the column names are stored fine. But when stored 
> through hive they are stored as _col0, _col1,,_col99 and hive uses the 
> partition schema to map the column names. Reading the same file through Pig 
> then has problems as user will have to manually map columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13193) Enable the complication in parallel in single session

2016-04-03 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223551#comment-15223551
 ] 

Carl Steinbach commented on HIVE-13193:
---

Changing the component from "HiveServer2" to "Query Processor" since 
SessionState is not part of HS2.

[~aihuaxu], there have been some extensive discussions about the problems with 
SessionState in other JIRAs and on the HiveDev mailing list. Someone also gave 
a presentation about it at the last HiveContrib meetup. I think there is 
general consensus that many of these problems stem from a bad mixture of thread 
local and singleton design patterns in the SessionState class. Transitioning to 
a simpler model where a per-Session object is explicitly passed into the 
compiler and execution code paths would probably solve many of these issues. 
This will require extensive changes to existing code, but don't let this 
dissuade you -- our current problems with SessionState are largely the result 
of many years of quick-fix solutions layered one atop another.
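
A hypothetical sketch (not Hive code) of the difference between the current thread-local/singleton access pattern and the explicitly passed per-session object suggested above:

{code:java}
// Hypothetical sketch contrasting the two ways of reaching per-session state.
public class SessionAccessSketch {

  // Today's pattern: a thread-local holder. Callers reach the session
  // implicitly, which breaks down when one session's work spans several threads.
  static final ThreadLocal<String> CURRENT_SESSION = new ThreadLocal<>();

  static void compileImplicit(String query) {
    String session = CURRENT_SESSION.get(); // may be null or the wrong session
    System.out.println("compiling '" + query + "' for session " + session);
  }

  // Suggested direction: the session object is an explicit parameter, so the
  // caller always decides which session a compilation belongs to.
  static void compileExplicit(String query, String session) {
    System.out.println("compiling '" + query + "' for session " + session);
  }

  public static void main(String[] args) {
    CURRENT_SESSION.set("session-1");
    compileImplicit("select 1");              // works only on this thread
    compileExplicit("select 1", "session-1"); // works from any thread
  }
}
{code}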

> Enable the complication in parallel in single session
> -
>
> Key: HIVE-13193
> URL: https://issues.apache.org/jira/browse/HIVE-13193
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> Follow up on HIVE-4239. Investigate the changes needed to support parallel 
> compilation in the same session. 
> Some operation related stuff should be in OperationState rather than in 
> SessionState.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13193) Enable the complication in parallel in single session

2016-04-03 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-13193:
--
Component/s: (was: HiveServer2)
 Query Processor

> Enable the complication in parallel in single session
> -
>
> Key: HIVE-13193
> URL: https://issues.apache.org/jira/browse/HIVE-13193
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> Follow up on HIVE-4239. Investigate the changes needed to support parallel 
> compilation in the same session. 
> Some operation related stuff should be in OperationState rather than in 
> SessionState.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12079) Add units tests for HiveServer2 LDAP filters added in HIVE-7193

2016-04-03 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-12079:
--
Summary: Add units tests for HiveServer2 LDAP filters added in HIVE-7193  
(was: Add units tests for LDAP filters in HIVE-7193)

> Add units tests for HiveServer2 LDAP filters added in HIVE-7193
> ---
>
> Key: HIVE-12079
> URL: https://issues.apache.org/jira/browse/HIVE-12079
> Project: Hive
>  Issue Type: Test
>  Components: HiveServer2
>Affects Versions: 1.1.1
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>
> HIVE-11866 adds a test framework that uses an in-memory ldap server for unit 
> tests. Need to add unit tests for user and group filtering feature added in 
> HIVE-7193.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11485) Session close should not close async SQL operations

2016-04-03 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-11485.
---
Resolution: Won't Fix

The premise of this ticket is fundamentally flawed. Resolving as wontfix.

> Session close should not close async SQL operations
> ---
>
> Key: HIVE-11485
> URL: https://issues.apache.org/jira/browse/HIVE-11485
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Amareshwari Sriramadasu
>Assignee: Deepak Barr
> Attachments: HIVE-11485.master.patch
>
>
> Right now, session close on HiveServer closes all operations. But, queries 
> running are actually available across sessions and they are not tied to a 
> session (except the launch, which requires configuration and resources). And 
> it allows getting the status of the query across sessions.
> But session close of the session ( on which operation is launched) closes all 
> the operations as well. 
> So, we should avoid closing all operations upon closing a session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11485) Session close should not close async SQL operations

2016-04-03 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223546#comment-15223546
 ] 

Carl Steinbach commented on HIVE-11485:
---

bq. Sure. whats the story from Hive if the client crashes ?

One of the original design goals of HiveServer2 was to provide *logical* 
sessions that are completely independent of physical details, including 
connections and server-side threads (both of which were problems with 
HiveServer1). In the future this logical decoupling will hopefully make it 
easier to support features like transparent session migration from one HS2 
instance to another.

bq. After HIVE-9601, we are seeing the issue is prominent.

HIVE-9601 created a binding between logical sessions and the underlying 
physical connection. This violates the design principle I described above and 
should either be reverted or made optional.

bq. But, queries running are actually available across sessions and they are 
not tied to a session (expect the launch - which requires configuration and 
resources). And it allows getting the status of the query across sessions.

No, this is not true. Every Operation is explicitly tied to exactly one logical 
session. Please review the Thrift CLIService IDL file for more details.

bq. But session close of the session ( on which operation is launched) closes 
all the operations as well.

This is a fundamental principle of the HS2 design. Changing this will screw up 
a lot of things. Please don't do this.

> Session close should not close async SQL operations
> ---
>
> Key: HIVE-11485
> URL: https://issues.apache.org/jira/browse/HIVE-11485
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Amareshwari Sriramadasu
>Assignee: Deepak Barr
> Attachments: HIVE-11485.master.patch
>
>
> Right now, session close on HiveServer closes all operations. But, queries 
> running are actually available across sessions and they are not tied to a 
> session (except the launch, which requires configuration and resources). And 
> it allows getting the status of the query across sessions.
> But session close of the session ( on which operation is launched) closes all 
> the operations as well. 
> So, we should avoid closing all operations upon closing a session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-1718) Implement SerDe for processing fixed length data

2016-03-28 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1718:
-
Assignee: (was: Shreepadma Venugopalan)

> Implement SerDe for processing fixed length data
> 
>
> Key: HIVE-1718
> URL: https://issues.apache.org/jira/browse/HIVE-1718
> Project: Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Reporter: Carl Steinbach
>
> Fixed length fields are pretty common in legacy data formats. While it is 
> already
> possible to process these files using the RegexSerDe, they could be more 
> efficiently
> handled using a SerDe that is specifically crafted for reading/writing fixed 
> length
> fields. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-13256) LLAP: RowGroup counter is wrong

2016-03-27 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-13256.
---
Resolution: Duplicate

> LLAP: RowGroup counter is wrong
> ---
>
> Key: HIVE-13256
> URL: https://issues.apache.org/jira/browse/HIVE-13256
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Log line from LlapIOCounter
> {code}
> ROWS_EMITTED=23528469, SELECTED_ROWGROUPS=87
> {code}
> If rowgroups contain 10K rows by default then expected count is 235 for the 
> above case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13115) MetaStore Direct SQL getPartitions call fail when the columns schemas for a partition are null

2016-03-27 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-13115:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.0.1
   2.1.0
   1.2.2
   1.3.0
   Status: Resolved  (was: Patch Available)

Committed to following branches:

{noformat}
510ef50 (refs/remotes/origin/branch-1.2) HIVE-13115: MetaStore Direct SQL 
getPartitions call fail when the columns schemas for a partition are null 
(Ratandeep Ratti reviewed by Carl Steinbach)
3a39aba (refs/remotes/origin/branch-1) HIVE-13115: MetaStore Direct SQL 
getPartitions call fail when the columns schemas for a partition are null 
(Ratandeep Ratti reviewed by Carl Steinbach)
95c2b6b (refs/remotes/origin/branch-2.0) HIVE-13115: MetaStore Direct SQL 
getPartitions call fail when the columns schemas for a partition are null 
(Ratandeep Ratti reviewed by Carl Steinbach)
69cfd35 (HEAD -> refs/heads/master, refs/remotes/origin/master, 
refs/remotes/origin/HEAD) HIVE-13115: MetaStore Direct SQL getPartitions call 
fail when the columns schemas for a partition are null (Ratandeep Ratti 
reviewed by Carl Steinbach)
{noformat}


> MetaStore Direct SQL getPartitions call fail when the columns schemas for a 
> partition are null
> --
>
> Key: HIVE-13115
> URL: https://issues.apache.org/jira/browse/HIVE-13115
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>  Labels: DirectSql, MetaStore, ORM
> Fix For: 1.3.0, 1.2.2, 2.1.0, 2.0.1
>
> Attachments: HIVE-13115.patch, HIVE-13115.reproduce.issue.patch
>
>
> We are seeing the following exception in our MetaStore logs
> {noformat}
> 2016-02-11 00:00:19,002 DEBUG metastore.MetaStoreDirectSql 
> (MetaStoreDirectSql.java:timingTrace(602)) - Direct SQL query in 5.842372ms + 
> 1.066728ms, the query is [select "PARTITIONS"."PART_ID" from "PARTITIONS"  
> inner join "TBLS" on "PART
> ITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ?   inner join 
> "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"  and "DBS"."NAME" = ?  order by 
> "PART_NAME" asc]
> 2016-02-11 00:00:19,021 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(2243)) - Direct SQL failed, falling 
> back to ORM
> MetaException(message:Unexpected null for one of the IDs, SD 6437, column 
> null, serde 6437 for a non- view)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:360)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitions(MetaStoreDirectSql.java:224)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1563)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1559)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1570)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1553)
> at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
> at com.sun.proxy.$Proxy5.getPartitions(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:2526)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:8747)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:8731)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1591)
> at 
> 

[jira] [Commented] (HIVE-13330) ORC vectorized string dictionary reader does not differentiate null vs empty string dictionary

2016-03-25 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212643#comment-15212643
 ] 

Carl Steinbach commented on HIVE-13330:
---

Please change the name of the test from "vector_string_reader_empty_dict.q" to 
"orc_string_reader_empty_dict.q"

> ORC vectorized string dictionary reader does not differentiate null vs empty 
> string dictionary
> --
>
> Key: HIVE-13330
> URL: https://issues.apache.org/jira/browse/HIVE-13330
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
>  Labels: CorrectnessBug
> Attachments: HIVE-13330.1.patch, HIVE-13330.2.patch
>
>
> Vectorized string dictionary reader cannot differentiate between the case 
> where all dictionary entries are null vs single entry with empty string. This 
> causes wrong results when reading data out of such files. 
> {code:title=Vectorization On}
> SET hive.vectorized.execution.enabled=true;
> SET hive.fetch.task.conversion=none;
> select vcol from testnullorc3 limit 1;
> OK
> NULL
> {code}
> {code:title=Vectorization Off}
> SET hive.vectorized.execution.enabled=false;
> SET hive.fetch.task.conversion=none;
> select vcol from testnullorc3 limit 1;
> OK
> {code}
> The input table testnullorc3 contains a varchar column vcol with few empty 
> strings and few nulls. For this table, non vectorized reader returns empty as 
> first row but vectorized reader returns NULL. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13330) ORC vectorized string dictionary reader does not differentiate null vs empty string dictionary

2016-03-25 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-13330:
--
Labels: CorrectnessBug  (was: Correctness)

> ORC vectorized string dictionary reader does not differentiate null vs empty 
> string dictionary
> --
>
> Key: HIVE-13330
> URL: https://issues.apache.org/jira/browse/HIVE-13330
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
>  Labels: CorrectnessBug
> Attachments: HIVE-13330.1.patch, HIVE-13330.2.patch
>
>
> Vectorized string dictionary reader cannot differentiate between the case 
> where all dictionary entries are null vs single entry with empty string. This 
> causes wrong results when reading data out of such files. 
> {code:title=Vectorization On}
> SET hive.vectorized.execution.enabled=true;
> SET hive.fetch.task.conversion=none;
> select vcol from testnullorc3 limit 1;
> OK
> NULL
> {code}
> {code:title=Vectorization Off}
> SET hive.vectorized.execution.enabled=false;
> SET hive.fetch.task.conversion=none;
> select vcol from testnullorc3 limit 1;
> OK
> {code}
> The input table testnullorc3 contains a varchar column vcol with few empty 
> strings and few nulls. For this table, non vectorized reader returns empty as 
> first row but vectorized reader returns NULL. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13330) ORC vectorized string dictionary reader does not differentiate null vs empty string dictionary

2016-03-25 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-13330:
--
Labels: Correctness  (was: )

> ORC vectorized string dictionary reader does not differentiate null vs empty 
> string dictionary
> --
>
> Key: HIVE-13330
> URL: https://issues.apache.org/jira/browse/HIVE-13330
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
>  Labels: Correctness
> Attachments: HIVE-13330.1.patch, HIVE-13330.2.patch
>
>
> Vectorized string dictionary reader cannot differentiate between the case 
> where all dictionary entries are null vs single entry with empty string. This 
> causes wrong results when reading data out of such files. 
> {code:title=Vectorization On}
> SET hive.vectorized.execution.enabled=true;
> SET hive.fetch.task.conversion=none;
> select vcol from testnullorc3 limit 1;
> OK
> NULL
> {code}
> {code:title=Vectorization Off}
> SET hive.vectorized.execution.enabled=false;
> SET hive.fetch.task.conversion=none;
> select vcol from testnullorc3 limit 1;
> OK
> {code}
> The input table testnullorc3 contains a varchar column vcol with a few empty 
> strings and a few nulls. For this table, the non-vectorized reader returns an 
> empty string as the first row, but the vectorized reader returns NULL. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13115) MetaStore Direct SQL getPartitions call fail when the columns schemas for a partition are null

2016-03-22 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207080#comment-15207080
 ] 

Carl Steinbach commented on HIVE-13115:
---

+1

> MetaStore Direct SQL getPartitions call fail when the columns schemas for a 
> partition are null
> --
>
> Key: HIVE-13115
> URL: https://issues.apache.org/jira/browse/HIVE-13115
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>  Labels: DirectSql, MetaStore, ORM
> Attachments: HIVE-13115.patch, HIVE-13115.reproduce.issue.patch
>
>
> We are seeing the following exception in our MetaStore logs
> {noformat}
> 2016-02-11 00:00:19,002 DEBUG metastore.MetaStoreDirectSql 
> (MetaStoreDirectSql.java:timingTrace(602)) - Direct SQL query in 5.842372ms + 
> 1.066728ms, the query is [select "PARTITIONS"."PART_ID" from "PARTITIONS"  
> inner join "TBLS" on "PART
> ITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ?   inner join 
> "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"  and "DBS"."NAME" = ?  order by 
> "PART_NAME" asc]
> 2016-02-11 00:00:19,021 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(2243)) - Direct SQL failed, falling 
> back to ORM
> MetaException(message:Unexpected null for one of the IDs, SD 6437, column 
> null, serde 6437 for a non-view)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:360)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitions(MetaStoreDirectSql.java:224)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1563)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1559)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1570)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1553)
> at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
> at com.sun.proxy.$Proxy5.getPartitions(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:2526)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:8747)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:8731)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1591)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This direct SQL call fails for every {{getPartitions}} call and then falls 
> back to ORM.
> The query which fails is
> {code}
> select 
>   PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID,
>   SERDES.SERDE_ID, PARTITIONS.CREATE_TIME,
>   PARTITIONS.LAST_ACCESS_TIME, SDS.INPUT_FORMAT, SDS.IS_COMPRESSED,
>   SDS.IS_STOREDASSUBDIRECTORIES, SDS.LOCATION, SDS.NUM_BUCKETS,
>   SDS.OUTPUT_FORMAT, SERDES.NAME, SERDES.SLIB 
> from PARTITIONS
>   left outer join SDS on PARTITIONS.SD_ID = SDS.SD_ID 
>   left outer join SERDES on SDS.SERDE_ID = SERDES.SERDE_ID 
>   where PART_ID in (  ?  ) order by PART_NAME asc;
> {code}
> By looking at the source 
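As an aside for anyone hitting this, a hedged diagnostic sketch, run directly against the metastore database (table and column names taken from the query quoted above), to locate partitions whose storage descriptor or column descriptor row is missing, which is the condition the MetaException points at:

{code:title=Metastore diagnostic sketch}
-- partitions whose SD row is missing or whose SD has no column descriptor (CD_ID null)
SELECT P.PART_ID, P.PART_NAME, S.SD_ID, S.CD_ID
FROM PARTITIONS P
  LEFT OUTER JOIN SDS S ON P.SD_ID = S.SD_ID
WHERE S.SD_ID IS NULL OR S.CD_ID IS NULL;
{code}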

[jira] [Commented] (HIVE-13115) MetaStore Direct SQL getPartitions call fail when the columns schemas for a partition are null

2016-03-22 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205926#comment-15205926
 ] 

Carl Steinbach commented on HIVE-13115:
---

Hi [~rdsr], can you please post the patch on RB? Thanks.

> MetaStore Direct SQL getPartitions call fail when the columns schemas for a 
> partition are null
> --
>
> Key: HIVE-13115
> URL: https://issues.apache.org/jira/browse/HIVE-13115
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>  Labels: DirectSql, MetaStore, ORM
> Attachments: HIVE-13115.patch, HIVE-13115.reproduce.issue.patch
>
>
> We are seeing the following exception in our MetaStore logs
> {noformat}
> 2016-02-11 00:00:19,002 DEBUG metastore.MetaStoreDirectSql 
> (MetaStoreDirectSql.java:timingTrace(602)) - Direct SQL query in 5.842372ms + 
> 1.066728ms, the query is [select "PARTITIONS"."PART_ID" from "PARTITIONS"  
> inner join "TBLS" on "PART
> ITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ?   inner join 
> "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"  and "DBS"."NAME" = ?  order by 
> "PART_NAME" asc]
> 2016-02-11 00:00:19,021 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(2243)) - Direct SQL failed, falling 
> back to ORM
> MetaException(message:Unexpected null for one of the IDs, SD 6437, column 
> null, serde 6437 for a non-view)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:360)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitions(MetaStoreDirectSql.java:224)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1563)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1559)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1570)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1553)
> at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
> at com.sun.proxy.$Proxy5.getPartitions(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:2526)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:8747)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:8731)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1591)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This direct SQL call fails for every {{getPartitions}} call and then falls 
> back to ORM.
> The query which fails is
> {code}
> select 
>   PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID,
>   SERDES.SERDE_ID, PARTITIONS.CREATE_TIME,
>   PARTITIONS.LAST_ACCESS_TIME, SDS.INPUT_FORMAT, SDS.IS_COMPRESSED,
>   SDS.IS_STOREDASSUBDIRECTORIES, SDS.LOCATION, SDS.NUM_BUCKETS,
>   SDS.OUTPUT_FORMAT, SERDES.NAME, SERDES.SLIB 
> from PARTITIONS
>   left outer join SDS on PARTITIONS.SD_ID = SDS.SD_ID 
>   left outer join SERDES on SDS.SERDE_ID = SERDES.SERDE_ID 
>   where PART_ID in (  ?  ) order by 

[jira] [Updated] (HIVE-4570) More information to user on GetOperationStatus in Hive Server2 when query is still executing

2016-03-18 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-4570:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> More information to user on GetOperationStatus in Hive Server2 when query is 
> still executing
> 
>
> Key: HIVE-4570
> URL: https://issues.apache.org/jira/browse/HIVE-4570
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Amareshwari Sriramadasu
>Assignee: Rajat Khandelwal
> Fix For: 2.1.0
>
> Attachments: HIVE-4570.01.patch, HIVE-4570.01.patch, 
> HIVE-4570.02.patch, HIVE-4570.03.patch, HIVE-4570.03.patch, 
> HIVE-4570.04.patch, HIVE-4570.04.patch, HIVE-4570.06.patch, HIVE-4570.07.patch
>
>
> Currently in Hive Server2, when the query is still executing only the status 
> is set as STILL_EXECUTING. 
> This issue is to give more information to the user such as progress and 
> running job handles, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10187) Avro backed tables don't handle cyclical or recursive records

2016-02-12 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-10187:
--
Fix Version/s: 2.1.0

> Avro backed tables don't handle cyclical or recursive records
> -
>
> Key: HIVE-10187
> URL: https://issues.apache.org/jira/browse/HIVE-10187
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.0
>Reporter: Mark Wagner
>Assignee: Mark Wagner
> Fix For: 2.1.0
>
> Attachments: HIVE-10187.1.patch, HIVE-10187.2.patch, 
> HIVE-10187.3.patch, HIVE-10187.4.patch, HIVE-10187.5.patch, 
> HIVE-10187.demo.patch
>
>
> [HIVE-7653] changed the Avro SerDe to make it generate TypeInfos even for 
> recursive/cyclical schemas. However, any attempt to serialize data which 
> exploits that ability results in silently dropped fields.
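For illustration, a sketch of the kind of recursive Avro schema HIVE-7653 made representable; the table name and schema here are invented, and per this report any attempt to write rows through such a table silently drops the recursive field:

{code:title=Hypothetical recursive Avro schema}
CREATE TABLE linked_list_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{
  "type": "record",
  "name": "Node",
  "fields": [
    {"name": "value", "type": "string"},
    {"name": "next", "type": ["null", "Node"], "default": null}
  ]
}');
{code}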



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10187) Avro backed tables don't handle cyclical or recursive records

2016-02-09 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140210#comment-15140210
 ] 

Carl Steinbach commented on HIVE-10187:
---

+1. Will commit if the test results come back clean.

> Avro backed tables don't handle cyclical or recursive records
> -
>
> Key: HIVE-10187
> URL: https://issues.apache.org/jira/browse/HIVE-10187
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.0
>Reporter: Mark Wagner
>Assignee: Mark Wagner
> Attachments: HIVE-10187.1.patch, HIVE-10187.2.patch, 
> HIVE-10187.3.patch, HIVE-10187.4.patch, HIVE-10187.5.patch, 
> HIVE-10187.demo.patch
>
>
> [HIVE-7653] changed the Avro SerDe to make it generate TypeInfos even for 
> recursive/cyclical schemas. However, any attempt to serialize data which 
> exploits that ability results in silently dropped fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4570) More information to user on GetOperationStatus in Hive Server2 when query is still executing

2016-01-25 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114893#comment-15114893
 ] 

Carl Steinbach commented on HIVE-4570:
--

Please post a review request on RB for this change. Thanks.

> More information to user on GetOperationStatus in Hive Server2 when query is 
> still executing
> 
>
> Key: HIVE-4570
> URL: https://issues.apache.org/jira/browse/HIVE-4570
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.11.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Akshay Goyal
> Attachments: HIVE-4570.01.patch, HIVE-4570.02.patch
>
>
> Currently in Hive Server2, when the query is still executing only the status 
> is set as STILL_EXECUTING. 
> This issue is to give more information to the user such as progress and 
> running job handles, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-4570) More information to user on GetOperationStatus in Hive Server2 when query is still executing

2016-01-25 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-4570:
-
Affects Version/s: (was: 0.11.0)

> More information to user on GetOperationStatus in Hive Server2 when query is 
> still executing
> 
>
> Key: HIVE-4570
> URL: https://issues.apache.org/jira/browse/HIVE-4570
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Amareshwari Sriramadasu
>Assignee: Akshay Goyal
> Attachments: HIVE-4570.01.patch, HIVE-4570.02.patch
>
>
> Currently in Hive Server2, when the query is still executing only the status 
> is set as STILL_EXECUTING. 
> This issue is to give more information to the user such as progress and 
> running job handles, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-1073) CREATE VIEW followup: track view dependency information in metastore

2015-11-24 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025987#comment-15025987
 ] 

Carl Steinbach commented on HIVE-1073:
--

Hi [~freepeter], thanks for writing up these notes!

bq. To track the view dependency, I will add a new class MTableDependency (name 
TBD) which contains srcTbl and dstTbl.

Since only views can have dependencies on other tables/views it probably makes 
sense to change the name to MViewDependency, and replace srcTbl with srcView.
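For concreteness, a small sketch (all names invented) of the dependency chain such an MViewDependency record would need to capture:

{code:title=Hypothetical view dependency chain}
CREATE TABLE base_events (event_id BIGINT, event_time TIMESTAMP);
-- v_daily depends on base_events; v_weekly depends on v_daily
CREATE VIEW v_daily  AS SELECT event_id, to_date(event_time) AS dt FROM base_events;
CREATE VIEW v_weekly AS SELECT dt, count(*) AS cnt FROM v_daily GROUP BY dt;
{code}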

> CREATE VIEW followup:  track view dependency information in metastore
> -
>
> Key: HIVE-1073
> URL: https://issues.apache.org/jira/browse/HIVE-1073
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Views
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Wenlei Xie
>
> Add a generic mechanism for recording the fact that one object depends on 
> another.  First use case (to be implemented as part of this task) would be 
> views depending on tables or other views, but in the future we can also use 
> this for views depending on persistent functions, functions depending on 
> other functions, etc.
> This involves metastore modeling, QL analysis for deriving and recording the 
> dependencies (Ashish says something may already be available from the lineage 
> work), and CLI support for browsing dependencies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11981) ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2015-11-19 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013146#comment-15013146
 ] 

Carl Steinbach commented on HIVE-11981:
---

bq. This adds hive.exec.schema.evolution to HiveConf.java ... Although it seems 
to be a general parameter, this JIRA issue is ORC-specific...

If this property is ORC specific then it seems like a mistake to name it 
hive.exec.orc.schema.evolution. Is there a good reason why "orc" doesn't appear 
in the property name or property description?

> ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
> --
>
> Key: HIVE-11981
> URL: https://issues.apache.org/jira/browse/HIVE-11981
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11981.01.patch, HIVE-11981.02.patch, 
> HIVE-11981.03.patch, HIVE-11981.05.patch, HIVE-11981.06.patch, 
> HIVE-11981.07.patch, HIVE-11981.08.patch, HIVE-11981.09.patch, 
> HIVE-11981.091.patch, HIVE-11981.092.patch, HIVE-11981.093.patch, 
> HIVE-11981.094.patch, HIVE-11981.095.patch, HIVE-11981.096.patch, 
> HIVE-11981.097.patch, HIVE-11981.098.patch, HIVE-11981.099.patch, 
> HIVE-11981.0991.patch, HIVE-11981.0992.patch, ORC Schema Evolution Issues.docx
>
>
> High priority issues with schema evolution for the ORC file format.
> Schema evolution here is limited to adding new columns and a few cases of 
> column type-widening (e.g. int to bigint).
> Renaming columns, deleting columns, moving columns, and other forms of schema 
> evolution were not pursued due to lack of importance and lack of time. Also, it 
> appears that much more sophisticated metadata would be needed to support them.
> The biggest issues for users have been adding new columns for ACID table 
> (HIVE-11421 Support Schema evolution for ACID tables) and vectorization 
> (HIVE-10598 Vectorization borks when column is added to table).
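For reference, the two evolution paths the description limits itself to look roughly like this in DDL (table and column names are placeholders):

{code:title=Supported evolution, sketched}
-- add a new column at the end of the schema
ALTER TABLE orders ADD COLUMNS (discount DECIMAL(10,2));
-- widen an existing column type, e.g. int to bigint
ALTER TABLE orders CHANGE COLUMN order_id order_id BIGINT;
{code}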



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12285) Add locking to HCatClient

2015-11-11 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-12285:
--
Assignee: Elliot West  (was: Carl Steinbach)

> Add locking to HCatClient
> -
>
> Key: HIVE-12285
> URL: https://issues.apache.org/jira/browse/HIVE-12285
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 2.0.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: concurrency, hcatalog, lock, locking, locks
>
> With the introduction of a concurrency model (HIVE-1293) Hive uses locks to 
> coordinate access and updates to both table data and metadata. Within the 
> Hive CLI such lock management is seamless. However, Hive provides additional 
> APIs that permit interaction with data repositories, namely the HCatalog 
> APIs. Currently, operations implemented by this API do not participate in 
> Hive's locking scheme. Furthermore, access to the locking mechanisms is not 
> exposed by these APIs (unlike the Metastore Thrift API), so users are not 
> able to explicitly interact with locks either. This has created a less than 
> ideal situation where users of the APIs have no choice but to manipulate 
> these data repositories outside the control of Hive's lock management, 
> potentially resulting in situations where data inconsistencies 
> can occur both for external processes using the API and for queries executing 
> within Hive.
> h3. Scope of work
> This ticket is concerned with sections of the HCatalog API that deal with DDL 
> type operations using the metastore, not with those whose purpose is to 
> read/write table data. A separate issue already exists for adding locking to 
> HCat readers and writers (HIVE-6207).
> h3. Proposed work
> The following work items would serve as a minimum deliverable that would both 
> allow API users to effectively work with locks:
> * Comprehensively document on the wiki the locks required for various Hive 
> operations. At a minimum this should cover all operations exposed by 
> {{HCatClient}}. The [Locking design 
> document|https://cwiki.apache.org/confluence/display/Hive/Locking] can be 
> used as a starting point or perhaps updated.
> * Implement methods and types in the {{HCatClient}} API that allow users to 
> manipulate Hive locks. For the most part I'd expect these to delegate to the 
> metastore API implementations:
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.checkLock(long)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)}}
> ** -{{org.apache.hadoop.hive.metastore.IMetaStoreClient.showLocks()}}-
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.heartbeat(long, long)}}
> ** {{org.apache.hadoop.hive.metastore.api.LockComponent}}
> ** {{org.apache.hadoop.hive.metastore.api.LockRequest}}
> ** {{org.apache.hadoop.hive.metastore.api.LockResponse}}
> ** {{org.apache.hadoop.hive.metastore.api.LockLevel}}
> ** {{org.apache.hadoop.hive.metastore.api.LockType}}
> ** {{org.apache.hadoop.hive.metastore.api.LockState}}
> ** -{{org.apache.hadoop.hive.metastore.api.ShowLocksResponse}}-
> h3. Additional proposals
> Explicit lock management should be fairly simple to add to {{HCatClient}}, 
> however it puts the onus on the API user to correctly understand and 
> implement code that uses lock in an appropriate manner. Failure to do so may 
> have undesirable consequences. With a simpler user model the operations 
> exposed on the API would automatically acquire and release the locks that 
> they need. This might work well for small numbers of operations, but not 
> perhaps for large sequences of invocations. (Do we need to worry about this 
> though as the API methods usually accept batches?).  Additionally tasks such 
> as heartbeat management could also be handled implicitly for long running 
> sets of operations. With these concerns in mind it may also be beneficial to 
> deliver some of the following:
> * A means to automatically acquire/release appropriate locks for 
> {{HCatClient}} operations.
> * A component that maintains a lock heartbeat from the client.
> * A strategy for switching between manual/automatic lock management, 
> analogous to SQL's {{autocommit}} for transactions.
> An API for lock and heartbeat management already exists in the HCatalog 
> Mutation API (see: 
> {{org.apache.hive.hcatalog.streaming.mutate.client.lock}}). It will likely 
> make sense to refactor either this code and/or code that uses it.
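As a point of comparison, the explicit lock operations already reachable from the Hive CLI, which this ticket would surface through HCatClient; the statements are standard HiveQL and the table name is a placeholder:

{code:title=CLI-side locking, for comparison}
SET hive.support.concurrency=true;
LOCK TABLE page_views EXCLUSIVE;   -- acquire an explicit exclusive lock
SHOW LOCKS page_views;             -- inspect current locks
UNLOCK TABLE page_views;           -- release it
{code}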



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12285) Add locking to HCatClient

2015-11-11 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000199#comment-15000199
 ] 

Carl Steinbach commented on HIVE-12285:
---

Hi [~teabot], I accidentally assigned the ticket to myself. Sorry for any 
confusion this caused.

> Add locking to HCatClient
> -
>
> Key: HIVE-12285
> URL: https://issues.apache.org/jira/browse/HIVE-12285
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 2.0.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: concurrency, hcatalog, lock, locking, locks
>
> With the introduction of a concurrency model (HIVE-1293) Hive uses locks to 
> coordinate access and updates to both table data and metadata. Within the 
> Hive CLI such lock management is seamless. However, Hive provides additional 
> APIs that permit interaction with data repositories, namely the HCatalog 
> APIs. Currently, operations implemented by this API do not participate in 
> Hive's locking scheme. Furthermore, access to the locking mechanisms is not 
> exposed by these APIs (unlike the Metastore Thrift API), so users are not 
> able to explicitly interact with locks either. This has created a less than 
> ideal situation where users of the APIs have no choice but to manipulate 
> these data repositories outside the control of Hive's lock management, 
> potentially resulting in situations where data inconsistencies 
> can occur both for external processes using the API and for queries executing 
> within Hive.
> h3. Scope of work
> This ticket is concerned with sections of the HCatalog API that deal with DDL 
> type operations using the metastore, not with those whose purpose is to 
> read/write table data. A separate issue already exists for adding locking to 
> HCat readers and writers (HIVE-6207).
> h3. Proposed work
> The following work items would serve as a minimum deliverable that would both 
> allow API users to effectively work with locks:
> * Comprehensively document on the wiki the locks required for various Hive 
> operations. At a minimum this should cover all operations exposed by 
> {{HCatClient}}. The [Locking design 
> document|https://cwiki.apache.org/confluence/display/Hive/Locking] can be 
> used as a starting point or perhaps updated.
> * Implement methods and types in the {{HCatClient}} API that allow users to 
> manipulate Hive locks. For the most part I'd expect these to delegate to the 
> metastore API implementations:
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.checkLock(long)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)}}
> ** -{{org.apache.hadoop.hive.metastore.IMetaStoreClient.showLocks()}}-
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.heartbeat(long, long)}}
> ** {{org.apache.hadoop.hive.metastore.api.LockComponent}}
> ** {{org.apache.hadoop.hive.metastore.api.LockRequest}}
> ** {{org.apache.hadoop.hive.metastore.api.LockResponse}}
> ** {{org.apache.hadoop.hive.metastore.api.LockLevel}}
> ** {{org.apache.hadoop.hive.metastore.api.LockType}}
> ** {{org.apache.hadoop.hive.metastore.api.LockState}}
> ** -{{org.apache.hadoop.hive.metastore.api.ShowLocksResponse}}-
> h3. Additional proposals
> Explicit lock management should be fairly simple to add to {{HCatClient}}, 
> however it puts the onus on the API user to correctly understand and 
> implement code that uses lock in an appropriate manner. Failure to do so may 
> have undesirable consequences. With a simpler user model the operations 
> exposed on the API would automatically acquire and release the locks that 
> they need. This might work well for small numbers of operations, but not 
> perhaps for large sequences of invocations. (Do we need to worry about this 
> though as the API methods usually accept batches?).  Additionally tasks such 
> as heartbeat management could also be handled implicitly for long running 
> sets of operations. With these concerns in mind it may also be beneficial to 
> deliver some of the following:
> * A means to automatically acquire/release appropriate locks for 
> {{HCatClient}} operations.
> * A component that maintains a lock heartbeat from the client.
> * A strategy for switching between manual/automatic lock management, 
> analogous to SQL's {{autocommit}} for transactions.
> An API for lock and heartbeat management already exists in the HCatalog 
> Mutation API (see: 
> {{org.apache.hive.hcatalog.streaming.mutate.client.lock}}). It will likely 
> make sense to refactor either this code and/or code that uses it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12285) Add locking to HCatClient

2015-11-09 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reassigned HIVE-12285:
-

Assignee: Carl Steinbach  (was: Elliot West)

> Add locking to HCatClient
> -
>
> Key: HIVE-12285
> URL: https://issues.apache.org/jira/browse/HIVE-12285
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 2.0.0
>Reporter: Elliot West
>Assignee: Carl Steinbach
>  Labels: concurrency, hcatalog, lock, locking, locks
>
> With the introduction of a concurrency model (HIVE-1293) Hive uses locks to 
> coordinate access and updates to both table data and metadata. Within the 
> Hive CLI such lock management is seamless. However, Hive provides additional 
> APIs that permit interaction with data repositories, namely the HCatalog 
> APIs. Currently, operations implemented by this API do not participate in 
> Hive's locking scheme. Furthermore, access to the locking mechanisms is not 
> exposed by these APIs (unlike the Metastore Thrift API), so users are not 
> able to explicitly interact with locks either. This has created a less than 
> ideal situation where users of the APIs have no choice but to manipulate 
> these data repositories outside the control of Hive's lock management, 
> potentially resulting in situations where data inconsistencies 
> can occur both for external processes using the API and for queries executing 
> within Hive.
> h3. Scope of work
> This ticket is concerned with sections of the HCatalog API that deal with DDL 
> type operations using the metastore, not with those whose purpose is to 
> read/write table data. A separate issue already exists for adding locking to 
> HCat readers and writers (HIVE-6207).
> h3. Proposed work
> The following work items would serve as a minimum deliverable that would both 
> allow API users to effectively work with locks:
> * Comprehensively document on the wiki the locks required for various Hive 
> operations. At a minimum this should cover all operations exposed by 
> {{HCatClient}}. The [Locking design 
> document|https://cwiki.apache.org/confluence/display/Hive/Locking] can be 
> used as a starting point or perhaps updated.
> * Implement methods and types in the {{HCatClient}} API that allow users to 
> manipulate Hive locks. For the most part I'd expect these to delegate to the 
> metastore API implementations:
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.checkLock(long)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)}}
> ** -{{org.apache.hadoop.hive.metastore.IMetaStoreClient.showLocks()}}-
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.heartbeat(long, long)}}
> ** {{org.apache.hadoop.hive.metastore.api.LockComponent}}
> ** {{org.apache.hadoop.hive.metastore.api.LockRequest}}
> ** {{org.apache.hadoop.hive.metastore.api.LockResponse}}
> ** {{org.apache.hadoop.hive.metastore.api.LockLevel}}
> ** {{org.apache.hadoop.hive.metastore.api.LockType}}
> ** {{org.apache.hadoop.hive.metastore.api.LockState}}
> ** -{{org.apache.hadoop.hive.metastore.api.ShowLocksResponse}}-
> h3. Additional proposals
> Explicit lock management should be fairly simple to add to {{HCatClient}}, 
> however it puts the onus on the API user to correctly understand and 
> implement code that uses lock in an appropriate manner. Failure to do so may 
> have undesirable consequences. With a simpler user model the operations 
> exposed on the API would automatically acquire and release the locks that 
> they need. This might work well for small numbers of operations, but not 
> perhaps for large sequences of invocations. (Do we need to worry about this 
> though as the API methods usually accept batches?).  Additionally tasks such 
> as heartbeat management could also be handled implicitly for long running 
> sets of operations. With these concerns in mind it may also be beneficial to 
> deliver some of the following:
> * A means to automatically acquire/release appropriate locks for 
> {{HCatClient}} operations.
> * A component that maintains a lock heartbeat from the client.
> * A strategy for switching between manual/automatic lock management, 
> analogous to SQL's {{autocommit}} for transactions.
> An API for lock and heartbeat management already exists in the HCatalog 
> Mutation API (see: 
> {{org.apache.hive.hcatalog.streaming.mutate.client.lock}}). It will likely 
> make sense to refactor either this code and/or code that uses it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6952) Hive 0.13 HiveOutputFormat breaks backwards compatibility

2015-11-05 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-6952:
-
Labels: backward-incompatible  (was: )

> Hive 0.13 HiveOutputFormat breaks backwards compatibility
> -
>
> Key: HIVE-6952
> URL: https://issues.apache.org/jira/browse/HIVE-6952
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Costin Leau
>Assignee: Ashutosh Chauhan
>Priority: Blocker
>  Labels: backward-incompatible
> Fix For: 0.14.0, 0.13.1
>
> Attachments: HIVE-6952.patch, HIVE-6952_branch-13.patch
>
>
> Hive 0.13 changed the signature of HiveOutputFormat (through commit r1527149) 
> breaking backwards compatibility with previous releases; the return type of 
> getHiveRecordWriter has been changed from RecordWriter to FSRecordWriter.
> FSRecordWriter introduces one new method on top of RecordWriter; however, it 
> does not extend the previous interface, and it lives in a completely new 
> package.
> Thus, code running fine on Hive 0.12 breaks on Hive 0.13. After the upgrade, 
> code running on Hive 0.13 will break on anything lower than this.
> This could have easily been avoided by extending the existing interface or 
> introducing a new one that RecordWriter could have extended going forward. By 
> changing the signature, the existing contract (and compatibility) has been 
> voided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7770) Undo backward-incompatible behaviour change introduced by HIVE-7341

2015-11-05 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-7770:
-
Labels: backward-incompatible regression  (was: regression)

> Undo backward-incompatible behaviour change introduced by HIVE-7341
> ---
>
> Key: HIVE-7770
> URL: https://issues.apache.org/jira/browse/HIVE-7770
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0
>Reporter: Sushanth Sowmyan
>Assignee: Mithun Radhakrishnan
>  Labels: backward-incompatible, regression
> Fix For: 0.14.0
>
> Attachments: HIVE-7770.1.patch
>
>
> HIVE-7341 introduced a backward-incompatibility regression in Exception 
> signatures for HCatPartition.getColumns() that breaks compilation for 
> external tools like Falcon. This bug tracks a scrub of any other issues we 
> discover, so we can put them back to how it used to be. This bug needs 
> resolution in the same release as HIVE-7341, and thus, must be resolved in 
> 0.14.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances

2015-11-05 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-7341:
-
Labels: backward-incompatible data-replication  (was: backward-incompatible)

> Support for Table replication across HCatalog instances
> ---
>
> Key: HIVE-7341
> URL: https://issues.apache.org/jira/browse/HIVE-7341
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 0.13.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>  Labels: backward-incompatible, data-replication
> Fix For: 0.14.0
>
> Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch, HIVE-7341.3.patch, 
> HIVE-7341.4.patch, HIVE-7341.5.patch
>
>
> The HCatClient currently doesn't provide very much support for replicating 
> HCatTable definitions between 2 HCatalog Server (i.e. Hive metastore) 
> instances. 
> Systems similar to Apache Falcon might find the need to replicate partition 
> data between 2 clusters, and keep the HCatalog metadata in sync between the 
> two. This poses a couple of problems:
> # The definition of the source table might change (in column schema, I/O 
> formats, record-formats, serde-parameters, etc.) The system will need a way 
> to diff 2 tables and update the target-metastore with the changes. E.g. 
> {code}
> targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
> hcatClient.updateTableSchema(dbName, tableName, targetTable);
> {code}
> # The current {{HCatClient.addPartitions()}} API requires that the 
> partition's schema be derived from the table's schema, thereby requiring that 
> the table-schema be resolved *before* partitions with the new schema are 
> added to the table. This is problematic, because it introduces race 
> conditions when 2 partitions with differing column-schemas (e.g. right after 
> a schema change) are copied in parallel. This can be avoided if each 
> HCatAddPartitionDesc kept track of the partition's schema, in flight.
> # The source and target metastores might be running different/incompatible 
> versions of Hive. 
> The impending patch attempts to address these concerns (with some caveats).
> # {{HCatTable}} now has 
> ## a {{diff()}} method, to compare against another HCatTable instance
> ## a {{resolve(diff)}} method to copy over specified table-attributes from 
> another HCatTable
> ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and 
> {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed 
> in other class-loaders may be used for comparison
> # {{HCatPartition}} now provides finer-grained control over a Partition's 
> column-schema, StorageDescriptor settings, etc. This allows partitions to be 
> copied completely from source, with the ability to override specific 
> properties if required (e.g. location).
> # {{HCatClient.updateTableSchema()}} can now update the entire 
> table-definition, not just the column schema.
> # I've cleaned up and removed most of the redundancy between the HCatTable, 
> HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to 
> separate the table-attributes from the add-table-operation's attributes. By 
> providing fluent-interfaces in HCatTable, and composing an HCatTable instance 
> in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are 
> deprecated, in favour of those in HCatTable. Likewise, HCatPartition and 
> HCatAddPartitionDesc.
> I'll post a patch for trunk shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6952) Hive 0.13 HiveOutputFormat breaks backwards compatibility

2015-11-05 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992879#comment-14992879
 ] 

Carl Steinbach commented on HIVE-6952:
--

Linking HIVE-5324 which added the backward incompatible changes.

> Hive 0.13 HiveOutputFormat breaks backwards compatibility
> -
>
> Key: HIVE-6952
> URL: https://issues.apache.org/jira/browse/HIVE-6952
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Costin Leau
>Assignee: Ashutosh Chauhan
>Priority: Blocker
>  Labels: backward-incompatible
> Fix For: 0.14.0, 0.13.1
>
> Attachments: HIVE-6952.patch, HIVE-6952_branch-13.patch
>
>
> Hive 0.13 changed the signature of HiveOutputFormat (through commit r1527149) 
> breaking backwards compatibility with previous releases; the return type of 
> getHiveRecordWriter has been changed from RecordWriter to FSRecordWriter.
> FSRecordWriter introduces one new method on top of RecordWriter; however, it 
> does not extend the previous interface, and it lives in a completely new 
> package.
> Thus, code running fine on Hive 0.12 breaks on Hive 0.13. After the upgrade, 
> code running on Hive 0.13 will break on anything lower than this.
> This could have easily been avoided by extending the existing interface or 
> introducing a new one that RecordWriter could have extended going forward. By 
> changing the signature, the existing contract (and compatibility) has been 
> voided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics

2015-11-05 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-5324:
-
Labels: backward-incompatible orcfile statistics  (was: orcfile statistics)

> Extend record writer and ORC reader/writer interfaces to provide statistics
> ---
>
> Key: HIVE-5324
> URL: https://issues.apache.org/jira/browse/HIVE-5324
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 0.13.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>  Labels: backward-incompatible, orcfile, statistics
> Fix For: 0.13.0
>
> Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt, 
> HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt
>
>
> The current implementation computes statistics (number of rows and raw data 
> size) for every single row processed. The processOp() method in 
> FileSinkOperator gets the raw data size for each row from the serde and 
> accumulates the size in a hashmap while counting the number of rows. These 
> accumulated statistics are then published to the metastore. 
> In the case of ORC, the file already stores enough statistics internally, 
> which can be used when publishing the stats to the metastore. This will avoid 
> the duplication of work happening in processOp(). Also, getting the statistics 
> directly from ORC is very cheap (they can be read directly from the file 
> footer).
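For context, the numRows and rawDataSize values being published end up as table (or partition) parameters in the metastore; a hedged sketch of where they surface, with a placeholder table name:

{code:title=Where the published stats surface}
ANALYZE TABLE orders_orc COMPUTE STATISTICS;
DESCRIBE FORMATTED orders_orc;   -- numRows and rawDataSize appear among the table parameters
{code}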



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics

2015-11-05 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-5324:
-
Component/s: Serializers/Deserializers
 ORC

> Extend record writer and ORC reader/writer interfaces to provide statistics
> ---
>
> Key: HIVE-5324
> URL: https://issues.apache.org/jira/browse/HIVE-5324
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC, Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>  Labels: backward-incompatible, orcfile, statistics
> Fix For: 0.13.0
>
> Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt, 
> HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt
>
>
> The current implementation computes statistics (number of rows and raw data 
> size) for every single row processed. The processOp() method in 
> FileSinkOperator gets the raw data size for each row from the serde and 
> accumulates the size in a hashmap while counting the number of rows. These 
> accumulated statistics are then published to the metastore. 
> In the case of ORC, the file already stores enough statistics internally, 
> which can be used when publishing the stats to the metastore. This will avoid 
> the duplication of work happening in processOp(). Also, getting the statistics 
> directly from ORC is very cheap (they can be read directly from the file 
> footer).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11288) Avro SerDe InstanceCache returns incorrect schema

2015-11-02 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-11288:
--
Labels: Avro AvroSerde  (was: )

> Avro SerDe InstanceCache returns incorrect schema
> -
>
> Key: HIVE-11288
> URL: https://issues.apache.org/jira/browse/HIVE-11288
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>  Labels: Avro, AvroSerde
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11288.2.patch, HIVE-11288.3.patch, 
> HIVE-11288.4.patch, HIVE-11288.patch
>
>
> To reproduce this error, take two fields in an avro schema document matching 
> the following:
> "type" :  { "type": "array", "items": [ "null",  { "type": "map", "values": [ 
> "null", "string" ] } ]  }
> "type" : { "type": "map", "values": [ "null" , { "type": "array", "items": [ 
> "null" , "string"] } ] }
> After creating two tables in hive with these schemas, the describe statement 
> on each of them will only return the schema for the first one loaded.  This 
> is due to a hashCode() collision in the InstanceCache.  
> A patch will be included in this ticket shortly which removes the hashCode 
> call from the InstanceCache's internal HashMap, and instead provides the 
> entire schema object.
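A sketch of the reproduction in DDL terms; the table names are invented, and the two column types are the ones quoted above, whose generated Avro schemas collide in the cache:

{code:title=Hypothetical repro}
CREATE TABLE t_array_of_map (c array<map<string,string>>) STORED AS AVRO;
CREATE TABLE t_map_of_array (c map<string,array<string>>) STORED AS AVRO;
-- per this report, DESCRIBE on both returns the schema of whichever was loaded first
DESCRIBE t_array_of_map;
DESCRIBE t_map_of_array;
{code}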



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11288) Avro SerDe InstanceCache returns incorrect schema

2015-11-02 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-11288:
--
Component/s: Serializers/Deserializers

> Avro SerDe InstanceCache returns incorrect schema
> -
>
> Key: HIVE-11288
> URL: https://issues.apache.org/jira/browse/HIVE-11288
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Greg Phillips
>Assignee: Greg Phillips
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11288.2.patch, HIVE-11288.3.patch, 
> HIVE-11288.4.patch, HIVE-11288.patch
>
>
> To reproduce this error, take two fields in an avro schema document matching 
> the following:
> "type" :  { "type": "array", "items": [ "null",  { "type": "map", "values": [ 
> "null", "string" ] } ]  }
> "type" : { "type": "map", "values": [ "null" , { "type": "array", "items": [ 
> "null" , "string"] } ] }
> After creating two tables in hive with these schemas, the describe statement 
> on each of them will only return the schema for the first one loaded.  This 
> is due to a hashCode() collision in the InstanceCache.  
> A patch will be included in this ticket shortly which removes the hashCode 
> call from the InstanceCache's internal HashMap, and instead provides the 
> entire schema object.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-4117) Extract schema from avro files when creating external hive table on existing avro file/dir

2015-11-01 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-4117:
-
Labels: Avro AvroSerde patch  (was: patch)

> Extract schema from avro files when creating external hive table on existing 
> avro file/dir
> --
>
> Key: HIVE-4117
> URL: https://issues.apache.org/jira/browse/HIVE-4117
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Shashwat Agarwal
>Priority: Minor
>  Labels: Avro, AvroSerde, patch
> Attachments: avro-read-schema.patch
>
>
> We could extract the schema from the Avro file itself when creating an external 
> table over existing Avro files. 
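For contrast, the status quo this improvement targets: today the schema has to be supplied explicitly, e.g. via avro.schema.url or avro.schema.literal (the path, location, and table name below are placeholders):

{code:title=Current approach, sketched}
CREATE EXTERNAL TABLE events_avro
STORED AS AVRO
LOCATION '/data/events/'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/events.avsc');
{code}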



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)

2015-10-30 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-11055:
--
Component/s: hpl/sql

> HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
> ---
>
> Key: HIVE-11055
> URL: https://issues.apache.org/jira/browse/HIVE-11055
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Dmitry Tolpeko
>Assignee: Dmitry Tolpeko
> Fix For: 2.0.0
>
> Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, 
> HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml
>
>
> There is a PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive 
> (actually for any SQL-on-Hadoop implementation and any JDBC source).
> Alan Gates offered to contribute it to Hive under the HPL/SQL name 
> (org.apache.hive.hplsql package). This JIRA is to create a patch to 
> contribute the PL/HQL code. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11802) Float-point numbers are displayed with different precision in Beeline/JDBC

2015-10-09 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951535#comment-14951535
 ] 

Carl Steinbach commented on HIVE-11802:
---

I think this is a case where correctness is more important than performance. 
Let's aim first for the former, and once it's achieved we can worry about the 
latter.

> Float-point numbers are displayed with different precision in Beeline/JDBC
> --
>
> Key: HIVE-11802
> URL: https://issues.apache.org/jira/browse/HIVE-11802
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 2.0.0
>
> Attachments: HIVE-11802.3.patch
>
>
> When inserting floating-point numbers into a table, the values displayed in 
> Beeline or over JDBC have different precision.
> How to reproduce:
> {noformat}
> 0: jdbc:hive2://localhost:1> create table decimals (f float, af 
> array<float>, d double, ad array<double>) stored as parquet;
> No rows affected (0.294 seconds)
> 0: jdbc:hive2://localhost:1> insert into table decimals select 1.10058, 
> array(cast(1.10058 as float)), 2.0133, array(2.0133) from dummy limit 1;
> ...
> No rows affected (20.089 seconds)
> 0: jdbc:hive2://localhost:1> select f, af, af[0], d, ad[0] from decimals;
> +---------------------+------------+---------------------+---------+---------+
> |          f          |     af     |         _c2         |    d    |   _c4   |
> +---------------------+------------+---------------------+---------+---------+
> | 1.1005799770355225  | [1.10058]  | 1.1005799770355225  | 2.0133  | 2.0133  |
> +---------------------+------------+---------------------+---------+---------+
> {noformat}
> When displaying arrays, the values are displayed correctly, but if I print a 
> specific element, it is then displayed with more decimal positions.
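Until the display path is fixed, a possible workaround (a sketch, not part of the patch) is to cast to an explicit decimal precision in the query itself:

{code:title=Workaround sketch}
SELECT cast(f AS decimal(10,5)) AS f, af, cast(af[0] AS decimal(10,5)) AS af0, d, ad[0]
FROM decimals;
{code}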



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive

2015-09-23 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905124#comment-14905124
 ] 

Carl Steinbach commented on HIVE-11878:
---

Hi [~rdsr], I left some comments on RB related to the testing approach. 
Everything else looks good. Thanks.

> ClassNotFoundException can possibly  occur if multiple jars are registered 
> one at a time in Hive
> 
>
> Key: HIVE-11878
> URL: https://issues.apache.org/jira/browse/HIVE-11878
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>  Labels: URLClassLoader
> Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, 
> HIVE-11878_qtest.patch
>
>
> When we register a jar on the Hive console. Hive creates a fresh URL 
> classloader which includes the path of the current jar to be registered and 
> all the jar paths of the parent classloader. The parent classloader is the 
> current ThreadContextClassLoader. Once the URLClassloader is created Hive 
> sets that as the current ThreadContextClassloader.
> So if we register multiple jars in Hive, there will be multiple 
> URLClassLoaders created, each classloader including the jars from its parent 
> and the one extra jar to be registered. The last URLClassLoader created will 
> end up as the current ThreadContextClassLoader. (See details: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
> Now here's an example in which the above strategy can lead to a CNF exception.
> We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class 
> *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, 
> the URLClassLoader *u1* is created and also set as the 
> ThreadContextClassLoader. We register *j2* next, the new URLClassLoader 
> created will be *u2* with *u1* as parent and *u2* becomes the new 
> ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* 
> whereas *u1* only has paths to *j1* (For details see: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now when we register class *c1* under a temporary function in Hive, we load 
> the class using {code} class.forName("c1", true, 
> Thread.currentThread().getContextClassLoader()) {code} . The 
> currentThreadContext class-loader is *u2*, and it has the path to the class 
> *c1*, but note that Class-loaders work by delegating to parent class-loader 
> first. In this case class *c1* will be found and *defined* by class-loader 
> *u1*.
> Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say 
> initialize) is called in *c1*, which references the class *c2*, *c2* will not 
> be found since the class-loader used to search for *c2* will be *u1* (Since 
> the caller's class-loader is used to load a class)
> I've added a qtest to explain the problem. Please see the attached patch
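The console steps being described, sketched in HiveQL; j1, j2, and c1 come from the description, while the paths, function name, table, and column are invented:

{code:title=Repro steps, sketched}
ADD JAR /tmp/j1.jar;   -- creates URLClassLoader u1 (parent paths + j1)
ADD JAR /tmp/j2.jar;   -- creates URLClassLoader u2 (u1 as parent, plus j2)
CREATE TEMPORARY FUNCTION my_udf AS 'c1';   -- c1 ends up defined by u1, which cannot see j2
SELECT my_udf(col) FROM some_table;         -- c2 is not found when c1 initializes
{code}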



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-2210) ALTER VIEW RENAME

2015-09-18 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2210:
-
Component/s: Views

> ALTER VIEW RENAME
> -
>
> Key: HIVE-2210
> URL: https://issues.apache.org/jira/browse/HIVE-2210
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor, Views
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Charles Chen
> Fix For: 0.8.0
>
> Attachments: HIVE-2210v0.patch, HIVE-2210v1.patch
>
>
> ALTER TABLE RENAME cannot be used on a view; we should support ALTER VIEW 
> RENAME.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11802) Float-point numbers are displayed with different precision in Beeline/JDBC

2015-09-14 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744787#comment-14744787
 ] 

Carl Steinbach commented on HIVE-11802:
---

+1. Can you commit this yourself?

> Float-point numbers are displayed with different precision in Beeline/JDBC
> --
>
> Key: HIVE-11802
> URL: https://issues.apache.org/jira/browse/HIVE-11802
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-11802.3.patch
>
>
> When inserting floating-point numbers into a table, the values displayed in 
> Beeline or over JDBC have different precision.
> How to reproduce:
> {noformat}
> 0: jdbc:hive2://localhost:1> create table decimals (f float, af 
> array<float>, d double, ad array<double>) stored as parquet;
> No rows affected (0.294 seconds)
> 0: jdbc:hive2://localhost:1> insert into table decimals select 1.10058, 
> array(cast(1.10058 as float)), 2.0133, array(2.0133) from dummy limit 1;
> ...
> No rows affected (20.089 seconds)
> 0: jdbc:hive2://localhost:1> select f, af, af[0], d, ad[0] from decimals;
> +---------------------+------------+---------------------+---------+---------+
> |          f          |     af     |         _c2         |    d    |   _c4   |
> +---------------------+------------+---------------------+---------+---------+
> | 1.1005799770355225  | [1.10058]  | 1.1005799770355225  | 2.0133  | 2.0133  |
> +---------------------+------------+---------------------+---------+---------+
> {noformat}
> When displaying arrays, the values are displayed correctly, but if I print a 
> specific element, it is then displayed with more decimal positions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-2655) Ability to define functions in HQL

2015-09-14 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743007#comment-14743007
 ] 

Carl Steinbach commented on HIVE-2655:
--

bq. Marking resolved as this was committed in 
54ec1cb0d0540edf7946738bc113e90adcc09a6d.

Here's the actual commit post migration to Git:

{noformat}
commit 7f0d6e69ec7aae756d1e7cee034df60b744482ce
Author: Edward Capriolo 
Date:   Sat Jun 15 00:59:04 2013 +

Submitted by: Brock Noland Jonathon Chang
Reviewed by: Edward Capriolo
Approved by: Edward Capriolo


git-svn-id: https://svn.apache.org/repos/asf/hive/trunk@1493292 
13f79535-47bb-0310-9956-ffa450edef68

 ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeEvaluator.java | 7 ++
 ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java | 11 +++
 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java | 35 ++
 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java | 32 +
 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java | 31 +
 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g | 1 +
 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g | 19 ++
 ql/src/java/org/apache/hadoop/hive/ql/parse/MacroSemanticAnalyzer.java | 146 +++
 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java | 2 +-
 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java | 5 ++
 ql/src/java/org/apache/hadoop/hive/ql/plan/CreateMacroDesc.java | 72
 ql/src/java/org/apache/hadoop/hive/ql/plan/DropMacroDesc.java | 48 +
 ql/src/java/org/apache/hadoop/hive/ql/plan/FunctionWork.java | 18 +
 ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java | 2 +
 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMacro.java | 171 ++
 ql/src/test/org/apache/hadoop/hive/ql/parse/TestMacroSemanticAnalyzer.java | 133
 ql/src/test/org/apache/hadoop/hive/ql/parse/TestSemanticAnalyzerFactory.java | 47 +
 ql/src/test/org/apache/hadoop/hive/ql/plan/TestCreateMacroDesc.java | 54 +++
 ql/src/test/org/apache/hadoop/hive/ql/plan/TestDropMacroDesc.java | 36 ++
 ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFMacro.java | 96 ++
 ql/src/test/queries/clientnegative/macro_unused_parameter.q | 1 +
 ql/src/test/queries/clientpositive/macro.q | 26 +++
 ql/src/test/queries/negative/macro_reserved_word.q | 1 +
 ql/src/test/results/clientnegative/macro_unused_parameter.q.out | 1 +
 ql/src/test/results/clientpositive/macro.q.out | 472 ++
 ql/src/test/results/compiler/errors/macro_reserved_word.q.out | 1 +
 26 files changed, 1453 insertions(+), 15 deletions(-)
{noformat}


> Ability to define functions in HQL
> --
>
> Key: HIVE-2655
> URL: https://issues.apache.org/jira/browse/HIVE-2655
> Project: Hive
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Jonathan Perlow
>Assignee: Brock Noland
>  Labels: TODOC12
> Fix For: 0.12.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.4.patch, HIVE-2655-10.patch, 
> HIVE-2655-10.patch, HIVE-2655-9.patch
>
>
> Ability to create functions in HQL as a substitute for creating them in Java.
> Jonathan Chang requested I create this issue.
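For readers unfamiliar with the feature this commit added: a macro is defined and used entirely in HQL, with no Java UDF class. A rough sketch over JDBC follows; the connection URL, user, table name and column name are placeholders, and it assumes the Hive JDBC driver is on the classpath:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MacroSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "user", "");
             Statement stmt = conn.createStatement()) {

            // Define the macro once for the session; it is pure HQL.
            stmt.execute(
                "CREATE TEMPORARY MACRO sigmoid (x DOUBLE) 1.0 / (1.0 + EXP(-x))");

            // Use it like any built-in function ('t' and 'score' are placeholders).
            try (ResultSet rs = stmt.executeQuery("SELECT sigmoid(score) FROM t")) {
                while (rs.next()) {
                    System.out.println(rs.getDouble(1));
                }
            }
        }
    }
}
{code}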



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-2655) Ability to define functions in HQL

2015-09-14 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2655:
-
Component/s: Macros

> Ability to define functions in HQL
> --
>
> Key: HIVE-2655
> URL: https://issues.apache.org/jira/browse/HIVE-2655
> Project: Hive
>  Issue Type: New Feature
>  Components: Macros, SQL
>Reporter: Jonathan Perlow
>Assignee: Brock Noland
>  Labels: TODOC12
> Fix For: 0.12.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.4.patch, HIVE-2655-10.patch, 
> HIVE-2655-10.patch, HIVE-2655-9.patch
>
>
> Ability to create functions in HQL as a substitute for creating them in Java.
> Jonathan Chang requested I create this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11802) Float-point numbers are displayed with different precision in Beeline/JDBC

2015-09-13 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742769#comment-14742769
 ] 

Carl Steinbach commented on HIVE-11802:
---

Thanks for adding a test. Here's some feedback:
* TestColumn is missing an ASF header.
* TestColumn doesn't prove that beeline returns correct results. What we need 
is an end-to-end test that validates the output of beeline. There's an existing 
test driver (TestBeeLineDriver) that was included in the original HiveServer2 
patch. The goal was to make it easy to write end-to-end Beeline tests in the 
style of the existing qfile tests. There's also a set of sample data files in 
files/types/primitives that cover all primitive types, and an initialization 
file (data/scripts/q_test_init.sql) that creates a 'primitives' table on top of 
it. I think we'd get more complete and easier to maintain test coverage with 
less code by resurrecting TestBeeLineDriver and writing a new beeline qfile 
test that runs a 'SELECT *' query against the primitives table. I suspect the 
original HS2 patch even had a qfile test for this, but I'm too depressed to 
look. It would be awesome if you want to fix this, but all that really stands 
in the way of a +1 is adding the missing ASF header.



> Float-point numbers are displayed with different precision in Beeline/JDBC
> --
>
> Key: HIVE-11802
> URL: https://issues.apache.org/jira/browse/HIVE-11802
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-11802.2.patch
>
>
> When inserting float-point numbers to a table, the values displayed on 
> beeline or jdbc are with different precision.
> How to reproduce:
> {noformat}
> 0: jdbc:hive2://localhost:1> create table decimals (f float, af 
> array<float>, d double, ad array<double>) stored as parquet;
> No rows affected (0.294 seconds)
> 0: jdbc:hive2://localhost:1> insert into table decimals select 1.10058, 
> array(cast(1.10058 as float)), 2.0133, array(2.0133) from dummy limit 1;
> ...
> No rows affected (20.089 seconds)
> 0: jdbc:hive2://localhost:1> select f, af, af[0], d, ad[0] from decimals;
> +---------------------+------------+---------------------+---------+---------+--+
> |          f          |     af     |         _c2         |    d    |   _c4   |
> +---------------------+------------+---------------------+---------+---------+--+
> | 1.1005799770355225  | [1.10058]  | 1.1005799770355225  | 2.0133  | 2.0133  |
> +---------------------+------------+---------------------+---------+---------+--+
> {noformat}
> When displaying arrays, the values are displayed correctly, but if I print a 
> specific element, it is then displayed with more decimal positions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11802) Float-point numbers are displayed with different precision in Beeline/JDBC

2015-09-11 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741770#comment-14741770
 ] 

Carl Steinbach commented on HIVE-11802:
---

I guess the fact that the patch doesn't include any test updates means that 
there's no test coverage for this right now? If that's the case can you add 
some?

Also, does the HiveCLI behave correctly in this scenario, and is there any test 
coverage provided for this there?

> Float-point numbers are displayed with different precision in Beeline/JDBC
> --
>
> Key: HIVE-11802
> URL: https://issues.apache.org/jira/browse/HIVE-11802
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-11802.1.patch
>
>
> When inserting float-point numbers to a table, the values displayed on 
> beeline or jdbc are with different precision.
> How to reproduce:
> {noformat}
> 0: jdbc:hive2://localhost:1> create table decimals (f float, af 
> array<float>, d double, ad array<double>) stored as parquet;
> No rows affected (0.294 seconds)
> 0: jdbc:hive2://localhost:1> insert into table decimals select 1.10058, 
> array(cast(1.10058 as float)), 2.0133, array(2.0133) from dummy limit 1;
> ...
> No rows affected (20.089 seconds)
> 0: jdbc:hive2://localhost:1> select f, af, af[0], d, ad[0] from decimals;
> +---------------------+------------+---------------------+---------+---------+--+
> |          f          |     af     |         _c2         |    d    |   _c4   |
> +---------------------+------------+---------------------+---------+---------+--+
> | 1.1005799770355225  | [1.10058]  | 1.1005799770355225  | 2.0133  | 2.0133  |
> +---------------------+------------+---------------------+---------+---------+--+
> {noformat}
> When displaying arrays, the values are displayed correctly, but if I print a 
> specific element, it is then displayed with more decimal positions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10459) Add materialized views to Hive

2015-08-30 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-10459:
--
Component/s: Views

 Add materialized views to Hive
 --

 Key: HIVE-10459
 URL: https://issues.apache.org/jira/browse/HIVE-10459
 Project: Hive
  Issue Type: Improvement
  Components: Views
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-10459.2.patch, HIVE-10459.patch


 Materialized views are useful as ways to store either alternate versions of 
 data (e.g. same data, different sort order) or derivatives of data sets (e.g. 
 commonly used aggregates).  It is useful to store these as materialized views 
 rather than as tables because it can give the optimizer the ability to 
 understand how data sets are related.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11622) Creating an Avro table with a complex map-typed column leads to incorrect column type.

2015-08-22 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-11622:
--
Labels: AvroSerde  (was: )

 Creating an Avro table with a complex map-typed column leads to incorrect 
 column type.
 --

 Key: HIVE-11622
 URL: https://issues.apache.org/jira/browse/HIVE-11622
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 1.1.0
Reporter: Alexander Behm
  Labels: AvroSerde

 In the following CREATE TABLE, the map-typed column ends up with the wrong 
 type. I suspect a problem with inferring the Avro schema from the column 
 definitions, but I am not sure.
 Reproduction:
 {code}
 hive> create table t (c map<string,array<int>>) stored as avro;
 OK
 Time taken: 0.101 seconds
 hive> desc t;
 OK
 c                      array<map<string,int>>   from deserializer
 Time taken: 0.135 seconds, Fetched: 1 row(s)
 {code}
 Note how the type shown in DESCRIBE is not the type originally passed in the 
 CREATE TABLE.
 However, *sometimes* the DESCRIBE shows the correct output. You may also try 
 these steps which produce a similar problem to increase the chance of hitting 
 this issue:
 {code}
 hive> create table t (c array<map<string,int>>) stored as avro;
 OK
 Time taken: 0.063 seconds
 hive> desc t;
 OK
 c                      map<string,array<int>>   from deserializer
 Time taken: 0.152 seconds, Fetched: 1 row(s)
 {code}
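For comparison, here is what the two column types look like as plain Avro schemas, built with the Avro Java API directly. This is only illustrative of the expected vs. reported nesting; Hive's AvroSerDe may additionally wrap values in nullable unions:

{code:java}
import org.apache.avro.Schema;

public class MapVsArraySchema {
    public static void main(String[] args) {
        // Hive map<string,array<int>>: Avro map keys are always strings, so
        // only the value schema (an array of int) is specified.
        Schema mapOfArray = Schema.createMap(
                Schema.createArray(Schema.create(Schema.Type.INT)));

        // Hive array<map<string,int>>: the type DESCRIBE reports instead.
        Schema arrayOfMap = Schema.createArray(
                Schema.createMap(Schema.create(Schema.Type.INT)));

        System.out.println(mapOfArray.toString(true));
        System.out.println(arrayOfMap.toString(true));
    }
}
{code}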



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7214) Support predicate pushdown for complex data types in ORCFile

2015-08-10 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-7214:
-
Labels: ORC  (was: )

 Support predicate pushdown for complex data types in ORCFile
 

 Key: HIVE-7214
 URL: https://issues.apache.org/jira/browse/HIVE-7214
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Rohini Palaniswamy
  Labels: ORC

 Currently ORCFile does not support predicate pushdown for complex datatypes 
 like map, array and struct while Parquet does. Came across this during 
 discussion of PIG-3760. Our users have a lot of map and struct (tuple in pig) 
 columns and most of the filter conditions are on them. Would be great to have 
 support added for them in ORC



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7214) Support predicate pushdown for complex data types in ORCFile

2015-08-10 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-7214:
-
Component/s: File Formats

 Support predicate pushdown for complex data types in ORCFile
 

 Key: HIVE-7214
 URL: https://issues.apache.org/jira/browse/HIVE-7214
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Rohini Palaniswamy

 Currently ORCFile does not support predicate pushdown for complex datatypes 
 like map, array and struct while Parquet does. Came across this during 
 discussion of PIG-3760. Our users have a lot of map and struct (tuple in pig) 
 columns and most of the filter conditions are on them. Would be great to have 
 support added for them in ORC



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4239) Remove lock on compilation stage

2015-07-30 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647266#comment-14647266
 ] 

Carl Steinbach commented on HIVE-4239:
--

It should probably go in both the hs2 and compiler sections.



 Remove lock on compilation stage
 

 Key: HIVE-4239
 URL: https://issues.apache.org/jira/browse/HIVE-4239
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Reporter: Carl Steinbach
Assignee: Sergey Shelukhin
  Labels: TODOC2.0
 Fix For: 2.0.0

 Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
 HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
 HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11402) HS2 - disallow parallel query execution within a single Session

2015-07-29 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-11402:
--
Component/s: HiveServer2

 HS2 - disallow parallel query execution within a single Session
 ---

 Key: HIVE-11402
 URL: https://issues.apache.org/jira/browse/HIVE-11402
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair

 HiveServer2 currently allows concurrent queries to be run in a single 
 session. However, every HS2 session has an associated SessionState object, 
 and the use of SessionState in many places assumes that only one thread is 
 using it, i.e. it is not thread safe.
 There are many places where SessionState thread safety needs to be 
 addressed, and until then we should serialize all query execution for a 
 single HS2 session. -This problem can become more visible with HIVE-4239 now 
 allowing parallel query compilation.-
 Note that running queries in parallel for single session is not 
 straightforward with jdbc, you need to spawn another thread as the 
 Statement.execute calls are blocking. I believe ODBC has non blocking query 
 execution API, and Hue is another well known application that shares sessions 
 for all queries that a user runs.
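The client-side pattern referred to above (and which this issue would effectively serialize or disallow) looks roughly like the following sketch; the connection URL, user, and query are placeholders, and both statements share the single HS2 session behind the one Connection:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelQueriesOneSession {
    public static void main(String[] args) throws Exception {
        // One Connection corresponds to one HS2 session.
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "user", "");

        // Statement.execute blocks, so the only way to run two queries
        // concurrently in the same session is to use separate threads.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Runnable query = () -> {
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("SELECT COUNT(*) FROM t");   // 't' is a placeholder table
            } catch (Exception e) {
                e.printStackTrace();
            }
        };
        pool.submit(query);
        pool.submit(query);
        pool.shutdown();
    }
}
{code}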



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11402) HS2 - disallow parallel query execution within a single Session

2015-07-29 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646917#comment-14646917
 ] 

Carl Steinbach commented on HIVE-11402:
---

bq. Note that running queries in parallel for single session is not 
straightforward with jdbc, you need to spawn another thread as the 
Statement.execute calls are blocking. I believe ODBC has non blocking query 
execution API ...

I made a couple of mistakes when I designed the HS2 API (the use of Thrift 
Enums and Unions comes to mind), but by far the biggest mistake was allowing a 
1:many relationship between Sessions and Operations. At the time I thought 
there was a chance that the ODBC spec required this, but I now think this is 
something best handled on the client side. Providing support for the 1:many 
Session:Operation mapping results in a lot of additional complexity on the 
server side, only to yield a feature with a very high potential for misuse.

Rather than temporarily serializing operations against a given session, I 
propose instead that we enforce a 1:1 mapping between HS2 sessions and active 
operations. This is a backward incompatible change, but one which I think will 
yield far better results in the long term.

 HS2 - disallow parallel query execution within a single Session
 ---

 Key: HIVE-11402
 URL: https://issues.apache.org/jira/browse/HIVE-11402
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair

 HiveServer2 currently allows concurrent queries to be run in a single 
 session. However, every HS2 session has an associated SessionState object, 
 and the use of SessionState in many places assumes that only one thread is 
 using it, i.e. it is not thread safe.
 There are many places where SessionState thread safety needs to be 
 addressed, and until then we should serialize all query execution for a 
 single HS2 session. -This problem can become more visible with HIVE-4239 now 
 allowing parallel query compilation.-
 Note that running queries in parallel for single session is not 
 straightforward with jdbc, you need to spawn another thread as the 
 Statement.execute calls are blocking. I believe ODBC has non blocking query 
 execution API, and Hue is another well known application that shares sessions 
 for all queries that a user runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-4239) Remove lock on compilation stage

2015-07-29 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646892#comment-14646892
 ] 

Carl Steinbach edited comment on HIVE-4239 at 7/29/15 11:08 PM:


bq. shouldn't it be named hive.server2.driver.parallel.compilation to match the 
other HS2 parameters?

[~leftylev], please see my comment [up 
above|https://issues.apache.org/jira/browse/HIVE-4239?focusedCommentId=14564517page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14564517].



was (Author: cwsteinbach):
bq. shouldn't it be named hive.server2.driver.parallel.compilation to match the 
other HS2 parameters?

[~leftylev], please see my comment here: 
https://issues.apache.org/jira/browse/HIVE-4239?focusedCommentId=14564517page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14564517


 Remove lock on compilation stage
 

 Key: HIVE-4239
 URL: https://issues.apache.org/jira/browse/HIVE-4239
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Reporter: Carl Steinbach
Assignee: Sergey Shelukhin
  Labels: TODOC2.0
 Fix For: 2.0.0

 Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
 HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
 HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

