[jira] [Resolved] (HIVE-9605) Remove parquet nested objects from wrapper writable objects

2015-05-26 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu resolved HIVE-9605.

   Resolution: Fixed
Fix Version/s: (was: parquet-branch)
   1.2.1

Committed to master. Thanks [~spena]

> Remove parquet nested objects from wrapper writable objects
> ---
>
> Key: HIVE-9605
> URL: https://issues.apache.org/jira/browse/HIVE-9605
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 0.14.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 1.2.1
>
> Attachments: HIVE-9605.3.patch, HIVE-9605.4.patch, HIVE-9605.5.patch, 
> HIVE-9605.6.patch
>
>
> Parquet nested types are using an extra wrapper object (ArrayWritable) around 
> map and list elements. This extra object is not needed and causes 
> unnecessary memory allocations.
> An example is in HiveCollectionConverter.java:
> {noformat}
> public void end() {
>   parent.set(index, wrapList(new ArrayWritable(
>       Writable.class, list.toArray(new Writable[list.size()]))));
> }
> {noformat}
> This object is later unwrapped in AbstractParquetMapInspector, i.e.:
> {noformat}
> final Writable[] mapContainer = ((ArrayWritable) data).get();
> final Writable[] mapArray = ((ArrayWritable) mapContainer[0]).get();
> for (final Writable obj : mapArray) {
>   ...
> }
> {noformat}
> We should get rid of this wrapper object to save time and memory.
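
A minimal, hedged sketch of the single-wrapper layout this asks for (toy 
standalone code, not the actual patch; only the Hadoop Writable types are real):
{code}
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

public class SingleWrapperSketch {
  public static void main(String[] args) {
    List<Writable> list = Arrays.asList(new IntWritable(1), new IntWritable(2));

    // Converter side: one ArrayWritable holds the elements directly,
    // instead of wrapping that array inside a second ArrayWritable.
    ArrayWritable data = new ArrayWritable(
        Writable.class, list.toArray(new Writable[list.size()]));

    // Inspector side: a single unwrap replaces the two-step
    // ((ArrayWritable) mapContainer[0]).get() access shown above.
    for (Writable obj : data.get()) {
      System.out.println(obj);
    }
  }
}
{code}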



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10684) Fix the unit test failures for HIVE-7553 after HIVE-10674 removed the binary jar files

2015-05-26 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560524#comment-14560524
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-10684:
--

[~Ferd]  Sorry for the delay, I have a few minor comments here.
1. For public int executeCmd(), can you make this a private function? Also, the 
return value from this function is not used; I think it should either be 
removed or logged for debugging purposes (see the sketch below).
2. 
{code}
+//  Files.copy(new File("/tmp/" + clazzV2FileName.toString()), dist);
{code}

The above line can be removed.
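
For point 1, a hedged sketch of what is being asked (assumed context, not the 
actual test code; the logger setup is illustrative):
{code}
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CmdSketch {
  private static final Logger LOG = LoggerFactory.getLogger(CmdSketch.class);

  // Private instead of public, and the exit code is logged rather than
  // silently discarded by callers.
  private static int executeCmd(String... cmd) throws IOException, InterruptedException {
    int exitCode = new ProcessBuilder(cmd).inheritIO().start().waitFor();
    LOG.debug("Command {} exited with code {}", String.join(" ", cmd), exitCode);
    return exitCode;
  }

  public static void main(String[] args) throws Exception {
    executeCmd("echo", "hello");
  }
}
{code}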

Thanks
Hari

> Fix the unit test failures for HIVE-7553 after HIVE-10674 removed the binary 
> jar files
> --
>
> Key: HIVE-10684
> URL: https://issues.apache.org/jira/browse/HIVE-10684
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10684.1.patch, HIVE-10684.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10777) LLAP: add pre-fragment and per-table cache details

2015-05-26 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560494#comment-14560494
 ] 

Lefty Leverenz commented on HIVE-10777:
---

Doc note:  This adds *hive.llap.io.orc.time.counters* to HiveConf.java so I'm 
linking to HIVE-9850 for documentation.

> LLAP: add pre-fragment and per-table cache details
> --
>
> Key: HIVE-10777
> URL: https://issues.apache.org/jira/browse/HIVE-10777
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
> Attachments: HIVE-10777.01.patch, HIVE-10777.02.patch, 
> HIVE-10777.WIP.patch, HIVE-10777.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0

2015-05-26 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560489#comment-14560489
 ] 

Mostafa Mokhtar commented on HIVE-10704:


Done, the command-line RB tool was giving me some headaches. 

> Errors in Tez HashTableLoader when estimated table size is 0
> 
>
> Key: HIVE-10704
> URL: https://issues.apache.org/jira/browse/HIVE-10704
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, 
> HIVE-10704.3.patch
>
>
> A couple of issues:
> - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all 
> tables, the selection of the largest small table is wrong and could select 
> the large table (which results in an NPE).
> - The memory estimates can either divide by zero, or allocate 0 memory, if the 
> table size is 0. We should come up with a sensible default for this.
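
A hedged sketch of the kind of guard the second bullet suggests (the constant 
name and the 1 MB floor are assumptions, not taken from the patch):
{code}
public final class TableSizeGuard {
  // Assumed default; the real fix may pick a different floor.
  private static final long DEFAULT_TABLE_SIZE = 1024L * 1024L; // 1 MB

  // Clamp a zero/unknown estimate before it feeds the memory math, so the
  // divide and the allocation never see 0.
  static long sanitize(long estimatedTableSize) {
    return estimatedTableSize <= 0 ? DEFAULT_TABLE_SIZE : estimatedTableSize;
  }

  public static void main(String[] args) {
    long size = sanitize(0);          // would otherwise lead to divide-by-zero
    long perPartition = size / 16;    // e.g. 16 hash partitions
    System.out.println(perPartition); // 65536
  }
}
{code}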



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-26 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-10793:
--
Labels: TODOC1.3  (was: )

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> 
>
> Key: HIVE-10793
> URL: https://issues.apache.org/jira/browse/HIVE-10793
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
>  Labels: TODOC1.3
> Fix For: 1.3.0
>
> Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch
>
>
> HybridHashTableContainer will allocate memory based on the estimate, which 
> means that if the actual data size is less than the estimate, the allocated 
> memory won't be used.
> The number of partitions is calculated based on the estimated data size:
> {code}
> numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize, 
> minNumParts, minWbSize,
>   nwayConf);
> {code}
> Then writeBufferSize is set based on the number of partitions:
> {code}
> writeBufferSize = (int)(estimatedTableSize / numPartitions);
> {code}
> Each hash partition will allocate 1 WriteBuffer, with no further allocation 
> if the estimated data size is correct.
> The suggested solution is to reduce writeBufferSize by a factor such that only 
> X% of the memory is preallocated.
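
A hedged sketch of the suggested scaling (the fraction name and the 50% value 
are illustrative, not from the patch):
{code}
public final class WriteBufferSizing {
  public static void main(String[] args) {
    long estimatedTableSize = 512L * 1024 * 1024; // 512 MB estimate
    int numPartitions = 16;

    // Preallocate only a fraction of the per-partition estimate and let the
    // write buffers grow on demand, instead of allocating it all upfront.
    double preallocFraction = 0.5; // the "X%" from the description

    int writeBufferSize =
        (int) (estimatedTableSize / numPartitions * preallocFraction);
    System.out.println(writeBufferSize); // 16777216, i.e. 16 MB instead of 32 MB
  }
}
{code}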



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-26 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560459#comment-14560459
 ] 

Lefty Leverenz commented on HIVE-10793:
---

Doc note:  This changes the default value of 
*hive.mapjoin.optimized.hashtable.wbsize* so the wiki needs to be updated (with 
version information).

* [Configuration Properties -- hive.mapjoin.optimized.hashtable.wbsize | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapjoin.optimized.hashtable.wbsize]

The patch also makes minor changes to the definitions of 
*hive.mapjoin.hybridgrace.minwbsize* and 
*hive.mapjoin.hybridgrace.minnumpartitions* which do not need any doc changes.

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> 
>
> Key: HIVE-10793
> URL: https://issues.apache.org/jira/browse/HIVE-10793
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
> Fix For: 1.3.0
>
> Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch
>
>
> HybridHashTableContainer will allocate memory based on the estimate, which 
> means that if the actual data size is less than the estimate, the allocated 
> memory won't be used.
> The number of partitions is calculated based on the estimated data size:
> {code}
> numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize, 
> minNumParts, minWbSize,
>   nwayConf);
> {code}
> Then writeBufferSize is set based on the number of partitions:
> {code}
> writeBufferSize = (int)(estimatedTableSize / numPartitions);
> {code}
> Each hash partition will allocate 1 WriteBuffer, with no further allocation 
> if the estimated data size is correct.
> The suggested solution is to reduce writeBufferSize by a factor such that only 
> X% of the memory is preallocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10716) Fold case/when udf for expression involving nulls in filter operator.

2015-05-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560457#comment-14560457
 ] 

Ashutosh Chauhan commented on HIVE-10716:
-

[~gopalv] I need to verify, but my guess is 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java#L80
 is coming into play here.

> Fold case/when udf for expression involving nulls in filter operator.
> -
>
> Key: HIVE-10716
> URL: https://issues.apache.org/jira/browse/HIVE-10716
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Affects Versions: 1.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 1.2.1
>
> Attachments: HIVE-10716.patch
>
>
> From HIVE-10636 comments, more folding is possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10716) Fold case/when udf for expression involving nulls in filter operator.

2015-05-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10716:

Affects Version/s: (was: 1.3.0)
   1.2.0

> Fold case/when udf for expression involving nulls in filter operator.
> -
>
> Key: HIVE-10716
> URL: https://issues.apache.org/jira/browse/HIVE-10716
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Affects Versions: 1.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 1.2.1
>
> Attachments: HIVE-10716.patch
>
>
> From HIVE-10636 comments, more folding is possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10819) SearchArgumentImpl for Timestamp is broken by HIVE-10286

2015-05-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560438#comment-14560438
 ] 

Hive QA commented on HIVE-10819:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735439/HIVE-10819.3.patch

{color:red}ERROR:{color} -1 due to 59 failed/errored test(s), 8974 tests 
executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_null_element
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_multi_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_optional_elements
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_required_elements
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_structs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_groups
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_create
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_arrays_of_ints
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_maps
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_nested_complex
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_read_backward_compatible_files
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_schema_evolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_crc32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sha1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join30
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAmbiguousSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroPrimitiveInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testHiveRequiredGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testMultiFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewOptionalGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewRequiredGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftPrimitiveInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testUnannotatedListOfGroups
org.apache.hadoop.hive.ql.io.parquet.TestDataWritableWriter.testSimpleType
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testDoubleMapWithStructValue
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testMapWithComplexKey
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testNestedMap
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalArray
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalIntArray
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOptionalPrimitive
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapRequiredPrimitive
org.apache.hadoop.hive.ql.io.parquet.TestParquetSerDe.testParquetHiveSerDe
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testRegularMap
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testRegularMap
org.apache.hadoop.hive.ql.i

[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0

2015-05-26 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560431#comment-14560431
 ] 

Alexander Pivovarov commented on HIVE-10704:


Mostafa, can you check RB link? I'm not sure it shows HIVE-10704.3.patch

> Errors in Tez HashTableLoader when estimated table size is 0
> 
>
> Key: HIVE-10704
> URL: https://issues.apache.org/jira/browse/HIVE-10704
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, 
> HIVE-10704.3.patch
>
>
> A couple of issues:
> - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all 
> tables, the selection of the largest small table is wrong and could select 
> the large table (which results in an NPE).
> - The memory estimates can either divide by zero, or allocate 0 memory, if the 
> table size is 0. We should come up with a sensible default for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10830) First column of a Hive table created with LazyBinaryColumnarSerDe is not read properly

2015-05-26 Thread lovekesh bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lovekesh bansal updated HIVE-10830:
---
Description: 
1. create external table platdev.table_target ( id INT, message String, state 
string, date string ) partitioned by (country string) row format delimited 
fields terminated by ',' stored as RCFILE location 
'/user/nikgupta/table_target' ;

2. insert overwrite table platdev.table_target partition(country) select case 
when id=13 then 15 else id end,message,state,date,country from 
platdev.table_base2 where id between 13 and 16;

Say the table is now written using the default LazyBinaryColumnarSerDe and has 
the following data:
15  thirteen  delhi    2-12-2014   india
14  fourteen  delhi    1-1-2014    india
15  fifteen   florida  1-1-2014    us
16  sixteen   florida  2-12-2014   us

Now, if I try to read the data with a MapReduce program whose map function is 
given below:

public void map(LongWritable key, BytesRefArrayWritable val, Context context)
    throws IOException, InterruptedException {
  for (int i = 0; i < val.size(); i++) {
    BytesRefWritable bytesRefread = val.get(i);
    byte[] currentCell = Arrays.copyOfRange(bytesRefread.getData(),
        bytesRefread.getStart(), bytesRefread.getStart() + bytesRefread.getLength());
    Text currentCellStr = new Text(currentCell);
    System.out.println("rowText=" + currentCellStr);
  }
  context.write(NullWritable.get(), bytes);
}


and set the following job configuration parameters:

job.setInputFormatClass(RCFileMapReduceInputFormat.class);
job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);
jobConf.setInt(RCFile.COLUMN_NUMBER_CONF_STR, 5);

The output is as follows (LazyBinaryColumnarSerDe):
rowText=
rowText=fifteen
rowText=goa
rowText=2-2-
rowText=us

But exactly the same case, using ColumnarSerDe explicitly in the table 
definition, gives the following output:
rowText=1
rowText=fifteen
rowText=goa
rowText=2-2-
rowText=us

The point is that the first column value is missing in the case of 
LazyBinaryColumnarSerDe.

  was:
1. create external table platdev.table_target ( id INT, message String, state 
string, date string ) partitioned by (country string) row format delimited 
fields terminated by ',' stored as RCFILE location 
'/user/nikgupta/table_target' ;

2. insert overwrite table platdev.table_target partition(country) select case 
when id=13 then 15 else id end,message,state,date,country from 
platdev.table_base2 where id between 13 and 16;

Say the table now has the following data:
15  thirteen  delhi    2-12-2014   india
14  fourteen  delhi    1-1-2014    india
15  fifteen   florida  1-1-2014    us
16  sixteen   florida  2-12-2014   us

Now, if I try to read the data with a MapReduce program whose map function is 
given below:

public void map(LongWritable key, BytesRefArrayWritable val, Context context)
    throws IOException, InterruptedException {
  for (int i = 0; i < val.size(); i++) {
    BytesRefWritable bytesRefread = val.get(i);
    byte[] currentCell = Arrays.copyOfRange(bytesRefread.getData(),
        bytesRefread.getStart(), bytesRefread.getStart() + bytesRefread.getLength());
    Text currentCellStr = new Text(currentCell);
    System.out.println("rowText=" + currentCellStr);
  }
  context.write(NullWritable.get(), bytes);
}


and set the following job configuration parameters:

job.setInputFormatClass(RCFileMapReduceInputFormat.class);
job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);
jobConf.setInt(RCFile.COLUMN_NUMBER_CONF_STR, 5);

The output is as follows:
rowText=
rowText=fifteen
rowText=goa
rowText=2-2-
rowText=us

But exactly the same case, using ColumnarSerDe explicitly in the table 
definition, gives the following output:
rowText=1
rowText=fifteen
rowText=goa
rowText=2-2-
rowText=us

The point is that the first column value is missing.


> First column of a Hive table created with LazyBinaryColumnarSerDe is not read 
> properly
> --
>
> Key: HIVE-10830
> URL: https://issues.apache.org/jira/browse/HIVE-10830
> Project: Hive
>  Issue Type: Bug
>Reporter: lovekesh bansal
>
> 1. create external table platdev.table_target ( id INT, message String, state 
> string, date string ) partitioned by (country string) row format delimited 
> fields terminated by ',' stored as RCFILE location 
> '/user/nikgupta/table_target' ;
> 2. insert overwrite table platdev.table_target partition(country) select case 
> when id=13 then 15 else id end,message,state,date,country from 
> platdev.table_base2 where id between 13 and 16;
> say 

[jira] [Commented] (HIVE-10716) Fold case/when udf for expression involving nulls in filter operator.

2015-05-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560403#comment-14560403
 ] 

Gopal V commented on HIVE-10716:


The easiest fix to the problem seems to be an additional filter expr to produce 
an AND():
{code}
hive> explain select avg(ss_sold_date_sk) from store_sales where (case 
ss_sold_date when '1998-01-02' then 1 else null end)=1;

 Map Operator Tree:
TableScan
  alias: store_sales
  filterExpr: CASE (ss_sold_date) WHEN ('1998-01-02') THEN 
(true) ELSE (null) END (type: int)
  Statistics: Num rows: 2474913 Data size: 9899654 Basic stats: 
COMPLETE Column stats: COMPLETE
{code}

vs

{code}
hive> explain select avg(ss_sold_date_sk) from store_sales where (case 
ss_sold_date when '1998-01-02' then 1 else null end)=1 and ss_sold_time_Sk > 0;
Map Operator Tree:
TableScan
  alias: store_sales
  filterExpr: ((ss_sold_date = '1998-01-02') and 
(ss_sold_time_sk > 0)) (type: boolean)
  Statistics: Num rows: 1237456 Data size: 9899654 Basic stats: 
COMPLETE Column stats: COMPLETE
  Filter Operator
predicate: (ss_sold_time_sk > 0) (type: boolean)
{code}

[~ashutoshc]: any idea why the extra filter helps in fixing the PPD case?

> Fold case/when udf for expression involving nulls in filter operator.
> -
>
> Key: HIVE-10716
> URL: https://issues.apache.org/jira/browse/HIVE-10716
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Affects Versions: 1.3.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-10716.patch
>
>
> From HIVE-10636 comments, more folding is possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10716) Fold case/when udf for expression involving nulls in filter operator.

2015-05-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560400#comment-14560400
 ] 

Gopal V commented on HIVE-10716:


[~ashutoshc]: LGTM - +1 for the count(1) case, but it looks really odd that the 
{{TableScan::filterExpr}} is not getting folded for this.

TableScan FilterExpr is populated before this folding happens, so it might just 
be an optimization ordering issue?

{code}
hive> explain select count(1) from store_sales where (case ss_sold_date when 
'x' then 1 else null end)=1;

STAGE PLANS:
  Stage: Stage-1
Tez
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
  DagName: gopal_20150526214205_80c41d84-1694-47e9-ab24-144f8007b187:13
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: store_sales
  filterExpr: CASE (ss_sold_date) WHEN ('x') THEN (true) ELSE 
(null) END (type: int)
  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
Column stats: COMPLETE
  Filter Operator
predicate: (ss_sold_date = 'x') (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
Column stats: COMPLETE
Select Operator
  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
Column stats: COMPLETE
  Group By Operator
aggregations: count(1)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 93 Basic stats: 
COMPLETE Column stats: COMPLETE
Reduce Output Operator
  sort order: 
  Statistics: Num rows: 1 Data size: 93 Basic stats: 
COMPLETE Column stats: COMPLETE
  value expressions: _col0 (type: bigint)
Execution mode: vectorized
Reducer 2 
Reduce Operator Tree:
  Group By Operator
aggregations: count(VALUE._col0)
{code}

> Fold case/when udf for expression involving nulls in filter operator.
> -
>
> Key: HIVE-10716
> URL: https://issues.apache.org/jira/browse/HIVE-10716
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Affects Versions: 1.3.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-10716.patch
>
>
> From HIVE-10636 comments, more folding is possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10807) Invalidate basic stats for insert queries if autogather=false

2015-05-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10807:

Attachment: HIVE-10807.3.patch

> Invalidate basic stats for insert queries if autogather=false
> -
>
> Key: HIVE-10807
> URL: https://issues.apache.org/jira/browse/HIVE-10807
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.0
>Reporter: Gopal V
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-10807.2.patch, HIVE-10807.3.patch, HIVE-10807.patch
>
>
> stats.autogather=false leads to incorrect basic stats in the case of insert 
> statements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10813) Fix current test failures after HIVE-8769

2015-05-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-10813.
-
   Resolution: Fixed
Fix Version/s: 1.3.0

Fixed by HIVE-10812

> Fix current test failures after HIVE-8769
> -
>
> Key: HIVE-10813
> URL: https://issues.apache.org/jira/browse/HIVE-10813
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 1.3.0
>
>
> We fixed the stats annotation in HIVE-8769. However, there are some newly 
> committed test cases (e.g., udf_sha1.q) that are not covered by the patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation

2015-05-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10812:

Component/s: Statistics
 Physical Optimizer

> Scaling PK/FK's selectivity for stats annotation
> 
>
> Key: HIVE-10812
> URL: https://issues.apache.org/jira/browse/HIVE-10812
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer, Statistics
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 1.2.1
>
> Attachments: HIVE-10812.01.patch, HIVE-10812.02.patch, 
> HIVE-10812.03.patch
>
>
> Right now, the computation of the selectivity of the FK side based on the PK 
> side does not take into consideration the range of the FK and the range of 
> the PK.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10807) Invalidate basic stats for insert queries if autogather=false

2015-05-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560370#comment-14560370
 ] 

Hive QA commented on HIVE-10807:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735432/HIVE-10807.2.patch

{color:red}ERROR:{color} -1 due to 59 failed/errored test(s), 8974 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_null_element
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_multi_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_optional_elements
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_required_elements
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_structs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_groups
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_create
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_arrays_of_ints
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_maps
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_nested_complex
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_read_backward_compatible_files
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_schema_evolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_crc32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sha1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join30
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAmbiguousSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroPrimitiveInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testHiveRequiredGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testMultiFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewOptionalGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewRequiredGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftPrimitiveInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testUnannotatedListOfGroups
org.apache.hadoop.hive.ql.io.parquet.TestDataWritableWriter.testSimpleType
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testDoubleMapWithStructValue
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testMapWithComplexKey
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testNestedMap
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalArray
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalIntArray
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOptionalPrimitive
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapRequiredPrimitive
org.apache.hadoop.hive.ql.io.parquet.TestParquetSerDe.testParquetHiveSerDe
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testRegularMap
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testRegularMap
org.apache.hadoop.hi

[jira] [Updated] (HIVE-686) add UDF substring_index

2015-05-26 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-686:
-
Attachment: HIVE-686.1.patch

patch #1
- derive substring_index from GenericUDF
- add JUnit and qtest tests

> add UDF substring_index
> ---
>
> Key: HIVE-686
> URL: https://issues.apache.org/jira/browse/HIVE-686
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Reporter: Namit Jain
>Assignee: Alexander Pivovarov
> Attachments: HIVE-686.1.patch, HIVE-686.patch, HIVE-686.patch
>
>
> SUBSTRING_INDEX(str,delim,count)
> Returns the substring from string str before count occurrences of the 
> delimiter delim. If count is positive, everything to the left of the final 
> delimiter (counting from the left) is returned. If count is negative, 
> everything to the right of the final delimiter (counting from the right) is 
> returned. SUBSTRING_INDEX() performs a case-sensitive match when searching 
> for delim.
> Examples:
> {code}
> SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3);
> --www.mysql.com
> SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
> --www.mysql
> SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1);
> --www
> SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0);
> --''
> SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1);
> --com
> SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
> --mysql.com
> SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3);
> --www.mysql.com
> {code}
> {code}
> --#delim does not exist in str
> SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1);
> --www.mysql.com
> --#delim is 2 chars
> SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2);
> --www||mysql
> --#delim is empty string
> SELECT SUBSTRING_INDEX('www.mysql.com', '', 2);
> --''
> --#str is empty string
> SELECT SUBSTRING_INDEX('', '.', 2);
> --''
> {code}
> {code}
> --#null params
> SELECT SUBSTRING_INDEX(null, '.', 1);
> --null
> SELECT SUBSTRING_INDEX('www.mysql.com', null, 1);
> --null
> SELECT SUBSTRING_INDEX('www.mysql.com', '.', null);
> --null
> {code}
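
A standalone, hedged sketch of the semantics described above (toy code for 
illustration; the actual patch implements this as a GenericUDF with 
ObjectInspector plumbing):
{code}
public final class SubstringIndexSketch {
  static String substringIndex(String str, String delim, int count) {
    if (count == 0 || delim.isEmpty() || str.isEmpty()) {
      return "";
    }
    int idx = -1;
    if (count > 0) {
      // Everything to the left of the count-th delimiter, counting from the left.
      for (int i = 0; i < count; i++) {
        idx = str.indexOf(delim, idx + 1);
        if (idx < 0) {
          return str; // fewer than count occurrences: return the whole string
        }
      }
      return str.substring(0, idx);
    }
    // count < 0: everything to the right of the |count|-th delimiter from the right.
    idx = str.length();
    for (int i = 0; i < -count; i++) {
      idx = str.lastIndexOf(delim, idx - 1);
      if (idx < 0) {
        return str;
      }
    }
    return str.substring(idx + delim.length());
  }

  public static void main(String[] args) {
    System.out.println(substringIndex("www.mysql.com", ".", 2));  // www.mysql
    System.out.println(substringIndex("www.mysql.com", ".", -2)); // mysql.com
  }
}
{code}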



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10550) Dynamic RDD caching optimization for HoS.[Spark Branch]

2015-05-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560342#comment-14560342
 ] 

Hive QA commented on HIVE-10550:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735497/HIVE-10550.5-spark.patch

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8721 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did 
not produce a TEST-*.xml file
TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not 
produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more
 - did not produce a TEST-*.xml file
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/866/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/866/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-866/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735497 - PreCommit-HIVE-SPARK-Build

> Dynamic RDD caching optimization for HoS.[Spark Branch]
> ---
>
> Key: HIVE-10550
> URL: https://issues.apache.org/jira/browse/HIVE-10550
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Attachments: HIVE-10550.1-spark.patch, HIVE-10550.1.patch, 
> HIVE-10550.2-spark.patch, HIVE-10550.3-spark.patch, HIVE-10550.4-spark.patch, 
> HIVE-10550.5-spark.patch
>
>
> A Hive query may try to scan the same table multiple times, e.g. a self-join, 
> a self-union, or operators sharing the same subquery; [TPC-DS 
> Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql]
>  is an example. As you may know, Spark supports caching RDD data, which 
> means Spark puts the computed RDD data in memory and reads it from memory 
> directly the next time; this avoids the computation cost of the RDD (and 
> all the cost of its dependencies) at the price of more memory usage. 
> By analyzing the query context, we should be able to understand which parts 
> of the query can be shared, so that we can reuse the cached RDD in the 
> generated Spark job.
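
A toy, hedged sketch of the caching idea using Spark's Java API (illustrative 
only; the actual patch works at the Hive-on-Spark plan level):
{code}
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddCacheSketch {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("cache-sketch").setMaster("local[2]"));

    // Stands in for the shared part of the query; cache() means it is
    // computed once and served from memory for every later use.
    JavaRDD<Integer> shared = sc.parallelize(Arrays.asList(1, 2, 3, 4))
        .map(x -> x * x)
        .cache();

    long evens = shared.filter(x -> x % 2 == 0).count(); // first use: computes
    long total = shared.count();                         // second use: reads cache
    System.out.println(evens + " / " + total);

    sc.stop();
  }
}
{code}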



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases

2015-05-26 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560335#comment-14560335
 ] 

Laljo John Pullokkaran commented on HIVE-10811:
---

Why do we need to keep the fields from the input that are part of the collation 
but are not used by the parent? If no operators from the parent refer to that 
column, then I don't see how preserving the sort order is helpful.

> RelFieldTrimmer throws NoSuchElementException in some cases
> ---
>
> Key: HIVE-10811
> URL: https://issues.apache.org/jira/browse/HIVE-10811
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, 
> HIVE-10811.patch
>
>
> RelFieldTrimmer runs into NoSuchElementException in some cases.
> Stack trace:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Internal error: While 
> invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
>   at org.apache.calcite.util.Util.newInternal(Util.java:743)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536)
>   ... 32 more
> Caused by: java.lang.AssertionError: Internal error: While invoking method 
> 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.ut

[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO

2015-05-26 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560332#comment-14560332
 ] 

Laljo John Pullokkaran commented on HIVE-9069:
--

[~jcamachorodriguez] In extractCommonOperands, for a disjunction, if any operand 
doesn't contain any of the reduction conditions, then we can short-circuit and 
bail out (see the sketch below).
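
A generic, hedged sketch of that bail-out (names and types are illustrative; 
the real method works over Calcite expression nodes):
{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CommonOperandsSketch {
  // Intersect candidate conjuncts across the operands of a disjunction; one
  // operand with no candidates empties the intersection, so stop right there.
  static Set<String> extractCommon(List<Set<String>> disjuncts) {
    Set<String> common = new HashSet<>(disjuncts.get(0));
    for (Set<String> operand : disjuncts.subList(1, disjuncts.size())) {
      common.retainAll(operand);
      if (common.isEmpty()) {
        return common; // short circuit: nothing can be factored out
      }
    }
    return common;
  }

  public static void main(String[] args) {
    System.out.println(extractCommon(Arrays.asList(
        new HashSet<>(Arrays.asList("a", "b")),
        new HashSet<>(Arrays.asList("b", "c")),
        new HashSet<>(Arrays.asList("x"))))); // [] via the short circuit
  }
}
{code}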

> Simplify filter predicates for CBO
> --
>
> Key: HIVE-9069
> URL: https://issues.apache.org/jira/browse/HIVE-9069
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Jesus Camacho Rodriguez
> Fix For: 0.14.1
>
> Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, 
> HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, 
> HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, 
> HIVE-9069.08.patch, HIVE-9069.09.patch, HIVE-9069.10.patch, 
> HIVE-9069.11.patch, HIVE-9069.12.patch, HIVE-9069.13.patch, 
> HIVE-9069.14.patch, HIVE-9069.14.patch, HIVE-9069.patch
>
>
> Simplify predicates for disjunctive predicates so that can get pushed down to 
> the scan.
> Looks like this is still an issue, some of the filters can be pushed down to 
> the scan.
> {code}
> set hive.cbo.enable=true
> set hive.stats.fetch.column.stats=true
> set hive.exec.dynamic.partition.mode=nonstrict
> set hive.tez.auto.reducer.parallelism=true
> set hive.auto.convert.join.noconditionaltask.size=32000
> set hive.exec.reducers.bytes.per.reducer=1
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
> set hive.support.concurrency=false
> set hive.tez.exec.print.summary=true
> explain  
> select  substr(r_reason_desc,1,20) as r
>,avg(ws_quantity) wq
>,avg(wr_refunded_cash) ref
>,avg(wr_fee) fee
>  from web_sales, web_returns, web_page, customer_demographics cd1,
>   customer_demographics cd2, customer_address, date_dim, reason 
>  where web_sales.ws_web_page_sk = web_page.wp_web_page_sk
>and web_sales.ws_item_sk = web_returns.wr_item_sk
>and web_sales.ws_order_number = web_returns.wr_order_number
>and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998
>and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk 
>and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
>and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
>and reason.r_reason_sk = web_returns.wr_reason_sk
>and
>(
> (
>  cd1.cd_marital_status = 'M'
>  and
>  cd1.cd_marital_status = cd2.cd_marital_status
>  and
>  cd1.cd_education_status = '4 yr Degree'
>  and 
>  cd1.cd_education_status = cd2.cd_education_status
>  and
>  ws_sales_price between 100.00 and 150.00
> )
>or
> (
>  cd1.cd_marital_status = 'D'
>  and
>  cd1.cd_marital_status = cd2.cd_marital_status
>  and
>  cd1.cd_education_status = 'Primary' 
>  and
>  cd1.cd_education_status = cd2.cd_education_status
>  and
>  ws_sales_price between 50.00 and 100.00
> )
>or
> (
>  cd1.cd_marital_status = 'U'
>  and
>  cd1.cd_marital_status = cd2.cd_marital_status
>  and
>  cd1.cd_education_status = 'Advanced Degree'
>  and
>  cd1.cd_education_status = cd2.cd_education_status
>  and
>  ws_sales_price between 150.00 and 200.00
> )
>)
>and
>(
> (
>  ca_country = 'United States'
>  and
>  ca_state in ('KY', 'GA', 'NM')
>  and ws_net_profit between 100 and 200  
> )
> or
> (
>  ca_country = 'United States'
>  and
>  ca_state in ('MT', 'OR', 'IN')
>  and ws_net_profit between 150 and 300  
> )
> or
> (
>  ca_country = 'United States'
>  and
>  ca_state in ('WI', 'MO', 'WV')
>  and ws_net_profit between 50 and 250  
> )
>)
> group by r_reason_desc
> order by r, wq, ref, fee
> limit 100
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 9 <- Map 1 (BROADCAST_EDGE)
> Reducer 3 <- Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
> Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
> Reducer 6 <- Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 
> (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE)
> Reducer 7 <- Reducer 6 (SIMPLE_EDGE)
> Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
>   DagName: mmokhtar_2014161818_f5fd23ba-d783-4b13-8507-7faa65851798:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: web_page
>   filterExpr: wp_web_page_sk is not n

[jira] [Updated] (HIVE-10829) ATS hook fails for explainTask

2015-05-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10829:
---
Attachment: HIVE-10829.01.patch

> ATS hook fails for explainTask
> --
>
> Key: HIVE-10829
> URL: https://issues.apache.org/jira/browse/HIVE-10829
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-10829.01.patch
>
>
> Commands:
> create table idtable(id string);
> create table ctastable as select * from idtable;
> With ATS hook:
> 2015-05-22 18:54:47,092 INFO  [ATS Logger 0]: hooks.ATSHook 
> (ATSHook.java:run(136)) - Failed to submit plan to ATS: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:589)
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:576)
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:821)
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.outputStagePlans(ExplainTask.java:965)
> at 
> org.apache.hadoop.hive.ql.exec.ExplainTask.getJSONPlan(ExplainTask.java:219)
> at org.apache.hadoop.hive.ql.hooks.ATSHook$2.run(ATSHook.java:120)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity

2015-05-26 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-7723:
--
Attachment: HIVE-7723.11.patch

> Explain plan for complex query with lots of partitions is slow due to 
> in-efficient collection used to find a matching ReadEntity
> 
>
> Key: HIVE-7723
> URL: https://issues.apache.org/jira/browse/HIVE-7723
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Physical Optimizer
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
> Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, 
> HIVE-7723.11.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, 
> HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, 
> HIVE-7723.9.patch
>
>
> Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled, it 
> showed that ReadEntity.equals is taking ~40% of the CPU.
> ReadEntity.equals is called from the snippet below.
> Again and again the set is iterated over to find the actual match; a HashMap 
> is a better option for this case, as Set doesn't have a get method (see the 
> sketch after the snippet).
> Also, for ReadEntity, equals is case-insensitive while hashCode is 
> case-sensitive, which is undesired behavior.
> {code}
> public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity 
> newInput) {
> // If the input is already present, make sure the new parent is added to 
> the input.
> if (inputs.contains(newInput)) {
>   for (ReadEntity input : inputs) {
> if (input.equals(newInput)) {
>   if ((newInput.getParents() != null) && 
> (!newInput.getParents().isEmpty())) {
> input.getParents().addAll(newInput.getParents());
> input.setDirect(input.isDirect() || newInput.isDirect());
>   }
>   return input;
> }
>   }
>   assert false;
> } else {
>   inputs.add(newInput);
>   return newInput;
> }
> // make compile happy
> return null;
>   }
> {code}
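> A hedged sketch of the Map-based lookup suggested above (generic toy code, not 
> the actual fix; the real version would merge parents and the direct flag where 
> indicated):
> {code}
> import java.util.HashMap;
> import java.util.Map;
> 
> public class AddInputSketch {
>   // One O(1) hash lookup replaces the O(n) iteration over the Set. This also
>   // requires equals and hashCode to agree, which is the other problem above.
>   static <E> E addInput(Map<E, E> inputs, E newInput) {
>     E existing = inputs.get(newInput);
>     if (existing != null) {
>       // merge parents / direct flag from newInput into existing here
>       return existing;
>     }
>     inputs.put(newInput, newInput);
>     return newInput;
>   }
> 
>   public static void main(String[] args) {
>     Map<String, String> inputs = new HashMap<>();
>     System.out.println(addInput(inputs, "t1")); // t1 (inserted)
>     System.out.println(addInput(inputs, "t1")); // t1 (found, not re-added)
>   }
> }
> {code}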
> This is the query used : 
> {code}
> select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
> ,cs1.b_streen_name ,cs1.b_city
>  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
> ,cs1.c_zip ,cs1.syear ,cs1.cnt
>  ,cs1.s1 ,cs1.s2 ,cs1.s3
>  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
> from
> (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
> store_name
>  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
> ,ad1.ca_street_name as b_streen_name
>  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
> c_street_number
>  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
> as c_zip
>  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
> as cnt
>  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
> ,sum(ss_coupon_amt) as s3
>   FROM   store_sales
> JOIN store_returns ON store_sales.ss_item_sk = 
> store_returns.sr_item_sk and store_sales.ss_ticket_number = 
> store_returns.sr_ticket_number
> JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
> JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
> JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
> JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
> JOIN store ON store_sales.ss_store_sk = store.s_store_sk
> JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
> cd1.cd_demo_sk
> JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
> cd2.cd_demo_sk
> JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
> JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
> hd1.hd_demo_sk
> JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
> hd2.hd_demo_sk
> JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
> ad1.ca_address_sk
> JOIN customer_address ad2 ON customer.c_current_addr_sk = 
> ad2.ca_address_sk
> JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
> JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
> JOIN item ON store_sales.ss_item_sk = item.i_item_sk
> JOIN
>  (select cs_item_sk
> ,sum(cs_ext_list_price) as 
> sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
>   from catalog_sales JOIN catalog_returns
>   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
> and catalog_sales.cs_order_number = catalog_returns.cr_order_number
>   group by cs_item_sk
>   having 
> sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
>  cs_ui
>

[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity

2015-05-26 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-7723:
--
Attachment: (was: HIVE-7723.11.patch)

> Explain plan for complex query with lots of partitions is slow due to 
> in-efficient collection used to find a matching ReadEntity
> 
>
> Key: HIVE-7723
> URL: https://issues.apache.org/jira/browse/HIVE-7723
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Physical Optimizer
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
> Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, 
> HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch, 
> HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, HIVE-7723.9.patch
>
>
> Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled, it 
> showed that ReadEntity.equals is taking ~40% of the CPU.
> ReadEntity.equals is called from the snippet below.
> Again and again the set is iterated over to find the actual match; a HashMap 
> is a better option for this case, as Set doesn't have a get method.
> Also, ReadEntity's equals is case-insensitive while its hashCode is not, which 
> is undesired behavior.
> {code}
> public static ReadEntity addInput(Set inputs, ReadEntity 
> newInput) {
> // If the input is already present, make sure the new parent is added to 
> the input.
> if (inputs.contains(newInput)) {
>   for (ReadEntity input : inputs) {
> if (input.equals(newInput)) {
>   if ((newInput.getParents() != null) && 
> (!newInput.getParents().isEmpty())) {
> input.getParents().addAll(newInput.getParents());
> input.setDirect(input.isDirect() || newInput.isDirect());
>   }
>   return input;
> }
>   }
>   assert false;
> } else {
>   inputs.add(newInput);
>   return newInput;
> }
> // make compile happy
> return null;
>   }
> {code}
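> A sketch of the direction suggested above (illustrative only, not the 
> committed patch): keep the inputs in a HashMap keyed by the entity itself, so 
> matching is a single lookup instead of a full scan.
> {code}
> public static ReadEntity addInput(Map<ReadEntity, ReadEntity> inputs,
>     ReadEntity newInput) {
>   ReadEntity existing = inputs.get(newInput);
>   if (existing == null) {
>     // first time this entity is seen: register and return it
>     inputs.put(newInput, newInput);
>     return newInput;
>   }
>   // already present: merge parents and the direct flag into the stored entity
>   if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) {
>     existing.getParents().addAll(newInput.getParents());
>     existing.setDirect(existing.isDirect() || newInput.isDirect());
>   }
>   return existing;
> }
> {code}
> Note this only pays off once equals and hashCode agree, per the 
> case-sensitivity issue above.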
> This is the query used:
> {code}
> select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
> ,cs1.b_streen_name ,cs1.b_city
>  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
> ,cs1.c_zip ,cs1.syear ,cs1.cnt
>  ,cs1.s1 ,cs1.s2 ,cs1.s3
>  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
> from
> (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
> store_name
>  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
> ,ad1.ca_street_name as b_streen_name
>  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
> c_street_number
>  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
> as c_zip
>  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
> as cnt
>  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
> ,sum(ss_coupon_amt) as s3
>   FROM   store_sales
> JOIN store_returns ON store_sales.ss_item_sk = 
> store_returns.sr_item_sk and store_sales.ss_ticket_number = 
> store_returns.sr_ticket_number
> JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
> JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
> JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
> JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
> JOIN store ON store_sales.ss_store_sk = store.s_store_sk
> JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
> cd1.cd_demo_sk
> JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
> cd2.cd_demo_sk
> JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
> JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
> hd1.hd_demo_sk
> JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
> hd2.hd_demo_sk
> JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
> ad1.ca_address_sk
> JOIN customer_address ad2 ON customer.c_current_addr_sk = 
> ad2.ca_address_sk
> JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
> JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
> JOIN item ON store_sales.ss_item_sk = item.i_item_sk
> JOIN
>  (select cs_item_sk
> ,sum(cs_ext_list_price) as 
> sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
>   from catalog_sales JOIN catalog_returns
>   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
> and catalog_sales.cs_order_number = catalog_returns.cr_order_number
>   group by cs_item_sk
>   having 
> sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
>  cs_ui
> ON store_sa

[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO

2015-05-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560306#comment-14560306
 ] 

Hive QA commented on HIVE-9069:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735433/HIVE-9069.14.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8975 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_7
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4049/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4049/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4049/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735433 - PreCommit-HIVE-TRUNK-Build

> Simplify filter predicates for CBO
> --
>
> Key: HIVE-9069
> URL: https://issues.apache.org/jira/browse/HIVE-9069
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Jesus Camacho Rodriguez
> Fix For: 0.14.1
>
> Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, 
> HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, 
> HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, 
> HIVE-9069.08.patch, HIVE-9069.09.patch, HIVE-9069.10.patch, 
> HIVE-9069.11.patch, HIVE-9069.12.patch, HIVE-9069.13.patch, 
> HIVE-9069.14.patch, HIVE-9069.14.patch, HIVE-9069.patch
>
>
> Simplify disjunctive predicates so that they can get pushed down to 
> the scan.
> This still looks like an issue: some of the filters could be pushed down to 
> the scan but are not.
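> For instance, in the query below the common conjunct of each disjunction can 
> be factored out so that it becomes pushable (an illustrative rewrite, not the 
> exact patch output):
> {code}
> -- before: (A and X) or (A and Y) -- the scan sees only the disjunction
> where (ca_country = 'United States' and ca_state in ('KY','GA','NM'))
>    or (ca_country = 'United States' and ca_state in ('MT','OR','IN'))
> -- after: A and (X or Y) -- ca_country can now be evaluated at the scan
> where ca_country = 'United States'
>   and (ca_state in ('KY','GA','NM') or ca_state in ('MT','OR','IN'))
> {code}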
> {code}
> set hive.cbo.enable=true
> set hive.stats.fetch.column.stats=true
> set hive.exec.dynamic.partition.mode=nonstrict
> set hive.tez.auto.reducer.parallelism=true
> set hive.auto.convert.join.noconditionaltask.size=32000
> set hive.exec.reducers.bytes.per.reducer=1
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
> set hive.support.concurrency=false
> set hive.tez.exec.print.summary=true
> explain  
> select  substr(r_reason_desc,1,20) as r
>,avg(ws_quantity) wq
>,avg(wr_refunded_cash) ref
>,avg(wr_fee) fee
>  from web_sales, web_returns, web_page, customer_demographics cd1,
>   customer_demographics cd2, customer_address, date_dim, reason 
>  where web_sales.ws_web_page_sk = web_page.wp_web_page_sk
>and web_sales.ws_item_sk = web_returns.wr_item_sk
>and web_sales.ws_order_number = web_returns.wr_order_number
>and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998
>and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk 
>and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
>and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
>and reason.r_reason_sk = web_returns.wr_reason_sk
>and
>(
> (
>  cd1.cd_marital_status = 'M'
>  and
>  cd1.cd_marital_status = cd2.cd_marital_status
>  and
>  cd1.cd_education_status = '4 yr Degree'
>  and 
>  cd1.cd_education_status = cd2.cd_education_status
>  and
>  ws_sales_price between 100.00 and 150.00
> )
>or
> (
>  cd1.cd_marital_status = 'D'
>  and
>  cd1.cd_marital_status = cd2.cd_marital_status
>  and
>  cd1.cd_education_status = 'Primary' 
>  and
>  cd1.cd_education_status = cd2.cd_education_status
>  and
>  ws_sales_price between 50.00 and 100.00
> )
>or
> (
>  cd1.cd_marital_status = 'U'
>  and
>  cd1.cd_marital_status = cd2.cd_marital_status
>  and
>  cd1.cd_education_status = 'Advanced Degree'
>  and
>  cd1.cd_education_status = cd2.cd_education_status
>  and
>  ws_sales_price between 150.00 and 200.00
> )
>)
>and
>(
> (
>  ca_country = 'United States'
>  and
>  ca_state in ('KY', 'GA', 'NM')
>  and ws_net_profit between 100 and 200  
> )
> or
> (
>  ca_country = 'United States'
>  and
>  ca_state in ('MT', 'OR', 'IN')
>  and ws_net_profit between 150 and 300  
> )
> or
> (
>  ca_country = 'United States'
>  and
>  ca_state in ('WI', 'MO', 'WV')
>  and ws_net_profit between 50 and 250  
> 

[jira] [Updated] (HIVE-10689) HS2 metadata api calls should use HiveAuthorizer interface for authorization

2015-05-26 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-10689:
-
Attachment: HIVE-10689.1.patch

> HS2 metadata api calls should use HiveAuthorizer interface for authorization
> 
>
> Key: HIVE-10689
> URL: https://issues.apache.org/jira/browse/HIVE-10689
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, SQLStandardAuthorization
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-10689.1.patch
>
>
> The java.sql.DatabaseMetaData APIs in JDBC result in calls to HS2 metadata 
> APIs, and their execution goes via separate HiveOperation implementations 
> that don't use the Hive Driver class. Invocation of these APIs should also 
> be authorized using the HiveAuthorizer API.
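> A minimal sketch of routing a metadata call through the authorizer (the 
> operation type and context wiring here are assumptions, not the attached 
> patch):
> {code}
> // hypothetical: authorize a DatabaseMetaData.getTables() call like any other op
> List<HivePrivilegeObject> inputs = Arrays.asList(new HivePrivilegeObject(
>     HivePrivilegeObjectType.TABLE_OR_VIEW, dbName, tableName));
> authorizer.checkPrivileges(HiveOperationType.GET_TABLES, inputs, null, authzContext);
> {code}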



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10761) Create codahale-based metrics system for Hive

2015-05-26 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-10761:
-
Attachment: HIVE-10761.2.patch

Tied up some loose ends, like making it take in a configured list of reporters 
and adding an end-to-end unit test for metastore metrics; the latest patch 
should be ready for review.

> Create codahale-based metrics system for Hive
> -
>
> Key: HIVE-10761
> URL: https://issues.apache.org/jira/browse/HIVE-10761
> Project: Hive
>  Issue Type: New Feature
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-10761.2.patch, HIVE-10761.patch, hms-metrics.json
>
>
> There is an existing Hive metrics system that hooks up to JMX reporting, but 
> all its measurements and models are custom.
> This is to make another metrics system based on Codahale (i.e. 
> yammer, dropwizard), which has the following advantages:
> * Well-defined metric model for frequently-needed metrics (e.g. JVM metrics)
> * Well-defined measurements for all metrics (e.g. max, mean, stddev, mean_rate, 
> etc.)
> * Built-in reporting frameworks like JMX, Console, Log, and a JSON webserver
> It is used by many projects, including several Apache projects like Oozie.  
> Overall, monitoring tools should find it easier to understand these common 
> metric, measurement, and reporting models.
> The existing metric subsystem will be kept and can be enabled if backward 
> compatibility is desired.
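> A minimal codahale usage sketch for orientation (metrics-core 3.x API; the 
> metric name is made up, not one this patch registers):
> {code}
> import com.codahale.metrics.JmxReporter;
> import com.codahale.metrics.MetricRegistry;
> import com.codahale.metrics.Timer;
> 
> MetricRegistry registry = new MetricRegistry();
> JmxReporter reporter = JmxReporter.forRegistry(registry).build();
> reporter.start();  // exposes max/mean/stddev/mean_rate etc. over JMX
> 
> Timer apiTimer = registry.timer("api_calls");
> try (Timer.Context ignored = apiTimer.time()) {
>   // timed work goes here
> }
> {code}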



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10828) Insert...values for fewer number of columns fail

2015-05-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10828:
--
Description: 
Schema-on-insert queries with fewer columns than the target table fail with the 
error message below
{noformat}
ERROR ql.Driver (SessionState.java:printError(957)) - FAILED: 
NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7277)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6120)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6291)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8992)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8883)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9728)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9621)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10094)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:324)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10105)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}
*Steps to reproduce:*

set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.enforce.bucketing=true;
drop table if exists table1; 
create table table1 (a int, b string, c string) 
   partitioned by (bkt int) 
   clustered by (a) into 2 buckets 
   stored as orc 
   tblproperties ('transactional'='true'); 
insert into table1 partition (bkt) (b, a, bkt) values 
('part one', 1, 1), ('part one', 2, 1), ('part two', 3, 2), ('part three', 
4, 3);


  was:
Schema-on-insert queries with fewer columns than the target table fail with the 
error message below

ERROR ql.Driver (SessionState.java:printError(957)) - FAILED: 
NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7277)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6120)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6291)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8992)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8883)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9728)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9621)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10094)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:324)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10105)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apache.hadoo

[jira] [Commented] (HIVE-10828) Insert...values for fewer number of columns fail

2015-05-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560290#comment-14560290
 ] 

Eugene Koifman commented on HIVE-10828:
---

A simpler repro case:
{noformat}
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.cbo.enable=false;

drop table if exists acid_partitioned;
create table acid_partitioned (a int, c string)
  partitioned by (p int)
  clustered by (a) into 1 buckets;
  
insert into acid_partitioned partition (p) (a,p) values(1,1);
{noformat}

The above example disables CBO because it causes additional issues; will file a 
separate ticket for that.

> Insert...values for fewer number of columns fail
> 
>
> Key: HIVE-10828
> URL: https://issues.apache.org/jira/browse/HIVE-10828
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Eugene Koifman
>
> Schema-on-insert queries with fewer columns than the target table fail with 
> the error message below
> ERROR ql.Driver (SessionState.java:printError(957)) - FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7277)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6120)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6291)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8992)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8883)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9728)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9621)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10094)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:324)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10105)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
> at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
> at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> *Steps to reproduce:*
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.enforce.bucketing=true;
> drop table if exists table1; 
> create table table1 (a int, b string, c string) 
>partitioned by (bkt int) 
>clustered by (a) into 2 buckets 
>stored as orc 
>tblproperties ('transactional'='true'); 
> insert into table1 partition (bkt) (b, a, bkt) values 
> ('part one', 1, 1), ('part one', 2, 1), ('part two', 3, 2), ('part 
> three', 4, 3);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10550) Dynamic RDD caching optimization for HoS.[Spark Branch]

2015-05-26 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-10550:
-
Attachment: HIVE-10550.5-spark.patch

> Dynamic RDD caching optimization for HoS.[Spark Branch]
> ---
>
> Key: HIVE-10550
> URL: https://issues.apache.org/jira/browse/HIVE-10550
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Attachments: HIVE-10550.1-spark.patch, HIVE-10550.1.patch, 
> HIVE-10550.2-spark.patch, HIVE-10550.3-spark.patch, HIVE-10550.4-spark.patch, 
> HIVE-10550.5-spark.patch
>
>
> A Hive query may scan the same table multiple times, e.g. a self-join, a 
> self-union, or branches that share the same subquery; [TPC-DS 
> Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql]
>  is an example. As you may know, Spark supports caching RDD data: Spark keeps 
> the calculated RDD data in memory and reads it from memory directly the next 
> time, which avoids the computation cost of that RDD (and of all its 
> dependencies) at the cost of more memory usage. By analyzing the query 
> context, we should be able to understand which parts of the query can be 
> shared, so that we can reuse the cached RDD in the generated Spark job.
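> For orientation, caching through Spark's Java API looks roughly like this 
> (illustrative, not the code Hive generates):
> {code}
> // the second count() is served from the cache instead of being recomputed
> JavaRDD<String> shared = sc.textFile("input").filter(s -> !s.isEmpty());
> shared.persist(StorageLevel.MEMORY_ONLY());
> long first = shared.count();   // computes and caches the partitions
> long second = shared.count();  // reads the cached partitions
> {code}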



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10828) Insert...values for fewer number of columns fail

2015-05-26 Thread Aswathy Chellammal Sreekumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aswathy Chellammal Sreekumar updated HIVE-10828:

Description: 
Schema-on-insert queries with fewer columns than the target table fail with the 
error message below

ERROR ql.Driver (SessionState.java:printError(957)) - FAILED: 
NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7277)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6120)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6291)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8992)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8883)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9728)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9621)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10094)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:324)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10105)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Steps to reproduce:
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.enforce.bucketing=true;
drop table if exists table1; 
create table table1 (a int, b string, c string) 
   partitioned by (bkt int) 
   clustered by (a) into 2 buckets 
   stored as orc 
   tblproperties ('transactional'='true'); 
insert into table1 partition (bkt) (b, a, bkt) values 
('part one', 1, 1), ('part one', 2, 1), ('part two', 3, 2), ('part three', 
4, 3);


  was:
Schema-on-insert queries with fewer columns than the target table fail with the 
error message below

ERROR ql.Driver (SessionState.java:printError(957)) - FAILED: 
NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7277)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6120)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6291)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8992)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8883)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9728)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9621)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10094)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:324)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10105)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apache.

[jira] [Updated] (HIVE-10828) Insert...values for fewer number of columns fail

2015-05-26 Thread Aswathy Chellammal Sreekumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aswathy Chellammal Sreekumar updated HIVE-10828:

Description: 
Schema-on-insert queries with fewer columns than the target table fail with the 
error message below

ERROR ql.Driver (SessionState.java:printError(957)) - FAILED: 
NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7277)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6120)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6291)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8992)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8883)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9728)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9621)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10094)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:324)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10105)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

*Steps to reproduce:*

set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.enforce.bucketing=true;
drop table if exists table1; 
create table table1 (a int, b string, c string) 
   partitioned by (bkt int) 
   clustered by (a) into 2 buckets 
   stored as orc 
   tblproperties ('transactional'='true'); 
insert into table1 partition (bkt) (b, a, bkt) values 
('part one', 1, 1), ('part one', 2, 1), ('part two', 3, 2), ('part three', 
4, 3);


  was:
Schema-on-insert queries with fewer columns than the target table fail with the 
error message below

ERROR ql.Driver (SessionState.java:printError(957)) - FAILED: 
NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7277)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6120)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6291)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8992)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8883)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9728)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9621)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10094)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:324)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10105)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apac

[jira] [Commented] (HIVE-10788) Change sort_array to support non-primitive types

2015-05-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560240#comment-14560240
 ] 

Hive QA commented on HIVE-10788:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735427/HIVE-10788.1.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 8977 tests executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_crc32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sha1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join30
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udf_sort_array_wrong1
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4048/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4048/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4048/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735427 - PreCommit-HIVE-TRUNK-Build

> Change sort_array to support non-primitive types
> 
>
> Key: HIVE-10788
> URL: https://issues.apache.org/jira/browse/HIVE-10788
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-10788.1.patch
>
>
> Currently {{sort_array}} only supports primitive types. As we already support 
> comparison between non-primitive types, it makes sense to remove this 
> restriction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10819) SearchArgumentImpl for Timestamp is broken by HIVE-10286

2015-05-26 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560238#comment-14560238
 ] 

Ferdinand Xu commented on HIVE-10819:
-

Hi [~sershe], [~daijy], the problematic commit has already been reverted.
{noformat}
Repository: hive
Updated Branches:
  refs/heads/master db8067f96 -> a00bf4f87


Revert "HIVE-10277: Unable to process Comment line '--' in HIVE-1.1.0 (Chinna 
via Xuefu)"

This reverts commit d66a7347ab97983cc5b9fca6bdabebc81e5a77e5.
{noformat}

> SearchArgumentImpl for Timestamp is broken by HIVE-10286
> 
>
> Key: HIVE-10819
> URL: https://issues.apache.org/jira/browse/HIVE-10819
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 1.2.1
>
> Attachments: HIVE-10819.1.patch, HIVE-10819.2.patch, 
> HIVE-10819.3.patch
>
>
> The workaround for the Kryo bug with Timestamp was accidentally removed by 
> HIVE-10286. It needs to be brought back.
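> For context, the workaround is a custom serializer registration of roughly 
> this shape (a sketch; TimestampSerializer stands in for Hive's helper class, 
> not a Kryo built-in):
> {code}
> // without a custom serializer, Kryo can lose Timestamp's nanosecond precision
> kryo.register(java.sql.Timestamp.class, new TimestampSerializer());
> {code}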



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10528) Hiveserver2 in HTTP mode is not applying auth_to_local rules

2015-05-26 Thread Abdelrahman Shettia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdelrahman Shettia updated HIVE-10528:
---
Attachment: HIVE-10528.3.patch

> Hiveserver2 in HTTP mode is not applying auth_to_local rules
> 
>
> Key: HIVE-10528
> URL: https://issues.apache.org/jira/browse/HIVE-10528
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0
> Environment: Centos 6
>Reporter: Abdelrahman Shettia
>Assignee: Abdelrahman Shettia
> Attachments: HIVE-10528.1.patch, HIVE-10528.1.patch, 
> HIVE-10528.2.patch, HIVE-10528.3.patch
>
>
> PROBLEM: When authenticating to HS2 in HTTP mode with Kerberos, auth_to_local 
> mappings do not get applied.  Because of this, various permission checks 
> that rely on a user's local cluster name are going to fail.
> STEPS TO REPRODUCE:
> 1.  Create a kerberos cluster and HS2 in HTTP mode
> 2.  Create a new user, test, along with a kerberos principal for this user
> 3.  Create a separate principal, mapped-test
> 4.  Create an auth_to_local rule to make sure that mapped-test is mapped to 
> test
> 5.  As the test user, connect to HS2 with beeline and create a simple table:
> {code}
> CREATE TABLE permtest (field1 int);
> {code}
> There is no need to load anything into this table.
> 6.  Establish that it works as the test user:
> {code}
> show create table permtest;
> {code}
> 7.  Drop the test identity and become mapped-test
> 8.  Re-connect to HS2 with beeline, re-run the above command:
> {code}
> show create table permtest;
> {code}
> You will find that when this is done in HTTP mode, you will get an HDFS error 
> (because of StorageBasedAuthorization doing a HDFS permissions check) and the 
> user will be mapped-test and NOT test as it should be.
> ANALYSIS:  This appears to be HTTP specific and the problem seems to come in 
> {{ThriftHttpServlet$HttpKerberosServerAction.getPrincipalWithoutRealmAndHost()}}:
> {code}
>   try {
> fullKerberosName = 
> ShimLoader.getHadoopShims().getKerberosNameShim(fullPrincipal);
>   } catch (IOException e) {
> throw new HttpAuthenticationException(e);
>   }
>   return fullKerberosName.getServiceName();
> {code}
> getServiceName applies no auth_to_local rules.  Seems like maybe this should 
> be getShortName()?
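> A sketch of that suggestion (assuming the shim's getShortName() also declares 
> IOException):
> {code}
>   try {
> fullKerberosName = 
> ShimLoader.getHadoopShims().getKerberosNameShim(fullPrincipal);
> // getShortName() applies the configured auth_to_local rules;
> // getServiceName() does not
> return fullKerberosName.getShortName();
>   } catch (IOException e) {
> throw new HttpAuthenticationException(e);
>   }
> {code}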



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-4239) Remove lock on compilation stage

2015-05-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-4239:
--

Assignee: Sergey Shelukhin

> Remove lock on compilation stage
> 
>
> Key: HIVE-4239
> URL: https://issues.apache.org/jira/browse/HIVE-4239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Processor
>Reporter: Carl Steinbach
>Assignee: Sergey Shelukhin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10528) Hiveserver2 in HTTP mode is not applying auth_to_local rules

2015-05-26 Thread Abdelrahman Shettia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdelrahman Shettia updated HIVE-10528:
---
Attachment: HIVE-10528.2.patch

> Hiveserver2 in HTTP mode is not applying auth_to_local rules
> 
>
> Key: HIVE-10528
> URL: https://issues.apache.org/jira/browse/HIVE-10528
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0
> Environment: Centos 6
>Reporter: Abdelrahman Shettia
>Assignee: Abdelrahman Shettia
> Attachments: HIVE-10528.1.patch, HIVE-10528.1.patch, 
> HIVE-10528.2.patch
>
>
> PROBLEM: When authenticating to HS2 in HTTP mode with Kerberos, auth_to_local 
> mappings do not get applied.  Because of this, various permission checks 
> that rely on a user's local cluster name are going to fail.
> STEPS TO REPRODUCE:
> 1.  Create a kerberos cluster and HS2 in HTTP mode
> 2.  Create a new user, test, along with a kerberos principal for this user
> 3.  Create a separate principal, mapped-test
> 4.  Create an auth_to_local rule to make sure that mapped-test is mapped to 
> test
> 5.  As the test user, connect to HS2 with beeline and create a simple table:
> {code}
> CREATE TABLE permtest (field1 int);
> {code}
> There is no need to load anything into this table.
> 6.  Establish that it works as the test user:
> {code}
> show create table permtest;
> {code}
> 7.  Drop the test identity and become mapped-test
> 8.  Re-connect to HS2 with beeline, re-run the above command:
> {code}
> show create table permtest;
> {code}
> You will find that when this is done in HTTP mode, you will get an HDFS error 
> (because of StorageBasedAuthorization doing a HDFS permissions check) and the 
> user will be mapped-test and NOT test as it should be.
> ANALYSIS:  This appears to be HTTP specific and the problem seems to come in 
> {{ThriftHttpServlet$HttpKerberosServerAction.getPrincipalWithoutRealmAndHost()}}:
> {code}
>   try {
> fullKerberosName = 
> ShimLoader.getHadoopShims().getKerberosNameShim(fullPrincipal);
>   } catch (IOException e) {
> throw new HttpAuthenticationException(e);
>   }
>   return fullKerberosName.getServiceName();
> {code}
> getServiceName applies no auth_to_local rules.  Seems like maybe this should 
> be getShortName()?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10731) NullPointerException in HiveParser.g

2015-05-26 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560177#comment-14560177
 ] 

Pengcheng Xiong commented on HIVE-10731:


[~jpullokkaran], this patch also needs your review. Thanks.

> NullPointerException in HiveParser.g
> 
>
> Key: HIVE-10731
> URL: https://issues.apache.org/jira/browse/HIVE-10731
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.0
>Reporter: Xiu
>Assignee: Pengcheng Xiong
>Priority: Minor
> Attachments: HIVE-10731.01.patch
>
>
> In HiveParser.g:
> {code:Java}
> protected boolean useSQL11ReservedKeywordsForIdentifier() {
> return !HiveConf.getBoolVar(hiveConf, 
> HiveConf.ConfVars.HIVE_SUPPORT_SQL11_RESERVED_KEYWORDS);
> }
> {code}
> NullPointerException is thrown when hiveConf is not set.
> Stack trace:
> {code:Java}
> java.lang.NullPointerException
> at org.apache.hadoop.hive.conf.HiveConf.getBoolVar(HiveConf.java:2583)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.useSQL11ReservedKeywordsForIdentifier(HiveParser.java:1000)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.useSQL11ReservedKeywordsForIdentifier(HiveParser_IdentifiersParser.java:726)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:10922)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:45808)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser.java:38008)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeList(HiveParser.java:36167)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:5214)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2640)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1650)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1109)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:161)
> {code}
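> A minimal null-guard sketch (an assumption about the fix, not the attached 
> patch; returning false matches the default 
> hive.support.sql11.reserved.keywords=true):
> {code:Java}
> protected boolean useSQL11ReservedKeywordsForIdentifier() {
>   if (hiveConf == null) {
> // no conf was injected (e.g. a bare ParseDriver): fall back to the default
> return false;
>   }
>   return !HiveConf.getBoolVar(hiveConf,
>   HiveConf.ConfVars.HIVE_SUPPORT_SQL11_RESERVED_KEYWORDS);
> }
> {code}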



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10804) CBO: Calcite Operator To Hive Operator (Calcite Return Path): optimizer for limit 0 does not work

2015-05-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10804:
---
Attachment: HIVE-10804.01.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): optimizer for 
> limit 0 does not work
> -
>
> Key: HIVE-10804
> URL: https://issues.apache.org/jira/browse/HIVE-10804
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10804.01.patch
>
>
> {code}
> explain
> select key,value from src order by key limit 0
> POSTHOOK: type: QUERY
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: src
> Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: key (type: string), value (type: string)
>   outputColumnNames: key, value
>   Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Output Operator
> key expressions: key (type: string)
> sort order: +
> Statistics: Num rows: 500 Data size: 5312 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: value (type: string)
>   Reduce Operator Tree:
> Select Operator
>   expressions: KEY.reducesinkkey0 (type: string), VALUE.value (type: 
> string)
>   outputColumnNames: key, value
>   Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
>   Limit
> Number of rows: 0
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>   table:
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
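> A sketch of the kind of fold the optimizer should apply (Calcite-flavored and 
> illustrative; the field and factory names are assumptions, not the attached 
> patch):
> {code}
> // a sort/limit whose fetch is the literal 0 can be replaced by an empty
> // Values node, which lets the whole MR stage above be dropped from the plan
> if (sortLimit.fetch != null && RexLiteral.intValue(sortLimit.fetch) == 0) {
>   return LogicalValues.createEmpty(cluster, sortLimit.getRowType());
> }
> {code}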



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories

2015-05-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560106#comment-14560106
 ] 

Hive QA commented on HIVE-10809:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735409/HIVE-10809.2.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 8974 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_crc32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sha1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join30
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4047/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4047/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4047/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735409 - PreCommit-HIVE-TRUNK-Build

> HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
> --
>
> Key: HIVE-10809
> URL: https://issues.apache.org/jira/browse/HIVE-10809
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Selina Zhang
>Assignee: Selina Zhang
> Attachments: HIVE-10809.1.patch, HIVE-10809.2.patch
>
>
> When a static partition is added through HCatStorer or HCatWriter:
> {code}
> JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader();
> STORE JoinedData INTO 'selina.joined_events_e' USING 
> org.apache.hive.hcatalog.pig.HCatStorer('author=selina');
> {code}
> The table directory then looks like:
> {noformat}
> drwx--   - selinaz users  0 2015-05-22 21:19 
> /user/selinaz/joined_events_e/_SCRATCH0.9157208938193798
> drwx--   - selinaz users  0 2015-05-22 21:19 
> /user/selinaz/joined_events_e/author=selina
> {noformat}
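> A cleanup sketch of the expected behavior (the names below are assumptions, 
> not the attached patch): once the partition is committed, the committer should 
> delete the temporary scratch directory instead of leaving it behind.
> {code}
> // drop the now-empty _SCRATCH* staging directory after commit
> Path scratchDir = new Path(tableDir, scratchDirName);
> FileSystem fs = scratchDir.getFileSystem(conf);
> if (fs.exists(scratchDir)) {
>   fs.delete(scratchDir, true);
> }
> {code}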



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0

2015-05-26 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560083#comment-14560083
 ] 

Mostafa Mokhtar commented on HIVE-10704:


[~apivovarov]
Ditto for this one. 

> Errors in Tez HashTableLoader when estimated table size is 0
> 
>
> Key: HIVE-10704
> URL: https://issues.apache.org/jira/browse/HIVE-10704
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, 
> HIVE-10704.3.patch
>
>
> A couple of issues:
> - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all 
> tables, the largest-small-table selection is wrong and could select the large 
> table (which results in an NPE)
> - The memory estimates can either divide by zero or allocate 0 memory if the 
> table size is 0. Try to come up with a sensible default for this.
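> A sketch of the sort of guard this implies (illustrative; the constant and 
> variable names are made up):
> {code}
> // never let a 0-byte estimate pick the "biggest" table or drive memory division
> long size = stats.getDataSize();
> if (size <= 0) {
>   size = DEFAULT_TABLE_SIZE_BYTES;  // hypothetical fallback constant
> }
> double fraction = (double) size / Math.max(totalSmallTableSize, 1L);  // no divide-by-zero
> {code}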



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10819) SearchArgumentImpl for Timestamp is broken by HIVE-10286

2015-05-26 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-10819:
--
Attachment: HIVE-10819.3.patch

The test failures don't seem related. Reattaching the patch to test again.

> SearchArgumentImpl for Timestamp is broken by HIVE-10286
> 
>
> Key: HIVE-10819
> URL: https://issues.apache.org/jira/browse/HIVE-10819
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 1.2.1
>
> Attachments: HIVE-10819.1.patch, HIVE-10819.2.patch, 
> HIVE-10819.3.patch
>
>
> The workaround for the Kryo bug with Timestamp was accidentally removed by 
> HIVE-10286. It needs to be brought back.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10807) Invalidate basic stats for insert queries if autogather=false

2015-05-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10807:

Attachment: HIVE-10807.2.patch

> Invalidate basic stats for insert queries if autogather=false
> -
>
> Key: HIVE-10807
> URL: https://issues.apache.org/jira/browse/HIVE-10807
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.0
>Reporter: Gopal V
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-10807.2.patch, HIVE-10807.patch
>
>
> Setting stats.autogather=false leads to incorrect basic stats in the case of 
> insert statements.
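> An illustrative repro of the staleness this causes (assumed behavior before 
> the patch):
> {code}
> set hive.stats.autogather=false;
> create table t (a int);        -- basic stats start out as 0 rows
> insert into t values (1), (2); -- stats are neither regathered nor invalidated,
>                                -- so the optimizer may still trust numRows=0
> {code}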



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9069) Simplify filter predicates for CBO

2015-05-26 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-9069:
--
Attachment: HIVE-9069.14.patch

> Simplify filter predicates for CBO
> --
>
> Key: HIVE-9069
> URL: https://issues.apache.org/jira/browse/HIVE-9069
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Jesus Camacho Rodriguez
> Fix For: 0.14.1
>
> Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, 
> HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, 
> HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, 
> HIVE-9069.08.patch, HIVE-9069.09.patch, HIVE-9069.10.patch, 
> HIVE-9069.11.patch, HIVE-9069.12.patch, HIVE-9069.13.patch, 
> HIVE-9069.14.patch, HIVE-9069.14.patch, HIVE-9069.patch
>
>
> Simplify disjunctive predicates so that they can get pushed down to 
> the scan.
> This still looks like an issue: some of the filters could be pushed down to 
> the scan but are not.
> {code}
> set hive.cbo.enable=true
> set hive.stats.fetch.column.stats=true
> set hive.exec.dynamic.partition.mode=nonstrict
> set hive.tez.auto.reducer.parallelism=true
> set hive.auto.convert.join.noconditionaltask.size=32000
> set hive.exec.reducers.bytes.per.reducer=1
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
> set hive.support.concurrency=false
> set hive.tez.exec.print.summary=true
> explain  
> select  substr(r_reason_desc,1,20) as r
>,avg(ws_quantity) wq
>,avg(wr_refunded_cash) ref
>,avg(wr_fee) fee
>  from web_sales, web_returns, web_page, customer_demographics cd1,
>   customer_demographics cd2, customer_address, date_dim, reason 
>  where web_sales.ws_web_page_sk = web_page.wp_web_page_sk
>and web_sales.ws_item_sk = web_returns.wr_item_sk
>and web_sales.ws_order_number = web_returns.wr_order_number
>and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998
>and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk 
>and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
>and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
>and reason.r_reason_sk = web_returns.wr_reason_sk
>and
>(
> (
>  cd1.cd_marital_status = 'M'
>  and
>  cd1.cd_marital_status = cd2.cd_marital_status
>  and
>  cd1.cd_education_status = '4 yr Degree'
>  and 
>  cd1.cd_education_status = cd2.cd_education_status
>  and
>  ws_sales_price between 100.00 and 150.00
> )
>or
> (
>  cd1.cd_marital_status = 'D'
>  and
>  cd1.cd_marital_status = cd2.cd_marital_status
>  and
>  cd1.cd_education_status = 'Primary' 
>  and
>  cd1.cd_education_status = cd2.cd_education_status
>  and
>  ws_sales_price between 50.00 and 100.00
> )
>or
> (
>  cd1.cd_marital_status = 'U'
>  and
>  cd1.cd_marital_status = cd2.cd_marital_status
>  and
>  cd1.cd_education_status = 'Advanced Degree'
>  and
>  cd1.cd_education_status = cd2.cd_education_status
>  and
>  ws_sales_price between 150.00 and 200.00
> )
>)
>and
>(
> (
>  ca_country = 'United States'
>  and
>  ca_state in ('KY', 'GA', 'NM')
>  and ws_net_profit between 100 and 200  
> )
> or
> (
>  ca_country = 'United States'
>  and
>  ca_state in ('MT', 'OR', 'IN')
>  and ws_net_profit between 150 and 300  
> )
> or
> (
>  ca_country = 'United States'
>  and
>  ca_state in ('WI', 'MO', 'WV')
>  and ws_net_profit between 50 and 250  
> )
>)
> group by r_reason_desc
> order by r, wq, ref, fee
> limit 100
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 9 <- Map 1 (BROADCAST_EDGE)
> Reducer 3 <- Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
> Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
> Reducer 6 <- Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 
> (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE)
> Reducer 7 <- Reducer 6 (SIMPLE_EDGE)
> Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
>   DagName: mmokhtar_2014161818_f5fd23ba-d783-4b13-8507-7faa65851798:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: web_page
>   filterExpr: wp_web_page_sk is not null (type: boolean)
>   Statistics: Num rows: 4602 Data size: 2696178 Basic stats: 
> COMPLETE Column stats: COMPLETE
>   Filter Operator
>  

[jira] [Commented] (HIVE-10778) LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2

2015-05-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560010#comment-14560010
 ] 

Sergey Shelukhin commented on HIVE-10778:
-

I am clearing the map after the build rather than just removing the 
cacheMapWork/etc. parts pertaining to the global map, in case I missed some 
place during the build where it could be used.

> LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2
> 
>
> Key: HIVE-10778
> URL: https://issues.apache.org/jira/browse/HIVE-10778
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: llap
>
> Attachments: HIVE-10778.01.patch, HIVE-10778.patch, llap-hs2-heap.png
>
>
> 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2.
> !llap-hs2-heap.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10778) LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2

2015-05-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560009#comment-14560009
 ] 

Sergey Shelukhin commented on HIVE-10778:
-

[~vikram.dixit] can you take a look?

> LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2
> 
>
> Key: HIVE-10778
> URL: https://issues.apache.org/jira/browse/HIVE-10778
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: llap
>
> Attachments: HIVE-10778.01.patch, HIVE-10778.patch, llap-hs2-heap.png
>
>
> 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2.
> !llap-hs2-heap.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9105) Hive-0.13 select constant in union all followed by group by gives wrong result

2015-05-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong resolved HIVE-9105.
---
Resolution: Fixed

> Hive-0.13 select constant in union all followed by group by gives wrong result
> --
>
> Key: HIVE-9105
> URL: https://issues.apache.org/jira/browse/HIVE-9105
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> select key from (select '1' as key from srcpart where ds="2008-04-09"
> UNION all
> SELECT key from srcpart where ds="2008-04-09" and hr="11"
> ) tab group by key 
> will generate wrong results



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10788) Change sort_array to support non-primitive types

2015-05-26 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-10788:

Attachment: HIVE-10788.1.patch

Like HIVE-10427, the UNION type is a little bit tricky to support.
Will address that in a follow-up JIRA.
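
As a side note, here is a minimal, self-contained sketch of the element-wise 
ordering that sorting non-primitive (list) values relies on; this illustrates 
the general technique only, since the actual patch works on ObjectInspectors 
rather than List<Integer>:

{code}
// Illustration only: order nested lists element-by-element, breaking
// prefix ties by length. The real patch compares values through Hive's
// ObjectInspector machinery instead of concrete Java lists.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NestedSortSketch {
  static final Comparator<List<Integer>> LIST_ORDER = (a, b) -> {
    int n = Math.min(a.size(), b.size());
    for (int i = 0; i < n; i++) {
      int c = Integer.compare(a.get(i), b.get(i));
      if (c != 0) {
        return c;  // first differing element decides
      }
    }
    return Integer.compare(a.size(), b.size());  // shorter prefix sorts first
  };

  public static void main(String[] args) {
    List<List<Integer>> arrays = new ArrayList<>(Arrays.asList(
        Arrays.asList(2, 9), Arrays.asList(2, 3), Arrays.asList(1)));
    arrays.sort(LIST_ORDER);
    System.out.println(arrays);  // [[1], [2, 3], [2, 9]]
  }
}
{code}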

> Change sort_array to support non-primitive types
> 
>
> Key: HIVE-10788
> URL: https://issues.apache.org/jira/browse/HIVE-10788
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-10788.1.patch
>
>
> Currently {{sort_array}} only supports primitive types. As we already support 
> comparison between non-primitive types, it makes sense to remove this 
> restriction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories

2015-05-26 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559982#comment-14559982
 ] 

Swarnim Kulkarni commented on HIVE-10809:
-

[~selinazh] Minor feedback:

1. Instead of declaring so many exceptions in the throws clause, we could 
simply declare throws Exception to keep the test simpler.
2. To make the test stronger, is there any way we can verify that the 
directories actually existed before the query ran?
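
For illustration, a rough sketch of the kind of check being asked for, using 
the Hadoop FileSystem API; the table location and test wiring are assumed, not 
taken from the actual patch:

{code}
// Rough sketch of the suggested assertion. The table directory and the
// way the test obtains its Configuration are assumed here.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ScratchDirCheck {
  /** Returns true if any _SCRATCH* directory remains under the table dir. */
  public static boolean hasScratchDirs(Configuration conf, String tableDir)
      throws java.io.IOException {
    FileSystem fs = FileSystem.get(conf);
    FileStatus[] leftovers = fs.globStatus(new Path(tableDir, "_SCRATCH*"));
    return leftovers != null && leftovers.length > 0;
  }
}
{code}

A test could then assert that hasScratchDirs(...) is true while the write is in 
flight (if observable) and false after the commit completes.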


> HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
> --
>
> Key: HIVE-10809
> URL: https://issues.apache.org/jira/browse/HIVE-10809
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Selina Zhang
>Assignee: Selina Zhang
> Attachments: HIVE-10809.1.patch, HIVE-10809.2.patch
>
>
> When static partition is added through HCatStorer or HCatWriter
> {code}
> JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader();
> STORE JoinedData INTO 'selina.joined_events_e' USING 
> org.apache.hive.hcatalog.pig.HCatStorer('author=selina');
> {code}
> The table directory looks like
> {noformat}
> drwx--   - selinaz users  0 2015-05-22 21:19 
> /user/selinaz/joined_events_e/_SCRATCH0.9157208938193798
> drwx--   - selinaz users  0 2015-05-22 21:19 
> /user/selinaz/joined_events_e/author=selina
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases

2015-05-26 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559967#comment-14559967
 ] 

Jesus Camacho Rodriguez commented on HIVE-10811:


The method {{trimChild}} trims the child columns, keeping 1) the columns needed 
by the parent ("fieldsUsed"), and 2) the columns on which collations were 
specified. Currently, the method takes the collations from the parent relation 
"rel", which seems incorrect, as we end up referencing column positions that do 
not exist in the child "input". Thus, I changed the method to take the 
collations from the relation whose columns we are pruning, i.e. "input".
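
To make the fix concrete, here is a plain-Java illustration of the idea only 
(the real code manipulates Calcite RelNodes and RelCollations, which are 
omitted here): union the columns the parent needs with the input's own 
sort-key positions before pruning.

{code}
// Plain-Java illustration, not Calcite code: keep the parent's required
// columns plus the input's own collation (sort-key) positions. Taking the
// sort keys from the input guarantees the positions index the input's row
// type -- taking them from the parent was the bug described above.
import java.util.BitSet;

public class TrimSketch {
  static BitSet fieldsToKeep(BitSet fieldsUsedByParent, int[] inputSortKeys) {
    BitSet keep = (BitSet) fieldsUsedByParent.clone();
    for (int key : inputSortKeys) {
      keep.set(key);  // valid: indexes the input's row type
    }
    return keep;
  }
}
{code}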

> RelFieldTrimmer throws NoSuchElementException in some cases
> ---
>
> Key: HIVE-10811
> URL: https://issues.apache.org/jira/browse/HIVE-10811
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, 
> HIVE-10811.patch
>
>
> RelFieldTrimmer runs into NoSuchElementException in some cases.
> Stack trace:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Internal error: While 
> invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
>   at org.apache.calcite.util.Util.newInternal(Util.java:743)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536)
>   ...

[jira] [Commented] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled

2015-05-26 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559918#comment-14559918
 ] 

Matt McCline commented on HIVE-10244:
-

Ya, I know, that is what I thought.  But the new prune flag seems to be on in 
the Reducer even though isGroupingSetsPresent is false.  We should talk to the 
author and reviewer of the change.

Jedi Master [~ashutoshc], can you explain to us Padawan Learners 
[~jpullokkaran] [~mmccline] [~jcamachorodriguez] all about the prune flag?

> Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when 
> hive.vectorized.execution.reduce.enabled is enabled
> ---
>
> Key: HIVE-10244
> URL: https://issues.apache.org/jira/browse/HIVE-10244
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Matt McCline
> Attachments: HIVE-10244.01.patch, explain_q80_vectorized_reduce_on.txt
>
>
> Query 
> {code}
> set hive.vectorized.execution.reduce.enabled=true;
> with ssr as
>  (select  s_store_id as store_id,
>   sum(ss_ext_sales_price) as sales,
>   sum(coalesce(sr_return_amt, 0)) as returns,
>   sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit
>   from store_sales left outer join store_returns on
>  (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number),
>  date_dim,
>  store,
>  item,
>  promotion
>  where ss_sold_date_sk = d_date_sk
>and d_date between cast('1998-08-04' as date) 
>   and (cast('1998-09-04' as date))
>and ss_store_sk = s_store_sk
>and ss_item_sk = i_item_sk
>and i_current_price > 50
>and ss_promo_sk = p_promo_sk
>and p_channel_tv = 'N'
>  group by s_store_id)
>  ,
>  csr as
>  (select  cp_catalog_page_id as catalog_page_id,
>   sum(cs_ext_sales_price) as sales,
>   sum(coalesce(cr_return_amount, 0)) as returns,
>   sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit
>   from catalog_sales left outer join catalog_returns on
>  (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number),
>  date_dim,
>  catalog_page,
>  item,
>  promotion
>  where cs_sold_date_sk = d_date_sk
>and d_date between cast('1998-08-04' as date)
>   and (cast('1998-09-04' as date))
> and cs_catalog_page_sk = cp_catalog_page_sk
>and cs_item_sk = i_item_sk
>and i_current_price > 50
>and cs_promo_sk = p_promo_sk
>and p_channel_tv = 'N'
> group by cp_catalog_page_id)
>  ,
>  wsr as
>  (select  web_site_id,
>   sum(ws_ext_sales_price) as sales,
>   sum(coalesce(wr_return_amt, 0)) as returns,
>   sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit
>   from web_sales left outer join web_returns on
>  (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number),
>  date_dim,
>  web_site,
>  item,
>  promotion
>  where ws_sold_date_sk = d_date_sk
>and d_date between cast('1998-08-04' as date)
>   and (cast('1998-09-04' as date))
> and ws_web_site_sk = web_site_sk
>and ws_item_sk = i_item_sk
>and i_current_price > 50
>and ws_promo_sk = p_promo_sk
>and p_channel_tv = 'N'
> group by web_site_id)
>   select  channel
> , id
> , sum(sales) as sales
> , sum(returns) as returns
> , sum(profit) as profit
>  from 
>  (select 'store channel' as channel
> , concat('store', store_id) as id
> , sales
> , returns
> , profit
>  from   ssr
>  union all
>  select 'catalog channel' as channel
> , concat('catalog_page', catalog_page_id) as id
> , sales
> , returns
> , profit
>  from  csr
>  union all
>  select 'web channel' as channel
> , concat('web_site', web_site_id) as id
> , sales
> , returns
> , profit
>  from   wsr
>  ) x
>  group by channel, id with rollup
>  order by channel
>  ,id
>  limit 100
> {code}
> Exception 
> {code}
> Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, 
> diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing vector batch (tag=0) 
> \N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8
> \N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7
> \N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8
>   at 
> org.

[jira] [Updated] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories

2015-05-26 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10809:

Attachment: HIVE-10809.2.patch

The above unit test failures seem unrelated to this patch. 

Uploaded a new patch that adds verification in TestHCatStorer that the scratch 
directories are removed. 





> HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
> --
>
> Key: HIVE-10809
> URL: https://issues.apache.org/jira/browse/HIVE-10809
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Selina Zhang
>Assignee: Selina Zhang
> Attachments: HIVE-10809.1.patch, HIVE-10809.2.patch
>
>
> When static partition is added through HCatStorer or HCatWriter
> {code}
> JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader();
> STORE JoinedData INTO 'selina.joined_events_e' USING 
> org.apache.hive.hcatalog.pig.HCatStorer('author=selina');
> {code}
> The table directory looks like
> {noformat}
> drwx--   - selinaz users  0 2015-05-22 21:19 
> /user/selinaz/joined_events_e/_SCRATCH0.9157208938193798
> drwx--   - selinaz users  0 2015-05-22 21:19 
> /user/selinaz/joined_events_e/author=selina
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10753) hs2 jdbc url - wrong connection string cause error on beeline/jdbc/odbc client, misleading message

2015-05-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559893#comment-14559893
 ] 

Thejas M Nair commented on HIVE-10753:
--

+1

> hs2 jdbc url - wrong connection string cause  error on beeline/jdbc/odbc 
> client, misleading message
> ---
>
> Key: HIVE-10753
> URL: https://issues.apache.org/jira/browse/HIVE-10753
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, JDBC
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-10753.1.patch, HIVE-10753.2.patch
>
>
> {noformat}
> beeline -u 
> 'jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http' -n 
> hdiuser
> scan complete in 15ms
> Connecting to 
> jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http
> Java heap space
> Beeline version 0.14.0.2.2.4.1-1 by Apache Hive
> 0: jdbc:hive2://localhost:10001/default (closed)> ^Chdiuser@headnode0:~$ 
> But it works if I use the deprecated param - 
> hdiuser@headnode0:~$ beeline -u 
> 'jdbc:hive2://localhost:10001/default?hive.server2.transport.mode=http;httpPath=/'
>  -n hdiuser
> scan complete in 12ms
> Connecting to 
> jdbc:hive2://localhost:10001/default?hive.server2.transport.mode=http;httpPath=/
> 15/04/28 23:16:46 [main]: WARN jdbc.Utils: * JDBC param deprecation *
> 15/04/28 23:16:46 [main]: WARN jdbc.Utils: The use of 
> hive.server2.transport.mode is deprecated.
> 15/04/28 23:16:46 [main]: WARN jdbc.Utils: Please use transportMode like so: 
> jdbc:hive2://<host>:<port>/dbName;transportMode=<transportMode>
> Connected to: Apache Hive (version 0.14.0.2.2.4.1-1)
> Driver: Hive JDBC (version 0.14.0.2.2.4.1-1)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 0.14.0.2.2.4.1-1 by Apache Hive
> 0: jdbc:hive2://localhost:10001/default> show tables;
> +--+--+
> | tab_name |
> +--+--+
> | hivesampletable  |
> +--+--+
> 1 row selected (18.181 seconds)
> 0: jdbc:hive2://localhost:10001/default> ^Chdiuser@headnode0:~$ ^C
> {noformat}
> The reason for the above message is that the URL is wrong. The correct one is:
> {code}
> beeline -u 
> 'jdbc:hive2://localhost:10001/default;httpPath=/;transportMode=http' -n 
> hdiuser
> {code}
> Note the ";" instead of "?". The deprecation msg prints the format as well: 
> {code}
> Please use transportMode like so: 
> jdbc:hive2://<host>:<port>/dbName;transportMode=<transportMode>
> {code}
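
For completeness, a minimal JDBC sketch using the corrected URL form; the 
host, port, and user below are placeholders:

{code}
// Minimal sketch: connect over HTTP transport with ';' separators in the
// URL (not '?'). Host, port, and credentials are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Hs2HttpConnect {
  public static void main(String[] args) throws Exception {
    String url =
        "jdbc:hive2://localhost:10001/default;transportMode=http;httpPath=/";
    try (Connection conn = DriverManager.getConnection(url, "hdiuser", "");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("show tables")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
{code}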



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-05-26 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559879#comment-14559879
 ] 

Alan Gates commented on HIVE-10165:
---

I'll review if someone else doesn't get to it first.  It will take me a few 
days to get to it as I'm out the rest of this week.

As for the failing tests, the 5 earlier failures didn't look related to your 
patch.  Unless we really broke trunk, it's surprising to see 600+ test 
failures for your later patch.  Have you tried running some of these locally 
to see whether you can reproduce them?

> Improve hive-hcatalog-streaming extensibility and support updates and deletes.
> --
>
> Key: HIVE-10165
> URL: https://issues.apache.org/jira/browse/HIVE-10165
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: streaming_api
> Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, 
> HIVE-10165.5.patch
>
>
> h3. Overview
> I'd like to extend the 
> [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
>  API so that it also supports the writing of record updates and deletes in 
> addition to the already supported inserts.
> h3. Motivation
> We have many Hadoop processes outside of Hive that merge changed facts into 
> existing datasets. Traditionally we achieve this by: reading in a 
> ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
> sequence and then applying a function to determine inserted, updated, and 
> deleted rows. However, in our current scheme we must rewrite all partitions 
> that may potentially contain changes. In practice the number of mutated 
> records is very small when compared with the records contained in a 
> partition. This approach results in a number of operational issues:
> * Excessive amount of write activity required for small data changes.
> * Downstream applications cannot robustly read these datasets while they are 
> being updated.
> * Due to the scale of the updates (hundreds of partitions), the scope for 
> contention is high. 
> I believe we can address this problem by instead writing only the changed 
> records to a Hive transactional table. This should drastically reduce the 
> amount of data that we need to write and also provide a means for managing 
> concurrent access to the data. Our existing merge processes can read and 
> retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
> an updated form of the hive-hcatalog-streaming API which will then have the 
> required data to perform an update or insert in a transactional manner. 
> h3. Benefits
> * Enables the creation of large-scale dataset merge processes  
> * Opens up Hive transactional functionality in an accessible manner to 
> processes that operate outside of Hive.
> h3. Implementation
> Our changes do not break the existing API contracts. Instead our approach has 
> been to consider the functionality offered by the existing API and our 
> proposed API as fulfilling separate and distinct use-cases. The existing API 
> is primarily focused on the task of continuously writing large volumes of new 
> data into a Hive table for near-immediate analysis. Our use-case however, is 
> concerned more with the frequent but not continuous ingestion of mutations to 
> a Hive table from some ETL merge process. Consequently we feel it is 
> justifiable to add our new functionality via an alternative set of public 
> interfaces and leave the existing API as is. This keeps both APIs clean and 
> focused at the expense of presenting additional options to potential users. 
> Wherever possible, shared implementation concerns have been factored out into 
> abstract base classes that are open to third-party extension. A detailed 
> breakdown of the changes is as follows:
> * We've introduced a public {{RecordMutator}} interface whose purpose is to 
> expose insert/update/delete operations to the user. This is a counterpart to 
> the write-only {{RecordWriter}}. We've also factored out life-cycle methods 
> common to these two interfaces into a super {{RecordOperationWriter}} 
> interface.  Note that the row representation has be changed from {{byte[]}} 
> to {{Object}}. Within our data processing jobs our records are often 
> available in a strongly typed and decoded form such as a POJO or a Tuple 
> object. Therefore is seems to make sense that we are able to pass this 
> through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} 
> encoding step. This of course still allows users to use {{byte[]}} if they 
> wish.
> * The introduction of {{RecordMutator}} requires that insert/update/delete 
> operations

[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-05-26 Thread Elliot West (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559866#comment-14559866
 ] 

Elliot West commented on HIVE-10165:


I'm not quite sure what to do next. I have a '-1' because some (unrelated) 
tests fail. However I (perhaps naïvely) don't believe this is connected to my 
patch. Could someone please review?

> Improve hive-hcatalog-streaming extensibility and support updates and deletes.
> --
>
> Key: HIVE-10165
> URL: https://issues.apache.org/jira/browse/HIVE-10165
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: streaming_api
> Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, 
> HIVE-10165.5.patch
>
>
> h3. Overview
> I'd like to extend the 
> [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
>  API so that it also supports the writing of record updates and deletes in 
> addition to the already supported inserts.
> h3. Motivation
> We have many Hadoop processes outside of Hive that merge changed facts into 
> existing datasets. Traditionally we achieve this by: reading in a 
> ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
> sequence and then applying a function to determine inserted, updated, and 
> deleted rows. However, in our current scheme we must rewrite all partitions 
> that may potentially contain changes. In practice the number of mutated 
> records is very small when compared with the records contained in a 
> partition. This approach results in a number of operational issues:
> * Excessive amount of write activity required for small data changes.
> * Downstream applications cannot robustly read these datasets while they are 
> being updated.
> * Due to the scale of the updates (hundreds of partitions), the scope for 
> contention is high. 
> I believe we can address this problem by instead writing only the changed 
> records to a Hive transactional table. This should drastically reduce the 
> amount of data that we need to write and also provide a means for managing 
> concurrent access to the data. Our existing merge processes can read and 
> retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
> an updated form of the hive-hcatalog-streaming API which will then have the 
> required data to perform an update or insert in a transactional manner. 
> h3. Benefits
> * Enables the creation of large-scale dataset merge processes  
> * Opens up Hive transactional functionality in an accessible manner to 
> processes that operate outside of Hive.
> h3. Implementation
> Our changes do not break the existing API contracts. Instead our approach has 
> been to consider the functionality offered by the existing API and our 
> proposed API as fulfilling separate and distinct use-cases. The existing API 
> is primarily focused on the task of continuously writing large volumes of new 
> data into a Hive table for near-immediate analysis. Our use-case however, is 
> concerned more with the frequent but not continuous ingestion of mutations to 
> a Hive table from some ETL merge process. Consequently we feel it is 
> justifiable to add our new functionality via an alternative set of public 
> interfaces and leave the existing API as is. This keeps both APIs clean and 
> focused at the expense of presenting additional options to potential users. 
> Wherever possible, shared implementation concerns have been factored out into 
> abstract base classes that are open to third-party extension. A detailed 
> breakdown of the changes is as follows:
> * We've introduced a public {{RecordMutator}} interface whose purpose is to 
> expose insert/update/delete operations to the user. This is a counterpart to 
> the write-only {{RecordWriter}}. We've also factored out life-cycle methods 
> common to these two interfaces into a super {{RecordOperationWriter}} 
> interface.  Note that the row representation has been changed from {{byte[]}} 
> to {{Object}}. Within our data processing jobs our records are often 
> available in a strongly typed and decoded form such as a POJO or a Tuple 
> object. Therefore it seems to make sense that we are able to pass this 
> through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} 
> encoding step. This of course still allows users to use {{byte[]}} if they 
> wish.
> * The introduction of {{RecordMutator}} requires that insert/update/delete 
> operations are then also exposed on a {{TransactionBatch}} type. We've done 
> this with the introduction of a public {{MutatorTransactionBatch}} interface 
> which is a counterpart to the write-only {{TransactionBatch}}
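
For illustration, a compact sketch of the interface shape the description 
outlines; the method names and exception type below are assumptions, not the 
final API:

{code}
// Sketch only: method names and the exception type are assumptions for
// illustration, not the final hive-hcatalog-streaming API.
public interface RecordMutator {
  void insert(Object record) throws java.io.IOException;  // write a new row
  void update(Object record) throws java.io.IOException;  // row carries its RecordIdentifier
  void delete(Object record) throws java.io.IOException;  // row carries its RecordIdentifier
  void flush() throws java.io.IOException;                 // assumed life-cycle method
  void close() throws java.io.IOException;                 // assumed life-cycle method
}
{code}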

[jira] [Commented] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases

2015-05-26 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559854#comment-14559854
 ] 

Laljo John Pullokkaran commented on HIVE-10811:
---

[~jcamachorodriguez] I don't get the patch.
Shouldn't we be checking that the collations from "rel" are present in "input"?

> RelFieldTrimmer throws NoSuchElementException in some cases
> ---
>
> Key: HIVE-10811
> URL: https://issues.apache.org/jira/browse/HIVE-10811
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, 
> HIVE-10811.patch
>
>
> RelFieldTrimmer runs into NoSuchElementException in some cases.
> Stack trace:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Internal error: While 
> invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
>   at org.apache.calcite.util.Util.newInternal(Util.java:743)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536)
>   ... 32 more
> Caused by: java.lang.AssertionError: Internal error: While invoking method 
> 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
>   at org.apache.calcite.util.Util.newInternal(Util.java:

[jira] [Commented] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity

2015-05-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559849#comment-14559849
 ] 

Hive QA commented on HIVE-7723:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735389/HIVE-7723.11.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4046/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4046/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4046/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java:
 Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
spark-client ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ spark-client ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp/conf
 [copy] Copying 11 files to 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
spark-client ---
[INFO] Compiling 5 source files to 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/test-classes
[INFO] 
[INFO] --- maven-dependency-plugin:2.8:copy (copy-guava-14) @ spark-client ---
[INFO] Configured Artifact: com.google.guava:guava:14.0.1:jar
[INFO] Copying guava-14.0.1.jar to 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/dependency/guava-14.0.1.jar
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ spark-client ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ spark-client ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-1.3.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
spark-client ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ spark-client ---
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-1.3.0-SNAPSHOT.jar
 to 
/home/hiveptest/.m2/repository/org/apache/hive/spark-client/1.3.0-SNAPSHOT/spark-client-1.3.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/spark-client/pom.xml to 
/home/hiveptest/.m2/repository/org/apache/hive/spark-client/1.3.0-SNAPSHOT/spark-client-1.3.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive Query Language 1.3.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-exec ---
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql/target
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-exec ---
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (generate-sources) @ hive-exec ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-test-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen
Generating vector expression code
Generating vector expression test code
[INFO] Executed tasks
[INFO] 
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-exec ---
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/ql/src/gen/protobuf/gen-java
 added.
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/ql/src/gen/thrift/gen-javabean
 added.
[INFO] So

[jira] [Commented] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation

2015-05-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559839#comment-14559839
 ] 

Hive QA commented on HIVE-10812:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735375/HIVE-10812.03.patch

{color:green}SUCCESS:{color} +1 8974 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4045/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4045/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4045/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735375 - PreCommit-HIVE-TRUNK-Build

> Scaling PK/FK's selectivity for stats annotation
> 
>
> Key: HIVE-10812
> URL: https://issues.apache.org/jira/browse/HIVE-10812
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10812.01.patch, HIVE-10812.02.patch, 
> HIVE-10812.03.patch
>
>
> Right now, the computation of the selectivity of the FK side based on the PK 
> side does not take into consideration the range of the FK and the range of 
> the PK.
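
For concreteness, one plausible range-overlap scaling is sketched below; this 
is an assumption about the general approach, not the patch's exact formula:

{noformat}
overlap       = max(0, min(maxFK, maxPK) - max(minFK, minPK)) / (maxFK - minFK)
scaledSel(FK) = baseSel(FK given PK) * overlap
{noformat}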



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10777) LLAP: add pre-fragment and per-table cache details

2015-05-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10777.
-

committed to branch

> LLAP: add pre-fragment and per-table cache details
> --
>
> Key: HIVE-10777
> URL: https://issues.apache.org/jira/browse/HIVE-10777
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
> Attachments: HIVE-10777.01.patch, HIVE-10777.02.patch, 
> HIVE-10777.WIP.patch, HIVE-10777.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10808) Inner join on Null throwing Cast Exception

2015-05-26 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559800#comment-14559800
 ] 

Swarnim Kulkarni commented on HIVE-10808:
-

Sounds great. It's easier to review patches with tests on them that guarantee 
the patch actually works ;)

> Inner join on Null throwing Cast Exception
> --
>
> Key: HIVE-10808
> URL: https://issues.apache.org/jira/browse/HIVE-10808
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.1
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Critical
> Attachments: HIVE-10808.patch
>
>
> select
> > a.col1,
> > a.col2,
> > a.col3,
> > a.col4
> > from
> > tab1 a
> > inner join
> > (
> > select
> > max(x) as x
> > from
> > tab1
> > where
> > x < 20130327
> > ) r
> > on
> > a.x = r.x
> > where
> > a.col1 = 'F'
> > and a.col3 in ('A', 'S', 'G');
> Failed Task log snippet:
> 2015-05-18 19:22:17,372 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: 
> __MAP_PLAN__
> 2015-05-18 19:22:17,372 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring cache key: 
> __MAP_PLAN__
> 2015-05-18 19:22:17,457 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: Error in configuring 
> object
> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
> ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
> ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
> ... 17 more
> Caused by: java.lang.RuntimeException: Map operator initialization failed
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:157)
> ... 22 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.NullStructSerDe$NullStructSerDeObjectInspector 
> cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:334)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:352)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126)
> ... 22 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.NullStructSerDe$NullStructSerDeObjectInspector 
> cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1149)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219)
> at 
> org.apache.hadoop.hive.serde2.objectinspec

[jira] [Resolved] (HIVE-10653) LLAP: registry logs strange lines on daemons

2015-05-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10653.
-
   Resolution: Fixed
Fix Version/s: llap

committed to branch

> LLAP: registry logs strange lines on daemons
> 
>
> Key: HIVE-10653
> URL: https://issues.apache.org/jira/browse/HIVE-10653
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
>
> Discovered while looking at HIVE-10648; [~sseth] mentioned that this should 
> not be happening.
> Most of the daemons described as being killed were actually alive. 
> Several/all LLAP daemons in the cluster logged these messages at 
> approximately the same time (while AM was stuck, incidentally; perhaps they 
> were just bored with no work).
> {noformat}
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: 
> Starting to refresh ServiceInstanceSet 515383300
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker f698eaee-bf6c-484d-9b90-a60d9005760c which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn057-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 9d1f50d1-f237-43c1-a8c5-32741e82d18b which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn041-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker b8a22e2f-652a-4fde-be7a-744786bc93c9 which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn042-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 8394e271-e0d5-4589-817e-0181db0866b9 which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn056-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 1cabdcce-1089-4de6-abdf-315f18a8b4c0 which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn054-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 4027ad61-8c61-4173-90e2-d166ceaad74b which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn051-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 7f71a05f-f849-43d2-8fdb-09ba144d4b93 which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn050-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 41835ca1-69cd-4290-8c8f-8a9583a5d635 which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn053-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 54952e48-41be-48e1-922c-a39d0ee48a33 which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn055-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 980dfe6c-d03b-462b-bee3-35d183c74aee which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn052-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker d524212a-6743-4f18-bcf6-525a0d4b1a0a which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn046-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: 
> Killing service instance: DynamicServiceInstance [alive=true, 
> host=cn048-10.l42scl.hortonworks.com:15001 with resources= vCores:6

[jira] [Commented] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled

2015-05-26 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559797#comment-14559797
 ] 

Laljo John Pullokkaran commented on HIVE-10244:
---

[~mmccline] How can you end up with a grouping id without grouping sets?
The language prevents referring to the grouping id without grouping sets.

If grouping sets are present, then the existing check should bail out, right?

{code}
if (desc.isGroupingSetsPresent()) {
  LOG.info("Grouping sets not supported in vector mode");
  return false;
}
{code}

> Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when 
> hive.vectorized.execution.reduce.enabled is enabled
> ---
>
> Key: HIVE-10244
> URL: https://issues.apache.org/jira/browse/HIVE-10244
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Matt McCline
> Attachments: HIVE-10244.01.patch, explain_q80_vectorized_reduce_on.txt
>
>
> Query 
> {code}
> set hive.vectorized.execution.reduce.enabled=true;
> with ssr as
>  (select  s_store_id as store_id,
>   sum(ss_ext_sales_price) as sales,
>   sum(coalesce(sr_return_amt, 0)) as returns,
>   sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit
>   from store_sales left outer join store_returns on
>  (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number),
>  date_dim,
>  store,
>  item,
>  promotion
>  where ss_sold_date_sk = d_date_sk
>and d_date between cast('1998-08-04' as date) 
>   and (cast('1998-09-04' as date))
>and ss_store_sk = s_store_sk
>and ss_item_sk = i_item_sk
>and i_current_price > 50
>and ss_promo_sk = p_promo_sk
>and p_channel_tv = 'N'
>  group by s_store_id)
>  ,
>  csr as
>  (select  cp_catalog_page_id as catalog_page_id,
>   sum(cs_ext_sales_price) as sales,
>   sum(coalesce(cr_return_amount, 0)) as returns,
>   sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit
>   from catalog_sales left outer join catalog_returns on
>  (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number),
>  date_dim,
>  catalog_page,
>  item,
>  promotion
>  where cs_sold_date_sk = d_date_sk
>and d_date between cast('1998-08-04' as date)
>   and (cast('1998-09-04' as date))
> and cs_catalog_page_sk = cp_catalog_page_sk
>and cs_item_sk = i_item_sk
>and i_current_price > 50
>and cs_promo_sk = p_promo_sk
>and p_channel_tv = 'N'
> group by cp_catalog_page_id)
>  ,
>  wsr as
>  (select  web_site_id,
>   sum(ws_ext_sales_price) as sales,
>   sum(coalesce(wr_return_amt, 0)) as returns,
>   sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit
>   from web_sales left outer join web_returns on
>  (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number),
>  date_dim,
>  web_site,
>  item,
>  promotion
>  where ws_sold_date_sk = d_date_sk
>and d_date between cast('1998-08-04' as date)
>   and (cast('1998-09-04' as date))
> and ws_web_site_sk = web_site_sk
>and ws_item_sk = i_item_sk
>and i_current_price > 50
>and ws_promo_sk = p_promo_sk
>and p_channel_tv = 'N'
> group by web_site_id)
>   select  channel
> , id
> , sum(sales) as sales
> , sum(returns) as returns
> , sum(profit) as profit
>  from 
>  (select 'store channel' as channel
> , concat('store', store_id) as id
> , sales
> , returns
> , profit
>  from   ssr
>  union all
>  select 'catalog channel' as channel
> , concat('catalog_page', catalog_page_id) as id
> , sales
> , returns
> , profit
>  from  csr
>  union all
>  select 'web channel' as channel
> , concat('web_site', web_site_id) as id
> , sales
> , returns
> , profit
>  from   wsr
>  ) x
>  group by channel, id with rollup
>  order by channel
>  ,id
>  limit 100
> {code}
> Exception 
> {code}
> Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, 
> diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing vector batch (tag=0) 
> \N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8
> \N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7
> \N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8
>

[jira] [Commented] (HIVE-10808) Inner join on Null throwing Cast Exception

2015-05-26 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559785#comment-14559785
 ] 

Naveen Gangam commented on HIVE-10808:
--

[~swarnim] Agreed. However, we received this stack trace from a customer that 
can no longer reproduce the issue (their infra underwent some 
changes/upgrades), and we have not been able to reproduce it using a test 
dataset. If I am able to reproduce this more consistently, I can create a unit 
test for it. Fair?

> Inner join on Null throwing Cast Exception
> --
>
> Key: HIVE-10808
> URL: https://issues.apache.org/jira/browse/HIVE-10808
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.1
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Critical
> Attachments: HIVE-10808.patch
>
>
> select
> > a.col1,
> > a.col2,
> > a.col3,
> > a.col4
> > from
> > tab1 a
> > inner join
> > (
> > select
> > max(x) as x
> > from
> > tab1
> > where
> > x < 20130327
> > ) r
> > on
> > a.x = r.x
> > where
> > a.col1 = 'F'
> > and a.col3 in ('A', 'S', 'G');
> Failed Task log snippet:
> 2015-05-18 19:22:17,372 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: 
> __MAP_PLAN__
> 2015-05-18 19:22:17,372 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring cache key: 
> __MAP_PLAN__
> 2015-05-18 19:22:17,457 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: Error in configuring 
> object
> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
> ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
> ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
> ... 17 more
> Caused by: java.lang.RuntimeException: Map operator initialization failed
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:157)
> ... 22 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.NullStructSerDe$NullStructSerDeObjectInspector 
> cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:334)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:352)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126)
> ... 22 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.NullStructSerDe$NullStructSerDeObjectInspector 
> cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectIns

[jira] [Assigned] (HIVE-10653) LLAP: registry logs strange lines on daemons

2015-05-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-10653:
---

Assignee: Sergey Shelukhin  (was: Gopal V)

> LLAP: registry logs strange lines on daemons
> 
>
> Key: HIVE-10653
> URL: https://issues.apache.org/jira/browse/HIVE-10653
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> Discovered while looking at HIVE-10648; [~sseth] mentioned that this should 
> not be happening.
> Most of the daemons described as being killed were actually alive. 
> Several/all LLAP daemons in the cluster logged these messages at 
> approximately the same time (while the AM was stuck, incidentally; perhaps 
> they were just bored with no work).
> {noformat}
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: 
> Starting to refresh ServiceInstanceSet 515383300
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker f698eaee-bf6c-484d-9b90-a60d9005760c which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn057-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 9d1f50d1-f237-43c1-a8c5-32741e82d18b which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn041-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker b8a22e2f-652a-4fde-be7a-744786bc93c9 which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn042-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 8394e271-e0d5-4589-817e-0181db0866b9 which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn056-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 1cabdcce-1089-4de6-abdf-315f18a8b4c0 which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn054-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 4027ad61-8c61-4173-90e2-d166ceaad74b which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn051-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 7f71a05f-f849-43d2-8fdb-09ba144d4b93 which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn050-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 41835ca1-69cd-4290-8c8f-8a9583a5d635 which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn053-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 54952e48-41be-48e1-922c-a39d0ee48a33 which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn055-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker 980dfe6c-d03b-462b-bee3-35d183c74aee which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn052-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding 
> new worker d524212a-6743-4f18-bcf6-525a0d4b1a0a which mapped to 
> DynamicServiceInstance [alive=true, 
> host=cn046-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO 
> org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: 
> Killing service instance: DynamicServiceInstance [alive=true, 
> host=cn048-10.l42scl.hortonworks.com:15001 with resources= vCores:6>]
> 2015-05-07 12:14:30,017 [LlapYarnRegistryRe

[jira] [Updated] (HIVE-10777) LLAP: add pre-fragment and per-table cache details

2015-05-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10777:

Attachment: HIVE-10777.02.patch

Updated the name of the config setting

> LLAP: add pre-fragment and per-table cache details
> --
>
> Key: HIVE-10777
> URL: https://issues.apache.org/jira/browse/HIVE-10777
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
> Attachments: HIVE-10777.01.patch, HIVE-10777.02.patch, 
> HIVE-10777.WIP.patch, HIVE-10777.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem

2015-05-26 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559731#comment-14559731
 ] 

Mostafa Mokhtar commented on HIVE-10711:


Yes, please.



> Tez HashTableLoader attempts to allocate more memory than available when 
> HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
> --
>
> Key: HIVE-10711
> URL: https://issues.apache.org/jira/browse/HIVE-10711
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, 
> HIVE-10711.3.patch, HIVE-10711.4.patch
>
>
> Tez HashTableLoader bases its memory allocation on 
> HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the 
> process max memory, this can result in the HashTableLoader trying to use 
> more memory than is available to the process.
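A minimal sketch of the clamping this calls for (illustrative; the variable 
names and the safety factor are assumptions, not the attached patch):

{code}
// Sketch: never size the hash table above what the JVM can actually grant.
long threshold = HiveConf.getLongVar(conf,
    HiveConf.ConfVars.HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD);
long processMaxMem = Runtime.getRuntime().maxMemory();
// Cap the loader's budget at a fraction of the process max memory.
long hashTableMem = Math.min(threshold, (long) (processMaxMem * 0.8));
{code}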



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem

2015-05-26 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559732#comment-14559732
 ] 

Mostafa Mokhtar commented on HIVE-10711:


Yes, please.



> Tez HashTableLoader attempts to allocate more memory than available when 
> HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
> --
>
> Key: HIVE-10711
> URL: https://issues.apache.org/jira/browse/HIVE-10711
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, 
> HIVE-10711.3.patch, HIVE-10711.4.patch
>
>
> Tez HashTableLoader bases its memory allocation on 
> HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the 
> process max memory, this can result in the HashTableLoader trying to use 
> more memory than is available to the process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10825) Add parquet branch profile to jenkins-submit-build.sh

2015-05-26 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559730#comment-14559730
 ] 

Szehon Ho commented on HIVE-10825:
--

+1

> Add parquet branch profile to jenkins-submit-build.sh
> -
>
> Key: HIVE-10825
> URL: https://issues.apache.org/jira/browse/HIVE-10825
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
> Attachments: HIVE-10825.1.patch
>
>
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem

2015-05-26 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559729#comment-14559729
 ] 

Alexander Pivovarov commented on HIVE-10711:


Mostafa, let's wait 24 hours before committing. Just to clarify: do you want me to 
commit it to master and then do a hotfix (cherry-pick) from master to branch-1.2?

> Tez HashTableLoader attempts to allocate more memory than available when 
> HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
> --
>
> Key: HIVE-10711
> URL: https://issues.apache.org/jira/browse/HIVE-10711
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, 
> HIVE-10711.3.patch, HIVE-10711.4.patch
>
>
> Tez HashTableLoader bases its memory allocation on 
> HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the 
> process max memory, this can result in the HashTableLoader trying to use 
> more memory than is available to the process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10825) Add parquet branch profile to jenkins-submit-build.sh

2015-05-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-10825:
---
Description: NO PRECOMMIT TESTS  (was: NO PRECOMMIT TEST)

> Add parquet branch profile to jenkins-submit-build.sh
> -
>
> Key: HIVE-10825
> URL: https://issues.apache.org/jira/browse/HIVE-10825
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
> Attachments: HIVE-10825.1.patch
>
>
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10825) Add parquet branch profile to jenkins-submit-build.sh

2015-05-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-10825:
---
Description: NO PRECOMMIT TEST

> Add parquet branch profile to jenkins-submit-build.sh
> -
>
> Key: HIVE-10825
> URL: https://issues.apache.org/jira/browse/HIVE-10825
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
> Attachments: HIVE-10825.1.patch
>
>
> NO PRECOMMIT TEST



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10825) Add parquet branch profile to jenkins-submit-build.sh

2015-05-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-10825:
---
Attachment: HIVE-10825.1.patch

> Add parquet branch profile to jenkins-submit-build.sh
> -
>
> Key: HIVE-10825
> URL: https://issues.apache.org/jira/browse/HIVE-10825
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
> Attachments: HIVE-10825.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10793:

Fix Version/s: (was: 1.2.1)
   1.3.0

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> 
>
> Key: HIVE-10793
> URL: https://issues.apache.org/jira/browse/HIVE-10793
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
> Fix For: 1.3.0
>
> Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch
>
>
> HybridHashTableContainer will allocate memory based on an estimate, which means 
> that if the actual data size is less than the estimate, the allocated memory 
> won't be used.
> The number of partitions is calculated based on the estimated data size
> {code}
> numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize, 
> minNumParts, minWbSize,
>   nwayConf);
> {code}
> Then based on number of partitions writeBufferSize is set
> {code}
> writeBufferSize = (int)(estimatedTableSize / numPartitions);
> {code}
> Each hash partition will allocate 1 WriteBuffer, with no further allocation 
> if the estimated data size is correct.
> The suggested solution is to reduce writeBufferSize by a factor such that only 
> X% of the memory is preallocated.
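A sketch of what such a preallocation factor could look like (illustrative; 
the factor and names are assumptions, not the patch):

{code}
// Sketch: preallocate only a fraction of the estimated per-partition size;
// further WriteBuffers are only allocated if the data actually arrives.
float preallocFraction = 0.5f; // the "X%" above, value illustrative
writeBufferSize = (int) ((estimatedTableSize / numPartitions) * preallocFraction);
{code}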



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity

2015-05-26 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-7723:
--
Attachment: HIVE-7723.11.patch

> Explain plan for complex query with lots of partitions is slow due to 
> in-efficient collection used to find a matching ReadEntity
> 
>
> Key: HIVE-7723
> URL: https://issues.apache.org/jira/browse/HIVE-7723
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Physical Optimizer
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
> Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, 
> HIVE-7723.11.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, 
> HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, 
> HIVE-7723.9.patch
>
>
> Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled, it 
> showed that ReadEntity.equals was taking ~40% of the CPU.
> ReadEntity.equals is called from the snippet below.
> Again and again the set is iterated over to get the actual match; a HashMap 
> is a better option for this case, as Set doesn't have a get method.
> Also, for ReadEntity, equals is case-insensitive while hash is 
> case-sensitive, which is undesired behavior.
> {code}
> public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity 
> newInput) {
> // If the input is already present, make sure the new parent is added to 
> the input.
> if (inputs.contains(newInput)) {
>   for (ReadEntity input : inputs) {
> if (input.equals(newInput)) {
>   if ((newInput.getParents() != null) && 
> (!newInput.getParents().isEmpty())) {
> input.getParents().addAll(newInput.getParents());
> input.setDirect(input.isDirect() || newInput.isDirect());
>   }
>   return input;
> }
>   }
>   assert false;
> } else {
>   inputs.add(newInput);
>   return newInput;
> }
> // make compile happy
> return null;
>   }
> {code}
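A sketch of the map-based variant the description argues for (assuming a Map 
keyed by the entity itself; this is illustrative, not the attached patch):

{code}
// Sketch: O(1) lookup instead of scanning the whole set on every add.
public static ReadEntity addInput(Map<ReadEntity, ReadEntity> inputs,
    ReadEntity newInput) {
  ReadEntity existing = inputs.get(newInput); // constant-time match
  if (existing == null) {
    inputs.put(newInput, newInput);
    return newInput;
  }
  if (newInput.getParents() != null && !newInput.getParents().isEmpty()) {
    existing.getParents().addAll(newInput.getParents());
    existing.setDirect(existing.isDirect() || newInput.isDirect());
  }
  return existing;
}
{code}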
> This is the query used : 
> {code}
> select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
> ,cs1.b_streen_name ,cs1.b_city
>  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
> ,cs1.c_zip ,cs1.syear ,cs1.cnt
>  ,cs1.s1 ,cs1.s2 ,cs1.s3
>  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
> from
> (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
> store_name
>  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
> ,ad1.ca_street_name as b_streen_name
>  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
> c_street_number
>  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
> as c_zip
>  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
> as cnt
>  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
> ,sum(ss_coupon_amt) as s3
>   FROM   store_sales
> JOIN store_returns ON store_sales.ss_item_sk = 
> store_returns.sr_item_sk and store_sales.ss_ticket_number = 
> store_returns.sr_ticket_number
> JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
> JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
> JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
> JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
> JOIN store ON store_sales.ss_store_sk = store.s_store_sk
> JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
> cd1.cd_demo_sk
> JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
> cd2.cd_demo_sk
> JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
> JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
> hd1.hd_demo_sk
> JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
> hd2.hd_demo_sk
> JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
> ad1.ca_address_sk
> JOIN customer_address ad2 ON customer.c_current_addr_sk = 
> ad2.ca_address_sk
> JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
> JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
> JOIN item ON store_sales.ss_item_sk = item.i_item_sk
> JOIN
>  (select cs_item_sk
> ,sum(cs_ext_list_price) as 
> sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
>   from catalog_sales JOIN catalog_returns
>   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
> and catalog_sales.cs_order_number = catalog_returns.cr_order_number
>   group by cs_item_sk
>   having 
> sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
>  cs_ui
>

[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-26 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559704#comment-14559704
 ] 

Mostafa Mokhtar commented on HIVE-10793:


[~sushanth] [~sershe]
Can this go to 1.2.1 as well?



> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> 
>
> Key: HIVE-10793
> URL: https://issues.apache.org/jira/browse/HIVE-10793
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch
>
>
> HybridHashTableContainer will allocate memory based on an estimate, which means 
> that if the actual data size is less than the estimate, the allocated memory 
> won't be used.
> The number of partitions is calculated based on the estimated data size
> {code}
> numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize, 
> minNumParts, minWbSize,
>   nwayConf);
> {code}
> Then based on number of partitions writeBufferSize is set
> {code}
> writeBufferSize = (int)(estimatedTableSize / numPartitions);
> {code}
> Each hash partition will allocate 1 WriteBuffer, with no further allocation 
> if the estimated data size is correct.
> The suggested solution is to reduce writeBufferSize by a factor such that only 
> X% of the memory is preallocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem

2015-05-26 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559699#comment-14559699
 ] 

Mostafa Mokhtar commented on HIVE-10711:


[~sushanth] FYI 

[~apivovarov]
Can you please commit the change to 1.2.1?

> Tez HashTableLoader attempts to allocate more memory than available when 
> HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
> --
>
> Key: HIVE-10711
> URL: https://issues.apache.org/jira/browse/HIVE-10711
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, 
> HIVE-10711.3.patch, HIVE-10711.4.patch
>
>
> Tez HashTableLoader bases its memory allocation on 
> HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the 
> process max memory, this can result in the HashTableLoader trying to use 
> more memory than is available to the process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10819) SearchArgumentImpl for Timestamp is broken by HIVE-10286

2015-05-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559696#comment-14559696
 ] 

Sergey Shelukhin commented on HIVE-10819:
-

this breaks a lot of tests...

> SearchArgumentImpl for Timestamp is broken by HIVE-10286
> 
>
> Key: HIVE-10819
> URL: https://issues.apache.org/jira/browse/HIVE-10819
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 1.2.1
>
> Attachments: HIVE-10819.1.patch, HIVE-10819.2.patch
>
>
> The workaround for the kryo bug for Timestamp was accidentally removed by 
> HIVE-10286. Need to bring it back.
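For context, the workaround registers a custom kryo serializer for 
java.sql.Timestamp so kryo's default handling (buggy for the nanos field) is 
bypassed. A sketch of the general shape, assuming kryo's Serializer API (not 
the actual Hive code):

{code}
// Sketch: serialize Timestamp as millis + nanos explicitly.
kryo.register(java.sql.Timestamp.class, new Serializer<java.sql.Timestamp>() {
  @Override
  public void write(Kryo kryo, Output output, java.sql.Timestamp ts) {
    output.writeLong(ts.getTime());
    output.writeInt(ts.getNanos());
  }

  @Override
  public java.sql.Timestamp read(Kryo kryo, Input input,
      Class<java.sql.Timestamp> type) {
    java.sql.Timestamp ts = new java.sql.Timestamp(input.readLong());
    ts.setNanos(input.readInt());
    return ts;
  }
});
{code}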



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem

2015-05-26 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559667#comment-14559667
 ] 

Alexander Pivovarov commented on HIVE-10711:


+1

> Tez HashTableLoader attempts to allocate more memory than available when 
> HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
> --
>
> Key: HIVE-10711
> URL: https://issues.apache.org/jira/browse/HIVE-10711
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, 
> HIVE-10711.3.patch, HIVE-10711.4.patch
>
>
> Tez HashTableLoader bases its memory allocation on 
> HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the 
> process max memory, this can result in the HashTableLoader trying to use 
> more memory than is available to the process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10749) Implement Insert ACID statement for parquet [Parquet branch]

2015-05-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-10749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-10749:
---
Attachment: HIVE-10749.2-parquet.patch

Re-attaching the patch to allow the Jenkins job to execute tests on the parquet branch

> Implement Insert ACID statement for parquet [Parquet branch]
> 
>
> Key: HIVE-10749
> URL: https://issues.apache.org/jira/browse/HIVE-10749
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10749.1.patch, HIVE-10749.1.patch, 
> HIVE-10749.2-parquet.patch, HIVE-10749.2.patch, HIVE-10749.patch
>
>
> We need to implement the insert statement for the Parquet format, as is done for ORC.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10749) Implement Insert ACID statement for parquet [Parquet branch]

2015-05-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-10749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-10749:
---
Summary: Implement Insert ACID statement for parquet [Parquet branch]  
(was: Implement Insert ACID statement for parquet)

> Implement Insert ACID statement for parquet [Parquet branch]
> 
>
> Key: HIVE-10749
> URL: https://issues.apache.org/jira/browse/HIVE-10749
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-10749.1.patch, HIVE-10749.1.patch, 
> HIVE-10749.2.patch, HIVE-10749.patch
>
>
> We need to implement the insert statement for the Parquet format, as is done for ORC.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem

2015-05-26 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559618#comment-14559618
 ] 

Mostafa Mokhtar commented on HIVE-10711:


[~apivovarov]
do you have anymore feedback?

> Tez HashTableLoader attempts to allocate more memory than available when 
> HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
> --
>
> Key: HIVE-10711
> URL: https://issues.apache.org/jira/browse/HIVE-10711
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Mostafa Mokhtar
> Fix For: 1.2.1
>
> Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, 
> HIVE-10711.3.patch, HIVE-10711.4.patch
>
>
> Tez HashTableLoader bases its memory allocation on 
> HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the 
> process max memory, this can result in the HashTableLoader trying to use 
> more memory than is available to the process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation

2015-05-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10812:
---
Attachment: HIVE-10812.03.patch

> Scaling PK/FK's selectivity for stats annotation
> 
>
> Key: HIVE-10812
> URL: https://issues.apache.org/jira/browse/HIVE-10812
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10812.01.patch, HIVE-10812.02.patch, 
> HIVE-10812.03.patch
>
>
> Right now, the computation of the selectivity of the FK side based on the PK 
> side does not take the range of the FK and the range of the PK into account.
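One plausible shape for such range-aware scaling (purely illustrative; the 
actual formula is whatever the patch implements):

{code}
// Sketch: scale the base PK/FK selectivity by how much of the FK range
// actually overlaps the PK range. All names here are illustrative.
double overlap = Math.max(0.0,
    Math.min(pkMax, fkMax) - Math.max(pkMin, fkMin));
double fkRange = fkMax - fkMin;
double scale = (fkRange > 0.0) ? overlap / fkRange : 1.0;
double selectivity = baseSelectivity * scale;
{code}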



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation

2015-05-26 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559538#comment-14559538
 ] 

Laljo John Pullokkaran commented on HIVE-10812:
---

+1

> Scaling PK/FK's selectivity for stats annotation
> 
>
> Key: HIVE-10812
> URL: https://issues.apache.org/jira/browse/HIVE-10812
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-10812.01.patch, HIVE-10812.02.patch
>
>
> Right now, the computation of the selectivity of the FK side based on the PK 
> side does not take the range of the FK and the range of the PK into account.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases

2015-05-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559540#comment-14559540
 ] 

Hive QA commented on HIVE-10811:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735331/HIVE-10811.02.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 8973 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_crc32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sha1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join30
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4044/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4044/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4044/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12735331 - PreCommit-HIVE-TRUNK-Build

> RelFieldTrimmer throws NoSuchElementException in some cases
> ---
>
> Key: HIVE-10811
> URL: https://issues.apache.org/jira/browse/HIVE-10811
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, 
> HIVE-10811.patch
>
>
> RelFieldTrimmer runs into NoSuchElementException in some cases.
> Stack trace:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Internal error: While 
> invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
>   at org.apache.calcite.util.Util.newInternal(Util.java:743)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorIm

[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases

2015-05-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559395#comment-14559395
 ] 

Gopal V commented on HIVE-10792:


[~gaodayue]: not sure if the patch can retain PPD for map-joins. {{alias.size 
== 1}} might jump out of the PPD cases even when the dummy operators are present.

> PPD leads to wrong answer when mapper scans the same table with multiple 
> aliases
> 
>
> Key: HIVE-10792
> URL: https://issues.apache.org/jira/browse/HIVE-10792
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, Query Processor
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.2.1
>Reporter: Dayue Gao
>Assignee: Dayue Gao
>Priority: Critical
> Fix For: 1.2.1
>
> Attachments: HIVE-10792.1.patch, HIVE-10792.2.patch, 
> HIVE-10792.test.sql
>
>
> Here are the steps to reproduce the bug.
> First of all, prepare a simple ORC table with one row
> {code}
> create table test_orc (c0 int, c1 int) stored as ORC;
> {code}
> Table: test_orc
> ||c0||c1||
> |0|1|
> The following SQL gets an empty result, which is not expected
> {code}
> select * from test_orc t1
> union all
> select * from test_orc t2
> where t2.c0 = 1
> {code}
> Self join is also broken
> {code}
> set hive.auto.convert.join=false; -- force common join
> select * from test_orc t1
> left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0);
> {code}
> It gets an empty result while the expected answer is
> ||t1.c0||t1.c1||t2.c0||t2.c1||
> |0|1|NULL|NULL|
> In these cases, we push down predicates into OrcInputFormat. As a result, 
> the TableScanOperator for "t1" can't receive its rows.
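Conceptually, the fix has to avoid (or restrict) pushdown when one mapper 
serves several aliases over the same path; a sketch of that guard (names are 
illustrative, not the patch):

{code}
// Sketch: only push the filter into the reader when the scan serves a
// single alias; with several aliases sharing the split, rows filtered
// out for one alias would wrongly disappear for the others too.
List<String> aliases = mapWork.getAliasesForPath(splitPath); // illustrative
if (aliases.size() == 1) {
  pushFilterIntoInputFormat(jobConf, filterExpr);
}
{code}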



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6991) History not able to disable/enable after session started

2015-05-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559359#comment-14559359
 ] 

Hive QA commented on HIVE-6991:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735306/HIVE-6991.2.patch

{color:red}ERROR:{color} -1 due to 637 failed/errored test(s), 8973 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_multiple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table2_h23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table_h23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_protect_mode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition_authorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_serde2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_add_column
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_add_column2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_add_column3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_change_schema
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_comments
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_compression_enabled
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_compression_enabled_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_date
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_deserialize_map_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_evolved_schemas
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_joins
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_joins_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_nullable_fields
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_sanity_test
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_schema_evolution_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_timestamp
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_type_evolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_udfs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_output_format
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketconte

[jira] [Comment Edited] (HIVE-10304) Add deprecation message to HiveCLI

2015-05-26 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559325#comment-14559325
 ] 

Xuefu Zhang edited comment on HIVE-10304 at 5/26/15 4:23 PM:
-

The final decision will be replacing Hive CLI's implementation with Beeline 
(HIVE-10511). You will still have the script file (hive.sh).

Since you have so many scripts using Hive CLI, when HIVE-10511 is in place it 
would be great if you could test it with your scripts. Thanks.


was (Author: xuefuz):
The final decision will be replacing Hive CLI's implementation with Beeline 
(HIVE-10511). You will still have the script.

Since you have so many scripts using Hive CLI, when HIVE-10511 is in place it 
would be great if you could test it with your scripts. Thanks.

> Add deprecation message to HiveCLI
> --
>
> Key: HIVE-10304
> URL: https://issues.apache.org/jira/browse/HIVE-10304
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Affects Versions: 1.1.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>  Labels: TODOC1.2
> Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch
>
>
> As Beeline is now the recommended command line tool for Hive, we should add a 
> message to HiveCLI to indicate that it is deprecated and redirect users to 
> Beeline.
> This is not suggesting removing HiveCLI for now; it is just a helpful 
> pointer so users know that attention is now focused on Beeline.
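The change itself is presumably a one-line notice at CLI startup; a sketch of 
the shape (illustrative, not the committed patch):

{code}
// Sketch: print a deprecation notice when the Hive CLI starts up.
console.printInfo("WARNING: Hive CLI is deprecated and migration to "
    + "Beeline is recommended.");
{code}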



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI

2015-05-26 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559325#comment-14559325
 ] 

Xuefu Zhang commented on HIVE-10304:


The final decision will be replacing Hive CLI's implementation with Beeline 
(HIVE-10511). You will still have the script.

Since you have so many scripts using Hive CLI, when HIVE-10511 is in place it 
would be great if you could test it with your scripts. Thanks.

> Add deprecation message to HiveCLI
> --
>
> Key: HIVE-10304
> URL: https://issues.apache.org/jira/browse/HIVE-10304
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Affects Versions: 1.1.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>  Labels: TODOC1.2
> Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch
>
>
> As Beeline is now the recommended command line tool for Hive, we should add a 
> message to HiveCLI to indicate that it is deprecated and redirect users to 
> Beeline.
> This is not suggesting removing HiveCLI for now; it is just a helpful 
> pointer so users know that attention is now focused on Beeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10277) Unable to process Comment line '--' in HIVE-1.1.0

2015-05-26 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559321#comment-14559321
 ] 

Xuefu Zhang commented on HIVE-10277:


Thank you. I have reverted it.

@Chinna, I'll reopen the JIRA. Could you resubmit a patch if it's still a 
problem, and make sure that the tests pass.

Thanks,
Xuefu




> Unable to process Comment line '--' in HIVE-1.1.0
> -
>
> Key: HIVE-10277
> URL: https://issues.apache.org/jira/browse/HIVE-10277
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0
>Reporter: Kaveen Raajan
>Assignee: Chinna Rao Lalam
>Priority: Minor
>  Labels: hive
> Fix For: 1.3.0
>
> Attachments: HIVE-10277-1.patch, HIVE-10277.2.patch, HIVE-10277.patch
>
>
> I tried to use a comment line (*--*) in the HIVE-1.1.0 grunt shell, like
> ~hive>--this is comment line~
> ~hive>show tables;~
> I got an error like
> {quote}
> NoViableAltException(-1@[])
> at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1020)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> FAILED: ParseException line 2:0 cannot recognize input near '<EOF>' '<EOF>' '<EOF>'
> {quote}
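The usual shape of a fix here is to drop whole-line comments in the CLI before 
the text reaches the parser (a sketch, not the reverted patch):

{code}
// Sketch: treat a line that is only a "--" comment as a no-op command.
String trimmed = line.trim();
if (trimmed.startsWith("--")) {
  return 0; // nothing to execute on this line
}
{code}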



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI

2015-05-26 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559313#comment-14559313
 ] 

Hari Sekhon commented on HIVE-10304:


If this is just recommending that users use Beeline instead of Hive CLI, that is 
fine, but if the Hive 1 CLI were ever removed, that would cause major headaches 
for users such as myself who have lots of scripts and programs that call Hive 
CLI; rewriting things that have worked fine for years is not cool. In fact 
it's the opposite of cool.

> Add deprecation message to HiveCLI
> --
>
> Key: HIVE-10304
> URL: https://issues.apache.org/jira/browse/HIVE-10304
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI
>Affects Versions: 1.1.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>  Labels: TODOC1.2
> Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch
>
>
> As Beeline is now the recommended command line tool for Hive, we should add a 
> message to HiveCLI to indicate that it is deprecated and redirect users to 
> Beeline.
> This is not suggesting removing HiveCLI for now; it is just a helpful 
> pointer so users know that attention is now focused on Beeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-10277) Unable to process Comment line '--' in HIVE-1.1.0

2015-05-26 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reopened HIVE-10277:


The patch has been reverted because of test failures. Please resubmit the patch 
if the problem remains.

> Unable to process Comment line '--' in HIVE-1.1.0
> -
>
> Key: HIVE-10277
> URL: https://issues.apache.org/jira/browse/HIVE-10277
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0
>Reporter: Kaveen Raajan
>Assignee: Chinna Rao Lalam
>Priority: Minor
>  Labels: hive
> Fix For: 1.3.0
>
> Attachments: HIVE-10277-1.patch, HIVE-10277.2.patch, HIVE-10277.patch
>
>
> I tried to use a comment line (*--*) in the HIVE-1.1.0 grunt shell, like
> ~hive>--this is comment line~
> ~hive>show tables;~
> I got an error like
> {quote}
> NoViableAltException(-1@[])
> at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1020)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> FAILED: ParseException line 2:0 cannot recognize input near '<EOF>' '<EOF>' '<EOF>'
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly

2015-05-26 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-10815:
-
Attachment: (was: HIVE-10815.patch)

> Let HiveMetaStoreClient Choose MetaStore Randomly
> -
>
> Key: HIVE-10815
> URL: https://issues.apache.org/jira/browse/HIVE-10815
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.2.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-10815.patch
>
>
> Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs 
> when multiple metastores are configured.
>  Choosing a MetaStore randomly would be good for load balancing.
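A minimal sketch of the randomization (assuming the client keeps its URI list 
in an array; not the attached patch):

{code}
// Sketch: shuffle the configured metastore URIs once per client so load
// spreads across metastores instead of always hitting the first one.
List<URI> uris = new ArrayList<>(Arrays.asList(metastoreUris));
Collections.shuffle(uris);
for (URI store : uris) {
  // try to connect; fall through to the next URI on failure
}
{code}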



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly

2015-05-26 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-10815:
-
Attachment: HIVE-10815.patch

> Let HiveMetaStoreClient Choose MetaStore Randomly
> -
>
> Key: HIVE-10815
> URL: https://issues.apache.org/jira/browse/HIVE-10815
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.2.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-10815.patch, HIVE-10815.patch
>
>
> Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs 
> when multiple metastores are configured.
>  Choosing a MetaStore randomly would be good for load balancing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10802) Table join query with some constant field in select fails

2015-05-26 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu reassigned HIVE-10802:
---

Assignee: Aihua Xu

> Table join query with some constant field in select fails
> -
>
> Key: HIVE-10802
> URL: https://issues.apache.org/jira/browse/HIVE-10802
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> The following query fails:
> {noformat}
> create table tb1 (year string, month string);
> create table tb2(month string);
> select unix_timestamp(a.year) 
> from (select * from tb1 where year='2001') a join tb2 b on (a.month=b.month);
> {noformat}
> with the exception {noformat}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.(StandardStructObjectInspector.java:109)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275)
> at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175)
> {noformat}
> The issue seems to be that, during query compilation, the field in the select 
> should be replaced with the constant when some UDFs are used.
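If the constant-propagation rewrite is indeed the trigger, one unverified 
mitigation while the bug is open is to disable that optimizer for the session 
(this is a guess at a workaround, not a confirmed fix):

{code}
set hive.optimize.constant.propagation=false;
{code}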



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases

2015-05-26 Thread Dayue Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559231#comment-14559231
 ] 

Dayue Gao commented on HIVE-10792:
--

I don't think the failed tests are related to this patch.

[~gopalv] [~thejas] [~sershe] Could you have a look at this? Should it be 
backported to old releases?

> PPD leads to wrong answer when mapper scans the same table with multiple 
> aliases
> 
>
> Key: HIVE-10792
> URL: https://issues.apache.org/jira/browse/HIVE-10792
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, Query Processor
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.2.1
>Reporter: Dayue Gao
>Assignee: Dayue Gao
>Priority: Critical
> Fix For: 1.2.1
>
> Attachments: HIVE-10792.1.patch, HIVE-10792.2.patch, 
> HIVE-10792.test.sql
>
>
> Here are the steps to reproduce the bug.
> First of all, prepare a simple ORC table with one row
> {code}
> create table test_orc (c0 int, c1 int) stored as ORC;
> {code}
> Table: test_orc
> ||c0||c1||
> |0|1|
> The following SQL gets an empty result, which is not expected
> {code}
> select * from test_orc t1
> union all
> select * from test_orc t2
> where t2.c0 = 1
> {code}
> Self join is also broken
> {code}
> set hive.auto.convert.join=false; -- force common join
> select * from test_orc t1
> left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0);
> {code}
> It gets an empty result while the expected answer is
> ||t1.c0||t1.c1||t2.c0||t2.c1||
> |0|1|NULL|NULL|
> In these cases, we push down predicates into OrcInputFormat. As a result, 
> the TableScanOperator for "t1" can't receive its rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO

2015-05-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559220#comment-14559220
 ] 

Hive QA commented on HIVE-9069:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735298/HIVE-9069.14.patch

{color:red}ERROR:{color} -1 due to 636 failed/errored test(s), 8974 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_multiple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table2_h23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table_h23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_protect_mode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition_authorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_serde2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_add_column
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_add_column2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_add_column3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_change_schema
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_comments
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_compression_enabled
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_compression_enabled_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_date
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_deserialize_map_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_evolved_schemas
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_joins
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_joins_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_nullable_fields
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_sanity_test
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_schema_evolution_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_timestamp
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_type_evolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_udfs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_output_format
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcont

[jira] [Updated] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases

2015-05-26 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10811:
---
Attachment: HIVE-10811.02.patch

> RelFieldTrimmer throws NoSuchElementException in some cases
> ---
>
> Key: HIVE-10811
> URL: https://issues.apache.org/jira/browse/HIVE-10811
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, 
> HIVE-10811.patch
>
>
> RelFieldTrimmer runs into NoSuchElementException in some cases.
> Stack trace:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Internal error: While 
> invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
>   at org.apache.calcite.util.Util.newInternal(Util.java:743)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269)
>   at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536)
>   ... 32 more
> Caused by: java.lang.AssertionError: Internal error: While invoking method 
> 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
>   at org.apache.calcite.util.Util.newInternal(Util.java:743)
>   at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543)
>   at 
> org.apache.calcite.sql2rel.RelFiel

[jira] [Updated] (HIVE-8458) Potential null dereference in Utilities#clearWork()

2015-05-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-8458:
-
Description: 
{code}
Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);

// if the plan path hasn't been initialized just return, nothing to clean.
if (mapPath == null && reducePath == null) {
  return;
}

try {
  FileSystem fs = mapPath.getFileSystem(conf);
{code}

If mapPath is null but reducePath is not null, the getFileSystem() call would 
produce an NPE.

  was:
{code}
Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);

// if the plan path hasn't been initialized just return, nothing to clean.
if (mapPath == null && reducePath == null) {
  return;
}

try {
  FileSystem fs = mapPath.getFileSystem(conf);
{code}
If mapPath is null but reducePath is not null, the getFileSystem() call would 
produce an NPE.


> Potential null dereference in Utilities#clearWork()
> ---
>
> Key: HIVE-8458
> URL: https://issues.apache.org/jira/browse/HIVE-8458
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Ted Yu
>Assignee: skrho
>Priority: Minor
> Attachments: HIVE-8458_001.patch
>
>
> {code}
> Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
> Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);
> // if the plan path hasn't been initialized just return, nothing to clean.
> if (mapPath == null && reducePath == null) {
>   return;
> }
> try {
>   FileSystem fs = mapPath.getFileSystem(conf);
> {code}
> If mapPath is null but reducePath is not null, the getFileSystem() call would 
> produce an NPE.
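A minimal null-safe sketch of the direction such a fix could take (standalone, with a hypothetical clearWork signature; the delete calls assume the method's job is to remove the serialized plan files; this is not the attached patch):

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClearWorkSketch {
  // Resolve the FileSystem from whichever plan path is non-null,
  // so a null mapPath alone can no longer trigger an NPE.
  static void clearWork(Configuration conf, Path mapPath, Path reducePath)
      throws IOException {
    if (mapPath == null && reducePath == null) {
      return; // neither plan path was initialized, nothing to clean
    }
    Path anyPath = (mapPath != null) ? mapPath : reducePath;
    FileSystem fs = anyPath.getFileSystem(conf);
    if (mapPath != null) {
      fs.delete(mapPath, true);
    }
    if (reducePath != null) {
      fs.delete(reducePath, true);
    }
  }
}
{code}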



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-9605) Remove parquet nested objects from wrapper writable objects

2015-05-26 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reopened HIVE-9605:


Sorry [~spena], it seems master has a lot of failing tests. I will commit once it 
comes back to normal.

> Remove parquet nested objects from wrapper writable objects
> ---
>
> Key: HIVE-9605
> URL: https://issues.apache.org/jira/browse/HIVE-9605
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 0.14.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: parquet-branch
>
> Attachments: HIVE-9605.3.patch, HIVE-9605.4.patch, HIVE-9605.5.patch, 
> HIVE-9605.6.patch
>
>
> Parquet nested types use an extra wrapper object (ArrayWritable) as a 
> wrapper for map and list elements. This extra object is not needed and causes 
> unnecessary memory allocations.
> An example of this code is in HiveCollectionConverter.java:
> {noformat}
> public void end() {
>   parent.set(index, wrapList(new ArrayWritable(
>       Writable.class, list.toArray(new Writable[list.size()]))));
> }
> {noformat}
> This object is later unwrapped in AbstractParquetMapInspector, for example:
> {noformat}
> final Writable[] mapContainer = ((ArrayWritable) data).get();
> final Writable[] mapArray = ((ArrayWritable) mapContainer[0]).get();
> for (final Writable obj : mapArray) {
>   ...
> }
> {noformat}
> We should get rid of this wrapper object to save time and memory.
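A hedged sketch of the intended shape (standalone, not the committed patch): keep the elements in a single ArrayWritable so the inspector can iterate them directly, without first peeling off an inner container.

{code}
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

public class UnwrappedContainerSketch {
  public static void main(String[] args) {
    // Before: ((ArrayWritable) data).get()[0] had to be unwrapped again.
    // After: one ArrayWritable holds the elements directly.
    Writable[] elements = { new IntWritable(1), new IntWritable(2) };
    ArrayWritable data = new ArrayWritable(Writable.class, elements);

    // Inspector-side read with no inner wrapper to remove:
    for (Writable obj : data.get()) {
      System.out.println(obj);
    }
  }
}
{code}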



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >