[jira] [Created] (HIVE-23041) LLAP purge command can lead to resource leak

2020-03-17 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-23041:
-

 Summary: LLAP purge command can lead to resource leak
 Key: HIVE-23041
 URL: https://issues.apache.org/jira/browse/HIVE-23041
 Project: Hive
  Issue Type: Bug
Reporter: Slim Bouguerra


As per the Java Spec 
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html

An unused ExecutorService should be shut down to allow reclamation of its 
resources.

Code like the following creates a serious resource leak when a user fires multiple commands.

https://github.com/apache/hive/blob/7ae6756d40468d18b65423a0b5174b827dc42b60/ql/src/java/org/apache/hadoop/hive/ql/processors/LlapCacheResourceProcessor.java#L132
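A minimal sketch of the pattern suggested here (illustrative names only, not the actual LlapCacheResourceProcessor code): build the pool per command and always release it, e.g. in a finally block.

{code:java}
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PurgeCommandSketch {
  // Hypothetical helper: run the per-command tasks and always release the pool,
  // even when a task fails or the invocation is interrupted.
  static void runTasks(List<Callable<Void>> tasks) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
    try {
      for (Future<Void> result : executor.invokeAll(tasks)) {
        result.get(); // surface any task failure to the caller
      }
    } finally {
      executor.shutdownNow(); // interrupt stragglers and let the threads be reclaimed
    }
  }
}
{code}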

The other question this raises is how those tasks respond to interrupt or cancel at the 
thread level. [~prasanth_j], any idea what happens if one task hangs on IO?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22934) Hive server interactive log counters to error stream

2020-02-26 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22934:
-

 Summary: Hive server interactive log counters to error stream
 Key: HIVE-22934
 URL: https://issues.apache.org/jira/browse/HIVE-22934
 Project: Hive
  Issue Type: Bug
Reporter: Slim Bouguerra


Hive server is logging the console output to the system error stream.
This needs to be fixed because:
First, we do not roll the file.
Second, writes to that file are sequential and can lead to throttling/poor 
performance.
{code}
-rw-r--r--  1 hive hadoop 9.5G Feb 26 17:22 hive-server2-interactive.err
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22760) Add Clock caching eviction based strategy

2020-01-22 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22760:
-

 Summary: Add Clock caching eviction based strategy
 Key: HIVE-22760
 URL: https://issues.apache.org/jira/browse/HIVE-22760
 Project: Hive
  Issue Type: New Feature
  Components: llap
Reporter: Slim Bouguerra
Assignee: Slim Bouguerra


LRFU is the current default eviction strategy.
Its main issue is a very high memory overhead; in addition, most of the accounting has 
to happen under locks and can therefore be a source of contention.
Adding a simpler policy like Clock can help with both issues, as sketched below.
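A minimal sketch of a CLOCK (second-chance) policy, just to illustrate the two properties claimed above: one reference bit per entry (low memory overhead) and lock-free accounting on the hit path. The names and slot-index scheme are illustrative, not the proposed LLAP implementation.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

public class ClockPolicySketch {
  private final AtomicBoolean[] referenced; // one reference bit per cached buffer
  private int hand;                         // clock hand, only moved under the policy lock

  public ClockPolicySketch(int capacity) {
    referenced = new AtomicBoolean[capacity];
    for (int i = 0; i < capacity; i++) {
      referenced[i] = new AtomicBoolean(false);
    }
  }

  /** Cache hit: constant-memory, lock-free accounting (just set the bit). */
  public void touch(int slot) {
    referenced[slot].lazySet(true);
  }

  /** Eviction: sweep the hand, giving a second chance to recently touched slots. */
  public synchronized int evictOne() {
    while (true) {
      if (referenced[hand].compareAndSet(true, false)) {
        hand = (hand + 1) % referenced.length; // was referenced: clear the bit and move on
      } else {
        int victim = hand;                     // not touched since the last sweep: evict
        hand = (hand + 1) % referenced.length;
        return victim;
      }
    }
  }
}
{code}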




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22755) Cleaner/Compaction can skip the read locks and use the min open txn id

2020-01-21 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22755:
-

 Summary: Cleaner/Compaction can skip the read locks and use the 
min open txn id
 Key: HIVE-22755
 URL: https://issues.apache.org/jira/browse/HIVE-22755
 Project: Hive
  Issue Type: Sub-task
Reporter: Slim Bouguerra






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22754) Trim some extra HDFS find file name calls that can be deduced using current TX watermark

2020-01-21 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22754:
-

 Summary: Trim some extra HDFS find file name calls that can be 
deduced using current TX watermark
 Key: HIVE-22754
 URL: https://issues.apache.org/jira/browse/HIVE-22754
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Reporter: Slim Bouguerra
Assignee: Slim Bouguerra






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22743) Enable Fast LLAP-IO path for tables with schema evolution case appending columns.

2020-01-17 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22743:
-

 Summary: Enable Fast LLAP-IO path for tables with schema evolution 
case appending columns.
 Key: HIVE-22743
 URL: https://issues.apache.org/jira/browse/HIVE-22743
 Project: Hive
  Issue Type: Improvement
Reporter: Slim Bouguerra
Assignee: Slim Bouguerra






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22742) Skip Class loader forking, when needed.

2020-01-17 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22742:
-

 Summary: Skip Class loader forking, when needed.
 Key: HIVE-22742
 URL: https://issues.apache.org/jira/browse/HIVE-22742
 Project: Hive
  Issue Type: Improvement
Reporter: Slim Bouguerra






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22629) AST Node Children can be quite expensive to build due to List resizing

2019-12-11 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22629:
-

 Summary: AST Node Children can be quite expensive to build due to 
List resizing
 Key: HIVE-22629
 URL: https://issues.apache.org/jira/browse/HIVE-22629
 Project: Hive
  Issue Type: Improvement
Reporter: Slim Bouguerra
Assignee: Slim Bouguerra


As per the attached profile, the AST Node can be a major source of CPU and 
memory churn due to ArrayList resizing and copying.
In my opinion this can be amortized by providing the actual size up front (see the sketch below).
[~jcamachorodriguez] / [~vgarg] 
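A small illustration of the point above (plain ArrayList, not the actual ASTNode code): when the child count is known, sizing the list up front avoids the repeated grow-and-copy cycles that show up in the profile.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class ChildrenListSketch {
  /** Default capacity (10): every overflow triggers an internal Arrays.copyOf. */
  static List<Object> withDefaultCapacity(Object[] children) {
    List<Object> list = new ArrayList<>();
    for (Object child : children) {
      list.add(child);
    }
    return list;
  }

  /** Sized with the actual child count: one backing array, no resize churn. */
  static List<Object> withExactCapacity(Object[] children) {
    List<Object> list = new ArrayList<>(children.length);
    for (Object child : children) {
      list.add(child);
    }
    return list;
  }
}
{code}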



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22492:
-

 Summary: Amortize lock contention due to LRFU accounting
 Key: HIVE-22492
 URL: https://issues.apache.org/jira/browse/HIVE-22492
 Project: Hive
  Issue Type: Improvement
Reporter: Slim Bouguerra
Assignee: Slim Bouguerra


The LRFU eviction policy can be a major source of contention under high load.
This can be seen in the attached profiles.
To fix this, the idea is to use a batching wrapper to amortize the locking 
contention.
The trick is a common way to amortize locking, as explained in the BP-Wrapper paper: 
http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf
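A minimal sketch of the batching idea from the paper (illustrative names, not the actual LRFU wrapper): each thread records its accesses in a local buffer, and the shared policy lock is taken only once per full batch.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class BatchingPolicyWrapperSketch<K> {
  private static final int BATCH_SIZE = 64;

  private final ReentrantLock policyLock = new ReentrantLock();
  private final ThreadLocal<List<K>> pending =
      ThreadLocal.withInitial(() -> new ArrayList<>(BATCH_SIZE));

  /** Called on every cache access; takes the policy lock only when a batch is full. */
  public void notifyAccess(K key) {
    List<K> batch = pending.get();
    batch.add(key);
    if (batch.size() >= BATCH_SIZE) {
      flush(batch);
    }
  }

  private void flush(List<K> batch) {
    policyLock.lock();
    try {
      for (K key : batch) {
        applyToPolicy(key); // all accounting for the batch happens in one critical section
      }
    } finally {
      policyLock.unlock();
      batch.clear();
    }
  }

  private void applyToPolicy(K key) {
    // placeholder for the real LRFU accounting (heap/list position updates)
  }
}
{code}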




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-08 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22476:
-

 Summary: Hive datediff function provided inconsistent results when 
hive.fetch.task.conversion is set to none
 Key: HIVE-22476
 URL: https://issues.apache.org/jira/browse/HIVE-22476
 Project: Hive
  Issue Type: Bug
Reporter: Slim Bouguerra
Assignee: Slim Bouguerra


The actual issue stems from the different date parsers used by various parts of the 
engine.
The fetch task uses udfdatediff via {code} 
org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.

This fix is meant to be minimally intrusive and will extend GenericUDFToDate by 
enhancing its parser.
For the longer term it would be better to use one parser for all the operators.

Thanks [~Rajkumar Singh] for the repro example
{code} 
create external table testdatediff(datetimecol string) stored as orc;
insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
select datetimecol from testdatediff where datediff(cast(current_timestamp as 
string), datetimecol)<183;

set hive.fetch.task.conversion=none;
select datetimecol from testdatediff where datediff(cast(current_timestamp as 
string), datetimecol)<183;
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22446) Make IO decoding quantiles counters less contended resource.

2019-11-01 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22446:
-

 Summary: Make IO decoding quantiles counters less contended 
resource.
 Key: HIVE-22446
 URL: https://issues.apache.org/jira/browse/HIVE-22446
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Slim Bouguerra
Assignee: Slim Bouguerra
 Fix For: 4.0.0


Currently LLAP IO relies on Hadoop's lock-based quantiles data structure and 
updates the IO decoding sample on a per-batch basis using
{code} 
org.apache.hadoop.hive.llap.metrics.LlapDaemonIOMetrics#addDecodeBatchTime
{code}
via 
{code} 
org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer#consumeData
{code}
This can be a source of thread contention.
The goal of this ticket is to reduce the frequency of updates to avoid a major 
bottleneck.
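A rough sketch of one way to reduce the update frequency (illustrative names, not the actual LlapDaemonIOMetrics API): accumulate decode times locally and push a single aggregated sample to the lock-based quantile estimator every N batches, trading a little sample resolution for far less contention.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class SampledDecodeTimeSketch {
  private static final int PUSH_EVERY = 100; // tune: contention vs. sample resolution

  private final AtomicLong batches = new AtomicLong();
  private final AtomicLong pendingNanos = new AtomicLong();

  public void recordDecode(long nanos) {
    pendingNanos.addAndGet(nanos);
    if (batches.incrementAndGet() % PUSH_EVERY == 0) {
      long total = pendingNanos.getAndSet(0);
      // one contended update instead of one hundred; pushes the batch average
      pushToQuantiles(total / PUSH_EVERY);
    }
  }

  private void pushToQuantiles(long avgNanos) {
    // placeholder for the shared, lock-based quantiles update
  }
}
{code}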

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22437) LLAP Metadata cache NPE on locking metadata.

2019-10-30 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22437:
-

 Summary: LLAP Metadata cache NPE on locking metadata.
 Key: HIVE-22437
 URL: https://issues.apache.org/jira/browse/HIVE-22437
 Project: Hive
  Issue Type: Bug
Reporter: Slim Bouguerra
Assignee: Slim Bouguerra


{code}
java.lang.NullPointerException
at 
org.apache.hadoop.hive.llap.io.metadata.MetadataCache.unlockSingleBuffer(MetadataCache.java:464)
at 
org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockBuffer(MetadataCache.java:409)
at 
org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockOldVal(MetadataCache.java:314)
at 
org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putInternal(MetadataCache.java:287)
at 
org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:199)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22436) Add more logging to the test.

2019-10-30 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22436:
-

 Summary: Add more logging to the test.
 Key: HIVE-22436
 URL: https://issues.apache.org/jira/browse/HIVE-22436
 Project: Hive
  Issue Type: Sub-task
Reporter: Slim Bouguerra
Assignee: Slim Bouguerra






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22398) Remove Yarn queue management via ShimLoader.

2019-10-23 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-22398:
-

 Summary: Remove Yarn queue management via ShimLoader.
 Key: HIVE-22398
 URL: https://issues.apache.org/jira/browse/HIVE-22398
 Project: Hive
  Issue Type: Task
Reporter: Slim Bouguerra
Assignee: Slim Bouguerra


Legacy MR Hive used this shim loader to do fair scheduling via non-public Yarn 
queue APIs.
This patch removes that code since it is no longer used, and the new 
[YARN-8967|https://issues.apache.org/jira/browse/YARN-8967] changes would 
break future version upgrades.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22168) remove excessive logging by llap cache.

2019-09-04 Thread slim bouguerra (Jira)
slim bouguerra created HIVE-22168:
-

 Summary: remove excessive logging by llap cache.
 Key: HIVE-22168
 URL: https://issues.apache.org/jira/browse/HIVE-22168
 Project: Hive
  Issue Type: Improvement
  Components: llap, Logging
Reporter: slim bouguerra
Assignee: slim bouguerra


LLAP cache logging is very expensive when it logs every request's buffer ranges.
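A minimal sketch of the usual mitigation (not the actual LLAP cache code): demote the per-request range dump to TRACE and guard the expensive string building behind an is-enabled check.

{code:java}
import java.util.Arrays;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CacheLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(CacheLoggingSketch.class);

  static void logBufferRanges(Object[] ranges) {
    // Arrays.toString walks every requested range, so keep it behind the guard.
    if (LOG.isTraceEnabled()) {
      LOG.trace("Requested buffer ranges: {}", Arrays.toString(ranges));
    }
  }
}
{code}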



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HIVE-22127) Query Routing logging appender is leaking resources of RandomAccessFileManager.

2019-08-20 Thread slim bouguerra (Jira)
slim bouguerra created HIVE-22127:
-

 Summary: Query Routing logging appender is leaking resources of 
RandomAccessFileManager.
 Key: HIVE-22127
 URL: https://issues.apache.org/jira/browse/HIVE-22127
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


The query routing appender registered by
{code:java}
org.apache.hadoop.hive.ql.log.LogDivertAppender#registerRoutingAppender
{code}
is leaking a reference to
{code}
org.apache.hadoop.hive.ql.log.HushableRandomAccessFileAppender
{code}
when the operation-log cleanup hook runs:
{code}
org.apache.hive.service.cli.operation.Operation#cleanupOperationLog
{code}

 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HIVE-22125) Move to Kafka 2.3 Clients

2019-08-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-22125:
-

 Summary: Move to Kafka 2.3 Clients
 Key: HIVE-22125
 URL: https://issues.apache.org/jira/browse/HIVE-22125
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: slim bouguerra






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-22115) Prevent the creation of query-router logger in HS2 as per property

2019-08-14 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-22115:
-

 Summary: Prevent the creation of query-router logger in HS2 as per 
property
 Key: HIVE-22115
 URL: https://issues.apache.org/jira/browse/HIVE-22115
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: slim bouguerra


Avoid the creation and registration of the query-router logger if the user sets the 
following HiveServer2 property to false:

{code}

HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED

{code}
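A minimal sketch of the intended check (not the actual HiveServer2 code): consult the flag before any appender is built, so nothing is created or registered when operation logging is disabled.

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

public class RouterLoggerGuardSketch {
  /** Callers would skip query-router appender creation entirely when this returns false. */
  static boolean shouldRegisterRoutingAppender(HiveConf conf) {
    return conf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED);
  }
}
{code}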



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-21989) Query must fail in case we have no explicit cast for cases like reader schema is int32 and file schema is int64.

2019-07-11 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-21989:
-

 Summary: Query must fail in case we have no explicit cast for 
cases like reader schema is int32 and file schema is int64.
 Key: HIVE-21989
 URL: https://issues.apache.org/jira/browse/HIVE-21989
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


In some cases the table definition can differ from the ORC file schema.

For example, if the ORC files have an int64 (bigint) column while the table 
schema declares it as int32 (int), the query must fail in the absence of an 
explicit cast from the user.

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-21934) Materialized view on top of Druid not pushing every thing

2019-06-28 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-21934:
-

 Summary: Materialized view on top of Druid not pushing every thing
 Key: HIVE-21934
 URL: https://issues.apache.org/jira/browse/HIVE-21934
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: Jesus Camacho Rodriguez


The title is not very informative, but examples hopefully are.

this is the plan with the view

{code}


explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
SUM(1) AS `sum_number_of_records_ok`,
YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON (`lineorder_n0`.`lo_orderdate` 
= `dates_n1`.`d_datekey`)
JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON 
(`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON 
(`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON 
(`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
GROUP BY MONTH(`dates_n1`.`__time`),
CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
YEAR(`dates_n1`.`__time`)
INFO : Starting task [Stage-3:EXPLAIN] in serial mode
INFO : Completed executing 
command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977);
 Time taken: 0.005 seconds
INFO : OK
++
| Explain |
++
| Plan optimized by CBO. |
| |
| Vertex dependency in root stage |
| Reducer 2 <- Map 1 (SIMPLE_EDGE) |
| |
| Stage-0 |
| Fetch Operator |
| limit:-1 |
| Stage-1 |
| Reducer 2 vectorized, llap |
| File Output Operator [FS_13] |
| Select Operator [SEL_12] (rows=300018951 width=38) |
| Output:["_col0","_col1","_col2","_col3"] |
| Group By Operator [GBY_11] (rows=300018951 width=38) |
| 
Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0,
 KEY._col1, KEY._col2 |
| <-Map 1 [SIMPLE_EDGE] vectorized, llap |
| SHUFFLE [RS_10] |
| PartitionCols:_col0, _col1, _col2 |
| Group By Operator [GBY_9] (rows=600037902 width=38) |
| Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, 
_col1, _col2 |
| Select Operator [SEL_8] (rows=600037902 width=38) |
| Output:["_col0","_col1","_col2"] |
| TableScan [TS_0] (rows=600037902 width=38) |
| 
mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:\{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"}
 |
| |
++

 

{code}

If I use a simple Druid table without the MV:

{code}


explain SELECT MONTH(`__time`) AS `mn___time_ok`,
CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
SUM(1) AS `sum_number_of_records_ok`,
YEAR(`__time`) AS `yr___time_ok`
FROM `druid_ssb.ssb_druid_100`
GROUP BY MONTH(`__time`),
CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT),
YEAR(`__time`);

{code}

{code}

++
| Explain |
++
| Plan optimized by CBO. |
| |
| Stage-0 |
| Fetch Operator |
| limit:-1 |
| Select Operator [SEL_1] |
| Output:["_col0","_col1","_col2","_col3"] |
| TableScan [TS_0] |
| 
Output:["extract_month","vc","$f3","extract_year"],properties:\{"druid.fieldNames":"extract_month,vc,extract_year,$f3","druid.fieldTypes":"int,bigint,int,bigint","druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_ssb.ssb_druid_100\",\"granularity\":\"all\",\"dimensions\":[{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_month\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"M\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}},\{\"type\":\"default\",\"dimension\":\"vc\",\"outputName\":\"vc\",\"outputType\":\"LONG\"},\{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_year\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}}],\"virtualColumns\":[\{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"CAST(((CAST((timestamp_extract(\\\"__time\\\",'MONTH','America/New_York')
 - 1), 'DOUBLE') / CAST(3, 'DOUBLE')) + CAST(1, 'DOUBLE')), 

[jira] [Created] (HIVE-21689) Buddy Allocator memory accounting does not account for failed allocation attempts

2019-05-03 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-21689:
-

 Summary: Buddy Allocator memory accounting does not account for 
failed allocation attempts
 Key: HIVE-21689
 URL: https://issues.apache.org/jira/browse/HIVE-21689
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: slim bouguerra
Assignee: slim bouguerra


The allocation method of the Buddy Allocator does not release the reserved memory when 
we fail to allocate the full sequence.
Simple example:
Assume we have an allocation request of 1KB.
The allocator calls reserve and reserves 1KB.
The allocation attempt fails due to a race condition.
The discard attempt fails due to lack of space.
At this point the method exits without releasing the reserved memory.
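A sketch of the intended control flow (illustrative names, not the real BuddyAllocator/LowLevelCacheMemoryManager API): whatever was reserved up front must be handed back on every failure path, not only on success.

{code:java}
public class ReserveReleaseSketch {
  interface MemoryManager {
    boolean reserve(long bytes);
    void release(long bytes);
  }

  static boolean allocate(MemoryManager memoryManager, long bytes) {
    if (!memoryManager.reserve(bytes)) {
      return false;
    }
    boolean allocated = false;
    try {
      // both attempts can fail (race on the free lists, nothing discardable)
      allocated = tryAllocateFromFreeLists(bytes) || discardAndRetry(bytes);
      return allocated;
    } finally {
      if (!allocated) {
        memoryManager.release(bytes); // without this, every failed attempt leaks the reservation
      }
    }
  }

  private static boolean tryAllocateFromFreeLists(long bytes) { return false; } // placeholder
  private static boolean discardAndRetry(long bytes) { return false; }          // placeholder
}
{code}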




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21686) Brute Force eviction can lead to a random uncontrolled eviction pattern.

2019-05-02 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-21686:
-

 Summary: Brute Force eviction can lead to a random uncontrolled 
eviction pattern.
 Key: HIVE-21686
 URL: https://issues.apache.org/jira/browse/HIVE-21686
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


The current logic used by brute-force eviction can lead to a perpetual random 
eviction pattern.
For instance, if the cache builds a small pocket of free memory whose total 
size is greater than the incoming allocation request, the allocator will randomly 
evict a block that fits a particular size.
This can happen over and over, so all the evictions end up being random.
In addition, this random eviction leaks entries in the linked list maintained 
by the policy, since the policy no longer knows what has been evicted and what has not.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21665) Unable to reconstruct valid SQL query from AST when back ticks are used

2019-04-29 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-21665:
-

 Summary: Unable to reconstruct valid SQL query from AST when back 
ticks are used
 Key: HIVE-21665
 URL: https://issues.apache.org/jira/browse/HIVE-21665
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


HIVE-6013 introduced a parser rule that removes all the
{code:java}
`{code}
characters from identifiers and query aliases; this can cause issues when we need to 
reconstruct the actual SQL query from the AST.

To reproduce the bug, use an explain analyze statement such as the following 
query:
{code:java}
explain analyze select 'literal' as `alias with space`;
{code}
This bug affects the Ranger plugin and probably the results cache, since both 
places need to reconstruct the query from the AST.

The current workaround is to avoid whitespace within aliases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21621) Update Kafka Clients to recent release 2.2.0

2019-04-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-21621:
-

 Summary: Update Kafka Clients to recent release 2.2.0
 Key: HIVE-21621
 URL: https://issues.apache.org/jira/browse/HIVE-21621
 Project: Hive
  Issue Type: Task
  Components: kafka integration
Reporter: slim bouguerra
Assignee: slim bouguerra


All in the title: update the Kafka Storage Handler to the most recent clients 
library.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21334) Eviction of blocks is major source of blockage for allocation request. Allocation path need to be lock-free.

2019-02-27 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-21334:
-

 Summary: Eviction of blocks is major source of blockage for 
allocation request. Allocation path need to be lock-free.
 Key: HIVE-21334
 URL: https://issues.apache.org/jira/browse/HIVE-21334
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: slim bouguerra
 Attachments: lock_profile.png

Eviction gets in the way of memory allocation when the query fragment has 
no cache entry.
This causes a major bottleneck and wastes a lot of CPU cycles.
To fix this, we can first batch the evictions to avoid taking the lock 
multiple times.
The memory manager also needs to anticipate this and keep some spare 
space for queries that do not have any cache hits.

{code}
IO-Elevator-Thread-12  Blocked CPU usage on sample: 692ms
  
org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.evictSomeBlocks(long) 
LowLevelLrfuCachePolicy.java:264
  org.apache.hadoop.hive.llap.cache.CacheContentsTracker.evictSomeBlocks(long) 
CacheContentsTracker.java:194
  
org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(long,
 boolean, AtomicBoolean) LowLevelCacheMemoryManager.java:87
  
org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(long,
 AtomicBoolean) LowLevelCacheMemoryManager.java:63
  
org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(MemoryBuffer[],
 int, Allocator$BufferObjectFactory, AtomicBoolean) BuddyAllocator.java:263
  
org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.allocateMultiple(MemoryBuffer[],
 int) EncodedReaderImpl.java:1295
  
org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(long,
 DiskRangeList, long, long, EncodedColumnBatch$ColumnStreamData, long, long, 
IdentityHashMap) EncodedReaderImpl.java:923
  
org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(int,
 StripeInformation, OrcProto$RowIndex[], List, List, boolean[], boolean[], 
Consumer) EncodedReaderImpl.java:501
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead() 
OrcEncodedDataReader.java:407
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run() 
OrcEncodedDataReader.java:266
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run() 
OrcEncodedDataReader.java:263
  java.security.AccessController.doPrivileged(PrivilegedExceptionAction, 
AccessControlContext) AccessController.java (native)
  javax.security.auth.Subject.doAs(Subject, PrivilegedExceptionAction) 
Subject.java:422
  
org.apache.hadoop.security.UserGroupInformation.doAs(PrivilegedExceptionAction) 
UserGroupInformation.java:1688
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal() 
OrcEncodedDataReader.java:263
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal() 
OrcEncodedDataReader.java:110
  org.apache.tez.common.CallableWithNdc.call() CallableWithNdc.java:36
  
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call()
 StatsRecordingThreadPool.java:110
  java.util.concurrent.FutureTask.run() FutureTask.java:266
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
ThreadPoolExecutor.java:1142
  java.util.concurrent.ThreadPoolExecutor$Worker.run() 
ThreadPoolExecutor.java:617
  java.lang.Thread.run() Thread.java:745 
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21332) Cache Purge command does purge the in-use buffer.

2019-02-27 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-21332:
-

 Summary: Cache Purge command does purge the in-use buffer.
 Key: HIVE-21332
 URL: https://issues.apache.org/jira/browse/HIVE-21332
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


The cache purge command is purging in-use buffers that it is not supposed to evict.
This can lead to an unrecoverable state.
{code} 
TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : 
attempt_1545278897356_0093_27_00_01_3:java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
java.io.IOException: 
org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
Failed to allocate 32768; at 0 out of 1 (entire cache is fragmented and locked, 
or an internal issue)
 at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
 at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
 at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
 at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
 at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
 at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: java.io.IOException: 
org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
Failed to allocate 32768; at 0 out of 1 (entire cache is fragmented and locked, 
or an internal issue)
 at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80)
 at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
 at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
 ... 15 more
Caused by: java.io.IOException: java.io.IOException: 
org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
Failed to allocate 32768; at 0 out of 1 (entire cache is fragmented and locked, 
or an internal issue)
 at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
 at 
org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
 at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
 at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
 at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
 ... 17 more
Caused by: java.io.IOException: 
org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
Failed to allocate 32768; at 0 out of 1 (entire cache is fragmented and locked, 
or an internal issue)
 at 
org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:513)
 at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:407)
 at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:266)
 at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:263)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
 at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:263)
 at 

[jira] [Created] (HIVE-21026) Druid Vectorize Reader is not using the correct input size

2018-12-10 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-21026:
-

 Summary: Druid Vectorize Reader is not using the correct input size
 Key: HIVE-21026
 URL: https://issues.apache.org/jira/browse/HIVE-21026
 Project: Hive
  Issue Type: Bug
  Components: Druid integration, Vectorization
Reporter: slim bouguerra
Assignee: slim bouguerra


When the number of projected columns differs from the number of input row 
columns, we get an array-out-of-bounds exception.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21004) Less object creation for Hive Kafka reader

2018-12-04 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-21004:
-

 Summary: Less object creation for Hive Kafka reader
 Key: HIVE-21004
 URL: https://issues.apache.org/jira/browse/HIVE-21004
 Project: Hive
  Issue Type: Improvement
  Components: kafka integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Reduce the amount of unneeded object allocation by using a row boat as a way to 
carry data around.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20997) Make Druid Cluster start on random ports.

2018-12-03 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20997:
-

 Summary: Make Druid Cluster start on random ports.
 Key: HIVE-20997
 URL: https://issues.apache.org/jira/browse/HIVE-20997
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
Assignee: slim bouguerra


As of now the Druid tests run in a single batch.

To avoid timeouts we need to support batching of the tests.

As suggested by [~vihangk1], it would be better to start the Druid test setups 
on totally random ports.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20995) Add mini Druid to the list of tests

2018-12-03 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20995:
-

 Summary: Add mini Druid to the list of tests
 Key: HIVE-20995
 URL: https://issues.apache.org/jira/browse/HIVE-20995
 Project: Hive
  Issue Type: Test
Reporter: slim bouguerra
Assignee: slim bouguerra






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20987) Split Druid Tests to avoid Timeouts

2018-11-29 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20987:
-

 Summary: Split Druid Tests to avoid Timeouts
 Key: HIVE-20987
 URL: https://issues.apache.org/jira/browse/HIVE-20987
 Project: Hive
  Issue Type: Test
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently the Druid tests fail with a timeout issue.

I am planning to split the tests into at least 2 batches to avoid timeouts.

I will also tweak the test code to pick random ports for the Druid nodes, to minimize 
the collision issue we saw before.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20982) Avoid the un-needed object creation within hotloop

2018-11-28 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20982:
-

 Summary: Avoid the un-needed object creation within hotloop
 Key: HIVE-20982
 URL: https://issues.apache.org/jira/browse/HIVE-20982
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
Assignee: slim bouguerra






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20958) Cleaning of imports at Hive-common

2018-11-21 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20958:
-

 Summary: Cleaning of imports at Hive-common
 Key: HIVE-20958
 URL: https://issues.apache.org/jira/browse/HIVE-20958
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: slim bouguerra






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20955) Calcite Rule HiveExpandDistinctAggregatesRule seems throwing IndexOutOfBoundsException

2018-11-21 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20955:
-

 Summary: Calcite Rule HiveExpandDistinctAggregatesRule seems 
throwing IndexOutOfBoundsException
 Key: HIVE-20955
 URL: https://issues.apache.org/jira/browse/HIVE-20955
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: slim bouguerra


 

Adding the following query to the Druid test 
ql/src/test/queries/clientpositive/druidmini_expressions.q

{code}
select count(distinct `__time`, cint) from (select * from 
druid_table_alltypesorc) as src;
{code}

leads to the error
{code}
2018-11-21T07:36:39,449 ERROR [main] QTestUtil: Client execution failed with error code = 4 running "
{code}

with the following exception stack:

{code}

2018-11-21T07:36:39,443 ERROR [ecd48683-0286-4cb4-b0ad-e150fab51038 main] 
parse.CalcitePlanner: CBO failed, skipping CBO.
java.lang.IndexOutOfBoundsException: index (1) must be less than size (1)
 at 
com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:310) 
~[guava-19.0.jar:?]
 at 
com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:293) 
~[guava-19.0.jar:?]
 at 
com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:41)
 ~[guava-19.0.jar:?]
 at 
org.apache.calcite.rel.metadata.RelMdColumnOrigins.getColumnOrigins(RelMdColumnOrigins.java:77)
 ~[calcite-core-1.17.0.jar:1.17.0]
 at GeneratedMetadataHandler_ColumnOrigin.getColumnOrigins_$(Unknown Source) 
~[?:?]
 at GeneratedMetadataHandler_ColumnOrigin.getColumnOrigins(Unknown Source) 
~[?:?]
 at 
org.apache.calcite.rel.metadata.RelMetadataQuery.getColumnOrigins(RelMetadataQuery.java:345)
 ~[calcite-core-1.17.0.jar:1.17.0]
 at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveExpandDistinctAggregatesRule.onMatch(HiveExpandDistinctAggregatesRule.java:168)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:315)
 ~[calcite-core-1.17.0.jar:1.17.0]
 at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556) 
~[calcite-core-1.17.0.jar:1.17.0]
 at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415) 
~[calcite-core-1.17.0.jar:1.17.0]
 at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:280) 
~[calcite-core-1.17.0.jar:1.17.0]
 at 
org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
 ~[calcite-core-1.17.0.jar:1.17.0]
 at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211) 
~[calcite-core-1.17.0.jar:1.17.0]
 at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198) 
~[calcite-core-1.17.0.jar:1.17.0]
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2363)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2314)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:2031)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1780)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1680)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118) 
~[calcite-core-1.17.0.jar:1.17.0]
 at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1043)
 ~[calcite-core-1.17.0.jar:1.17.0]
 at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154) 
~[calcite-core-1.17.0.jar:1.17.0]
 at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111) 
~[calcite-core-1.17.0.jar:1.17.0]
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1439)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:478)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12296)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:358)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
 [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:670) 
[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1893) 
[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1840) 

[jira] [Created] (HIVE-20952) Cleaning VectorizationContext.java

2018-11-20 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20952:
-

 Summary: Cleaning VectorizationContext.java
 Key: HIVE-20952
 URL: https://issues.apache.org/jira/browse/HIVE-20952
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: slim bouguerra






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20931) Minor code cleaning for Druid Storage Handler

2018-11-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20931:
-

 Summary: Minor code cleaning for Druid Storage Handler
 Key: HIVE-20931
 URL: https://issues.apache.org/jira/browse/HIVE-20931
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Attachments: HIVE-20931.patch





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20932) Vectorize Druid Storage Handler Reader

2018-11-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20932:
-

 Summary: Vectorize Druid Storage Handler Reader
 Key: HIVE-20932
 URL: https://issues.apache.org/jira/browse/HIVE-20932
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: slim bouguerra


This patch aims at adding support for vectorized reads of data from Druid to Hive.

[~t3rmin4t0r] suggested that this will improve the performance of the top-level 
operators that support vectorization.

As a first cut I am just adding a wrapper around the existing record reader to 
read up to 1024 rows at a time.

Future work will be to avoid going through the old reader and convert the JSON 
(Smile format) straight into vector primitive types.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20903) Cleanup code inspection issue on the druid adapter.

2018-11-09 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20903:
-

 Summary: Cleanup code inspection issue on the druid adapter.
 Key: HIVE-20903
 URL: https://issues.apache.org/jira/browse/HIVE-20903
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: slim bouguerra


This is a simple cleanup of the code and a minor refactor.

I did not change any of the behavior.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20902) Math.abs(rand.nextInt())) or Math.abs(rand.nexLong())) can return a negative number

2018-11-09 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20902:
-

 Summary: Math.abs(rand.nextInt())) or Math.abs(rand.nexLong()))  
can return a negative number
 Key: HIVE-20902
 URL: https://issues.apache.org/jira/browse/HIVE-20902
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


I see a lot of Math.abs(rand.nextInt()) in the code base, and this can return 
a negative number (Math.abs of Integer.MIN_VALUE is still negative).
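A quick, self-contained demonstration of why this is unsafe, plus the usual alternatives:

{code:java}
import java.util.Random;

public class AbsRandomSketch {
  public static void main(String[] args) {
    // Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE, so the "absolute value" is negative:
    System.out.println(Math.abs(Integer.MIN_VALUE)); // prints -2147483648

    Random rand = new Random();
    int risky = Math.abs(rand.nextInt());            // still negative when nextInt() hits MIN_VALUE

    // Safer ways to get a non-negative value:
    int bounded = rand.nextInt(Integer.MAX_VALUE);   // uniform in [0, Integer.MAX_VALUE)
    int masked = rand.nextInt() & Integer.MAX_VALUE; // clears the sign bit
    long nonNegative = rand.nextLong() & Long.MAX_VALUE; // same trick for nextLong()

    System.out.println(risky + " " + bounded + " " + masked + " " + nonNegative);
  }
}
{code}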



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20892) Benchmark XXhash for 64 bit hashing function instead of Murmum hash

2018-11-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20892:
-

 Summary: Benchmark XXhash for 64 bit hashing function instead of 
Murmum hash
 Key: HIVE-20892
 URL: https://issues.apache.org/jira/browse/HIVE-20892
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
Assignee: slim bouguerra


https://cyan4973.github.io/xxHash/
FYI, this is used by a lot of other MPP systems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20893) BloomK Filter probing method is not thread safe

2018-11-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20893:
-

 Summary: BloomK Filter probing method is not thread safe
 Key: HIVE-20893
 URL: https://issues.apache.org/jira/browse/HIVE-20893
 Project: Hive
  Issue Type: Bug
  Components: storage-api
Reporter: slim bouguerra


As far as I can tell this is not an issue for Hive yet (most of the probing seems to be 
done by one thread at a time), but it is an issue for other users such as Druid, as per 
the following issue: [https://github.com/apache/incubator-druid/issues/6546]

The fix proposed by the author of 
[https://github.com/apache/incubator-druid/pull/6584] is to turn a couple of the 
local fields into ThreadLocals.

The idea looks good to me and doesn't have any performance drawbacks.
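A sketch of the general pattern (not the actual storage-api BloomKFilter code): move the mutable scratch state used during probing from an instance field into a ThreadLocal, so concurrent callers of test() no longer trample each other.

{code:java}
public class ThreadSafeProbeSketch {
  private static final int NUM_HASHES = 4;

  // Before the fix: a single shared scratch array made concurrent test() calls unsafe.
  // A ThreadLocal gives each thread its own reusable buffer with no per-probe allocation.
  private final ThreadLocal<long[]> scratch =
      ThreadLocal.withInitial(() -> new long[NUM_HASHES]);

  public boolean test(byte[] key) {
    long[] hashes = scratch.get(); // per-thread buffer, reused across calls
    computeHashes(key, hashes);
    return checkBits(hashes);
  }

  private void computeHashes(byte[] key, long[] out) {
    // placeholder for the real hashing
  }

  private boolean checkBits(long[] hashes) {
    return true; // placeholder for the real bit-set lookups
  }
}
{code}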

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20875) Druid storage handler Kafka ingestion timestamp column name

2018-11-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20875:
-

 Summary: Druid storage handler Kafka ingestion timestamp column 
name
 Key: HIVE-20875
 URL: https://issues.apache.org/jira/browse/HIVE-20875
 Project: Hive
  Issue Type: Task
Reporter: slim bouguerra
Assignee: Nishant Bangarwa


This question brought to my attention that the Druid-Hive Kafka ingestion currently 
assumes that the Kafka stream includes a column called __time as the timestamp column.
https://community.hortonworks.com/questions/226191/druid-kafka-ingestion-from-hive-hdp-30.html?childToView=227242#answer-227242

Looking at the code here seems to confirm that:
https://github.com/apache/hive/blob/a51e6aeaf816bdeea5e91ba3a0fab8a31b3a496d/druid-handler/src/java/org/apache/hadoop/hive/druid/DruidStorageHandler.java#L301.

IMO this is a serious limitation, because a user cannot always guarantee that 
the Kafka record will contain a column called `__time`; we need to 
introduce a way to configure this, maybe via a table property.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20869) Fix test results file

2018-11-05 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20869:
-

 Summary: Fix test results file
 Key: HIVE-20869
 URL: https://issues.apache.org/jira/browse/HIVE-20869
 Project: Hive
  Issue Type: Sub-task
  Components: kafka integration
Reporter: slim bouguerra
Assignee: slim bouguerra


It seems that between the time the tests ran and HIVE-20486 was merged, a new Hive 
property was added:
{code}
discover.partitions true
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20813) udf to_epoch_milli need to support timestamp without time zone as well

2018-10-25 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20813:
-

 Summary: udf to_epoch_milli need to support timestamp without time 
zone as well
 Key: HIVE-20813
 URL: https://issues.apache.org/jira/browse/HIVE-20813
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently the following query will fail with a cast exception (it tries to cast 
timestamp to timestamp with local time zone).
{code}
 select to_epoch_milli(current_timestamp)
{code}
As a simple fix we need to add support for the timestamp object inspector.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20782) Cleaning some unused code

2018-10-19 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20782:
-

 Summary: Cleaning some unused code
 Key: HIVE-20782
 URL: https://issues.apache.org/jira/browse/HIVE-20782
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: Teddy Choi


I am making my way into the vectorization code and trying to understand the APIs. I ran 
into this unused one; I guess it is not used anymore.

[~ashutoshc], maybe you can explain, as you are the main contributor to this file?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20768) Adding Tumbling Window UDF

2018-10-17 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20768:
-

 Summary: Adding Tumbling Window UDF
 Key: HIVE-20768
 URL: https://issues.apache.org/jira/browse/HIVE-20768
 Project: Hive
  Issue Type: New Feature
Reporter: slim bouguerra
Assignee: slim bouguerra


The goal is to provide a UDF that truncates a timestamp to the beginning of its 
tumbling window interval.
{code}
/**
 * Tumbling windows are a series of fixed-sized, non-overlapping and contiguous 
time intervals.
 * Tumbling windows are inclusive start exclusive end.
 * By default the beginning instant of the first window is Epoch 0, Thu Jan 01 
00:00:00 1970 UTC.
 * Optionally users may provide a different origin as a timestamp arg3.
 *
 * This an example of series of window with an interval of 5 seconds and origin 
Epoch 0 Thu Jan 01 00:00:00 1970 UTC:
 *
 *
 *   interval 1   interval 2interval 3
 *   Jan 01 00:00:00  Jan 01 00:00:05   Jan 01 00:00:10
 * 0 -- 4 : 5 --- 9: 10 --- 14
 *
 * This UDF rounds timestamp arg1 down to the beginning of the window interval it 
belongs to.
 *
 */
{code}
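A minimal sketch of the rounding arithmetic described in the comment (the GenericUDF wiring, object inspectors and argument validation are omitted); Math.floorDiv keeps timestamps that fall before the origin in the correct earlier window.

{code:java}
import java.sql.Timestamp;

public class TumblingWindowSketch {
  /** Rounds millis down to the start of its interval-sized window, counting from origin. */
  static long truncate(long millis, long intervalMillis, long originMillis) {
    long windowIndex = Math.floorDiv(millis - originMillis, intervalMillis);
    return originMillis + windowIndex * intervalMillis;
  }

  public static void main(String[] args) {
    long fiveSeconds = 5_000L;
    // 7s after the epoch falls in the second window [5s, 10s), so it truncates to 5s.
    long start = truncate(7_000L, fiveSeconds, 0L);
    System.out.println(start + " -> " + new Timestamp(start));
  }
}
{code}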



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20736) Eagerly allocate Containers from Yarn

2018-10-12 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20736:
-

 Summary: Eagerly allocate Containers from Yarn
 Key: HIVE-20736
 URL: https://issues.apache.org/jira/browse/HIVE-20736
 Project: Hive
  Issue Type: Bug
  Components: llap, Tez
Affects Versions: 4.0.0
Reporter: slim bouguerra
Assignee: Sergey Shelukhin


According to [~sershe], HS2 interactive at startup time tries to eagerly 
allocate whatever is needed to execute queries, but currently this process is 
somewhat broken.
As of now HS2I starts and tries to allocate the resources, but when there are 
not enough Yarn resources in the desired queue, HS2I keeps retrying in the 
background forever and never surfaces this as an issue. Trying forever to 
allocate without signaling an error defeats the idea of eager allocation, in my 
opinion. I think HS2I has to fail the start if, after XX minutes, it cannot eagerly 
allocate the minimum space needed to run the maximum number of concurrent queries.
CC [~hagleitn]/[~t3rmin4t0r]  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20735) Address some of the review comments.

2018-10-12 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20735:
-

 Summary: Address some of the review comments.
 Key: HIVE-20735
 URL: https://issues.apache.org/jira/browse/HIVE-20735
 Project: Hive
  Issue Type: Sub-task
  Components: kafka integration
Reporter: slim bouguerra
Assignee: slim bouguerra


As part of the review comments we agreed to:
# remove the start and end offset columns
# remove the best-effort mode
# make 2PC the default protocol for exactly-once semantics (EOS)

Also, this patch includes an additional enhancement to add Kerberos support.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20639) Add ability to Write Data from Hive Table/Query to Kafka Topic

2018-09-26 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20639:
-

 Summary: Add ability to Write Data from Hive Table/Query to Kafka 
Topic
 Key: HIVE-20639
 URL: https://issues.apache.org/jira/browse/HIVE-20639
 Project: Hive
  Issue Type: New Feature
  Components: kafka integration
Reporter: slim bouguerra
Assignee: slim bouguerra


This patch adds multiple record writers to allow a Hive user to write data 
directly to a Kafka topic.
The writer provides multiple write-semantics modes:
* None: all records are delivered with no guarantees or retries.
* At_least_once: each record is delivered with retries from both the Kafka 
producer and the Hive write task.
* Exactly_once: the writer uses the Kafka transaction API to ensure that 
each record is delivered exactly once.

In addition to the new feature, I have refactored the existing code to make it 
more readable.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20561) Use the position of the Kafka Consumer to track progress instead of Consumer Records offsets

2018-09-14 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20561:
-

 Summary: Use the position of the Kafka Consumer to track progress 
instead of Consumer Records offsets
 Key: HIVE-20561
 URL: https://issues.apache.org/jira/browse/HIVE-20561
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 4.0.0


Kafka partitions with transactional messages (post 0.11) include commit or 
abort markers which indicate the result of a transaction. The markers are not 
returned to applications, yet they have an offset in the log. Therefore the end-of-
stream position can be the offset of a control message.
This patch changes the way we keep track of the consumer position by using 
{code} consumer.position(topicP) {code} as opposed to using the offset of the 
consumed messages.
Also, I have done some refactoring that hopefully helps code readability.
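A rough sketch of the consumption loop after this change (illustrative only, not the actual Hive Kafka reader code; assumes a 2.x consumer with the Duration-based poll): the stop condition uses consumer.position(), which advances past transaction markers, rather than the offset of the last returned record.

{code:java}
import java.time.Duration;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ConsumerPositionSketch {
  /** Drains a single partition up to endOffset using the consumer's own position. */
  static void drain(KafkaConsumer<byte[], byte[]> consumer, TopicPartition partition, long endOffset) {
    // position() skips over commit/abort control records, whereas
    // lastRecord.offset() + 1 can stall just below an end offset that points at a marker.
    while (consumer.position(partition) < endOffset) {
      ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
      for (ConsumerRecord<byte[], byte[]> record : records.records(partition)) {
        process(record);
      }
    }
  }

  private static void process(ConsumerRecord<byte[], byte[]> record) {
    // placeholder for the actual row conversion
  }
}
{code}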





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20485) Test Storage Handler with Secured Kafka Cluster

2018-08-29 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20485:
-

 Summary: Test Storage Handler with Secured Kafka Cluster
 Key: HIVE-20485
 URL: https://issues.apache.org/jira/browse/HIVE-20485
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
Assignee: slim bouguerra


We need to test this with a secured Kafka cluster:
* Kerberos
* SSL support



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20481) Add the Kafka Key record as part of the row.

2018-08-28 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20481:
-

 Summary: Add the Kafka Key record as part of the row.
 Key: HIVE-20481
 URL: https://issues.apache.org/jira/browse/HIVE-20481
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
Assignee: slim bouguerra


Kafka records are keyed; in most cases this key is null or is used to route 
records to the same partition. This patch adds the key as a binary column, 
{code} __record_key{code}.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20427) Remove Druid Mock tests from CliDrive

2018-08-20 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20427:
-

 Summary: Remove Druid Mock tests from CliDrive 
 Key: HIVE-20427
 URL: https://issues.apache.org/jira/browse/HIVE-20427
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: slim bouguerra


As per the comment at
https://issues.apache.org/jira/browse/HIVE-20425?focusedCommentId=16586272=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16586272
we do not need to run those mock Druid tests anymore, since 
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver covers most of these cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20426) Upload Druid Test Runner logs from Build Slaves

2018-08-20 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20426:
-

 Summary: Upload Druid Test Runner logs from Build Slaves
 Key: HIVE-20426
 URL: https://issues.apache.org/jira/browse/HIVE-20426
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: Vineet Garg


Currently only the Hive log is uploaded from "hive/itests/qtest/tmp/log/".
It would be very valuable to also upload the following Druid logs:
* coordinator.log
* broker.log
* historical.log



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20425) Use a custom range of port for embedded Derby used by Druid.

2018-08-20 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20425:
-

 Summary: Use a custom range of port for embedded Derby used by 
Druid.
 Key: HIVE-20425
 URL: https://issues.apache.org/jira/browse/HIVE-20425
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


It seems that a good amount of the flakiness of the Druid tests is due to port 
collisions between the Derby instance used by Hive and the one used by Druid.
The goal of this patch is to use a custom range, 60000 to 65535, and find the 
first available port to be used by the Druid Derby process (see the sketch below).
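A small sketch of the port probing (illustrative only, not the actual test setup code): walk the 60000-65535 range and take the first port that can actually be bound.

{code:java}
import java.io.IOException;
import java.net.ServerSocket;

public class DerbyPortSketch {
  /** Returns the first port in [60000, 65535] that can be bound on this host. */
  static int findFreePort() {
    for (int port = 60_000; port <= 65_535; port++) {
      try (ServerSocket socket = new ServerSocket(port)) {
        return socket.getLocalPort();
      } catch (IOException bindFailed) {
        // port already in use, try the next one
      }
    }
    throw new IllegalStateException("No free port in 60000-65535");
  }
}
{code}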




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20377) Hive Kafka Storage Handler

2018-08-13 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20377:
-

 Summary: Hive Kafka Storage Handler
 Key: HIVE-20377
 URL: https://issues.apache.org/jira/browse/HIVE-20377
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: slim bouguerra
Assignee: slim bouguerra


h1. Goal
* Read streaming data from a Kafka queue as an external table.
* Allow streaming navigation by pushing down filters on the Kafka record partition 
id, offset and timestamp.
* Insert streaming data from Kafka into an actual Hive internal table, using a CTAS 
statement.
h1. Example
h2. Create the external table
{code} 
CREATE EXTERNAL TABLE kafka_table (`timestamp` timestamp, page string, `user` 
string, language string, added int, deleted int, flags string,comment string, 
namespace string)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES 
("kafka.topic" = "wikipedia", 
"kafka.bootstrap.servers"="brokeraddress:9092",
"kafka.serde.class"="org.apache.hadoop.hive.serde2.JsonSerDe");
{code}
h2. Kafka Metadata
In order to keep track of Kafka records, the storage handler automatically adds 
the Kafka row metadata, e.g. partition id, record offset and record 
timestamp.
{code}
DESCRIBE EXTENDED kafka_table

timestamp   timestamp   from deserializer   
pagestring  from deserializer   
userstring  from deserializer   
languagestring  from deserializer   
country string  from deserializer   
continent   string  from deserializer   
namespace   string  from deserializer   
newpage boolean from deserializer   
unpatrolled boolean from deserializer   
anonymous   boolean from deserializer   
robot   boolean from deserializer   
added   int from deserializer   
deleted int from deserializer   
delta   bigint  from deserializer   
__partition int from deserializer   
__offsetbigint  from deserializer   
__timestamp bigint  from deserializer   

{code}

h2. Filter push down.
Newer Kafka consumers (0.11.0 and higher) allow seeking on the stream based on a 
given offset. The proposed storage handler will be able to leverage such an API by 
pushing down filters over the metadata columns, namely __partition (int), 
__offset (long) and __timestamp (long).
For instance, a query like
{code} 
select `__offset` from kafka_table where (`__offset` < 10 and `__offset`>3 and 
`__partition` = 0) or (`__partition` = 0 and `__offset` < 105 and `__offset` > 
99) or (`__offset` = 109);
{code}
will result in a scan of partition 0 only, then read only the records between offsets 
4 and 109. A standalone sketch of the underlying consumer seek API follows.
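For reference, this is the consumer-side primitive the pushdown relies on (a minimal, self-contained sketch; the broker address and topic are the placeholders from the example above, and the configuration is not the storage handler's actual setup):
{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class KafkaSeekExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "brokeraddress:9092");   // placeholder broker
    props.put("group.id", "hive-kafka-example");
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");

    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
      TopicPartition partition0 = new TopicPartition("wikipedia", 0);
      consumer.assign(Collections.singletonList(partition0));
      consumer.seek(partition0, 4L);                         // start at offset 4
      ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
      for (ConsumerRecord<byte[], byte[]> record : records) {
        if (record.offset() > 109L) {                        // stop past offset 109
          break;
        }
        System.out.printf("partition=%d offset=%d%n", record.partition(), record.offset());
      }
    }
  }
}
{code}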




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20376) Timestamp Timezone parser doesn't handle ISO formats "2013-08-31T01:02:33Z"

2018-08-13 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20376:
-

 Summary: Timestamp Timezone parser doesn't handle ISO formats 
"2013-08-31T01:02:33Z"
 Key: HIVE-20376
 URL: https://issues.apache.org/jira/browse/HIVE-20376
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


It would be nice to add ISO formats to the timezone utils parser, 
org.apache.hadoop.hive.common.type.TimestampTZUtil#parse(java.lang.String), so it can 
handle strings like "2013-08-31T01:02:33Z" (a small java.time illustration follows below).
CC [~jcamachorodriguez]/ [~ashutoshc]
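For illustration, java.time already understands this format, so one plausible direction (a sketch only, not the actual TimestampTZUtil change) is to fall back to the ISO instant parser:
{code}
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class IsoTimestampParse {
  public static void main(String[] args) {
    String input = "2013-08-31T01:02:33Z";
    // Instant.parse uses DateTimeFormatter.ISO_INSTANT, which accepts the trailing 'Z'
    Instant instant = Instant.parse(input);
    // Convert to a zoned value if the caller needs a timestamp with time zone
    ZonedDateTime inUtc = instant.atZone(ZoneId.of("UTC"));
    System.out.println(instant);   // 2013-08-31T01:02:33Z
    System.out.println(inUtc);     // 2013-08-31T01:02:33Z[UTC]
  }
}
{code}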



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20375) Json SerDe ignoring the timestamp.formats property

2018-08-13 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20375:
-

 Summary: Json SerDe ignoring the timestamp.formats property
 Key: HIVE-20375
 URL: https://issues.apache.org/jira/browse/HIVE-20375
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: slim bouguerra


JsonSerDe is supposed to accept the "timestamp.formats" SerDe property to allow 
different timestamp formats; after a recent refactor I see that this is not 
working anymore.

Looking at the code I can see that the SerDe is not using the constructed 
parser with the added formats:
https://github.com/apache/hive/blob/1105ef3974d8a324637d3d35881a739af3aeb382/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonStructReader.java#L82

Instead it is using the Converter:
https://github.com/apache/hive/blob/1105ef3974d8a324637d3d35881a739af3aeb382/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonStructReader.java#L324

That Converter then uses 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter.TimestampConverter

This converter does not have any knowledge of the user formats whatsoever; it uses 
the static helper 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils#getTimestampFromString

A standalone illustration of the multi-format fallback follows.
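This sketch is not the HiveJsonStructReader code; the class and the comma-separated property handling are assumptions made just to show what honoring "timestamp.formats" amounts to.
{code}
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

public class TimestampFormatsFallback {
  private final List<DateTimeFormatter> formatters = new ArrayList<>();

  // 'timestampFormats' is the comma-separated value of the "timestamp.formats" property
  TimestampFormatsFallback(String timestampFormats) {
    for (String pattern : timestampFormats.split(",")) {
      formatters.add(DateTimeFormatter.ofPattern(pattern.trim()));
    }
  }

  /** Try each user-supplied pattern in order; return null if none matches. */
  Timestamp parse(String raw) {
    for (DateTimeFormatter formatter : formatters) {
      try {
        return Timestamp.valueOf(LocalDateTime.parse(raw, formatter));
      } catch (DateTimeParseException ignored) {
        // fall through to the next user-supplied pattern
      }
    }
    return null;
  }

  public static void main(String[] args) {
    TimestampFormatsFallback parser =
        new TimestampFormatsFallback("yyyy-MM-dd'T'HH:mm:ss,yyyy-MM-dd HH:mm:ss");
    System.out.println(parser.parse("2013-08-31T01:02:33"));   // 2013-08-31 01:02:33.0
  }
}
{code}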



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20094) Update Druid to 0.12.1 version

2018-07-05 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20094:
-

 Summary: Update Druid to 0.12.1 version
 Key: HIVE-20094
 URL: https://issues.apache.org/jira/browse/HIVE-20094
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


As per Jira title.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19923) Follow up of HIVE-19615, use UnaryFunction instead of prefix

2018-06-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19923:
-

 Summary: Follow up of HIVE-19615, use UnaryFunction instead of 
prefix
 Key: HIVE-19923
 URL: https://issues.apache.org/jira/browse/HIVE-19923
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra


Correct usage of Druid isnull function is {code} isnull(exp){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19879) Remove unused calcite sql operator.

2018-06-13 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19879:
-

 Summary: Remove unused calcite sql operator.
 Key: HIVE-19879
 URL: https://issues.apache.org/jira/browse/HIVE-19879
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


HIVE-19796 introduced an unused SQL operator by mistake.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19869) Remove double formatting bug followup of HIVE-19382

2018-06-12 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19869:
-

 Summary: Remove double formatting bug followup of HIVE-19382
 Key: HIVE-19869
 URL: https://issues.apache.org/jira/browse/HIVE-19869
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


HIVE-19382 has a minor bug that happens when users provide a custom format as 
part of the FROM_UNIXTIMESTAMP function.
Here is an example query:
{code}
SELECT SUM(`ssb_druid_100`.`lo_revenue`) AS `sum_lo_revenue_ok`,
CAST(FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(`ssb_druid_100`.`__time` AS TIMESTAMP)), 
'yyyy-MM-dd HH:00:00') AS TIMESTAMP) AS `thr___time_ok`
 FROM `druid_ssb`.`ssb_druid_100` `ssb_druid_100`
GROUP BY CAST(FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(`ssb_druid_100`.`__time` AS 
TIMESTAMP)), 'yyyy-MM-dd HH:00:00') AS TIMESTAMP);
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19868) Extract support for float aggregator

2018-06-12 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19868:
-

 Summary: Extract support for float aggregator
 Key: HIVE-19868
 URL: https://issues.apache.org/jira/browse/HIVE-19868
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19796) Push Down TRUNC Fn to Druid Storage Handler

2018-06-05 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19796:
-

 Summary: Push Down TRUNC Fn to Druid Storage Handler
 Key: HIVE-19796
 URL: https://issues.apache.org/jira/browse/HIVE-19796
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Push down queries with the TRUNC date function, such as:
{code}
SELECT SUM((`ssb_druid_100`.`discounted_price` * 
`ssb_druid_100`.`net_revenue`)) AS `sum_calculation_4998925219892510720_ok`,
  CAST(TRUNC(CAST(`ssb_druid_100`.`__time` AS TIMESTAMP),'MM') AS DATE) AS 
`tmn___time_ok`
FROM `druid_ssb`.`ssb_druid_100` `ssb_druid_100`
GROUP BY CAST(TRUNC(CAST(`ssb_druid_100`.`__time` AS TIMESTAMP),'MM') AS DATE)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19721) Druid Storage handler throws exception when query has a Cast to Date

2018-05-26 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19721:
-

 Summary: Druid Storage handler throws exception when query has a 
Cast to Date 
 Key: HIVE-19721
 URL: https://issues.apache.org/jira/browse/HIVE-19721
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.1


{code}
SELECT CAST(`ssb_druid_100`.`__time` AS DATE) AS `x_time`,
SUM(`ssb_druid_100`.`metric_c`) AS `sum_lo_revenue_ok`
FROM `default`.`druid_test_table` `ssb_druid_100`
GROUP BY CAST(`ssb_druid_100`.`__time` AS DATE);
{code}

{code}
2018-05-26T06:54:56,570 DEBUG [HttpClient-Netty-Worker-5] 
client.NettyHttpClient: [POST http://localhost:8082/druid/v2/] Got chunk: 0B, 
last=true
2018-05-26T06:54:56,572 ERROR [1917f624-7b94-4990-9e3a-bbfff3656365 main] 
CliDriver: Failed with exception 
java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Unknown type: 
DATE
java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Unknown 
type: DATE
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:602)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:509)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2509)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1514)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1488)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:177)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
at 
org.apache.hadoop.hive.cli.TestMiniDruidLocalCliDriver.testCliDriver(TestMiniDruidLocalCliDriver.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:92)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runners.Suite.runChild(Suite.java:127)
at org.junit.runners.Suite.runChild(Suite.java:26)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:73)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 

[jira] [Created] (HIVE-19695) Year Month Day extraction functions need to add an implicit cast for column that are String types

2018-05-24 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19695:
-

 Summary: Year Month Day extraction functions need to add an 
implicit cast for column that are String types
 Key: HIVE-19695
 URL: https://issues.apache.org/jira/browse/HIVE-19695
 Project: Hive
  Issue Type: Bug
  Components: Druid integration, Query Planning
Affects Versions: 3.0.0
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.1.0


To avoid surprising/wrong results, the Hive query plan shall add an explicit cast 
over non date/timestamp column types when the user tries to extract Year/Month/Hour 
etc.
This is an example of misleading results:
{code}
create table test_base_table(`timecolumn` timestamp, `date_c` string, 
`timestamp_c` string,  `metric_c` double);
insert into test_base_table values ('2015-03-08 00:00:00', '2015-03-10', 
'2015-03-08 00:00:00', 5.0);
CREATE TABLE druid_test_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS select
cast(`timecolumn` as timestamp with local time zone) as `__time`, `date_c`, 
`timestamp_c`, `metric_c` FROM test_base_table;
select
year(date_c), month(date_c),day(date_c), hour(date_c),
year(timestamp_c), month(timestamp_c),day(timestamp_c), hour(timestamp_c)
from druid_test_table;
{code} 

will return the following wrong results:
{code}
PREHOOK: query: select
year(date_c), month(date_c),day(date_c), hour(date_c),
year(timestamp_c), month(timestamp_c),day(timestamp_c), hour(timestamp_c)
from druid_test_table
PREHOOK: type: QUERY
PREHOOK: Input: default@druid_test_table
 A masked pattern was here 
POSTHOOK: query: select
year(date_c), month(date_c),day(date_c), hour(date_c),
year(timestamp_c), month(timestamp_c),day(timestamp_c), hour(timestamp_c)
from druid_test_table
POSTHOOK: type: QUERY
POSTHOOK: Input: default@druid_test_table
 A masked pattern was here 
196912  31  16  196912  31  16 
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19684) Hive stats optimizer wrongly uses stats against non native tables

2018-05-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19684:
-

 Summary: Hive stats optimizer wrongly uses stats against non 
native tables
 Key: HIVE-19684
 URL: https://issues.apache.org/jira/browse/HIVE-19684
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Stats of non-native tables are inaccurate, thus queries over non-native tables 
cannot be optimized by the stats optimizer.
Take the example of the query:
{code}
Explain select count(*) from (select `__time` from druid_test_table limit 1) as 
src ;
{code} 

The plan will be reduced to:
{code}
POSTHOOK: query: explain extended select count(*) from (select `__time` from 
druid_test_table limit 1) as src
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: 1
  Processor Tree:
ListSink
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19680) Push down limit is not applied for Druid storage handler.

2018-05-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19680:
-

 Summary: Push down limit is not applied for Druid storage handler.
 Key: HIVE-19680
 URL: https://issues.apache.org/jira/browse/HIVE-19680
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


A query like 
{code}
select `__time` from druid_test_table limit 1;
{code}
returns more than one row.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19675) Cast to timestamps on Druid time column leads to an exception

2018-05-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19675:
-

 Summary: Cast to timestamps on Druid time column leads to an 
exception
 Key: HIVE-19675
 URL: https://issues.apache.org/jira/browse/HIVE-19675
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 3.0.0
Reporter: slim bouguerra
Assignee: Jesus Camacho Rodriguez


The following query fails due to a formatting issue.
{code}
SELECT CAST(`ssb_druid_100`.`__time` AS TIMESTAMP) AS `x_time`,
  SUM(`ssb_druid_100`.`lo_revenue`) AS `sum_lo_revenue_ok`
FROM `druid_ssb`.`ssb_druid_100` `ssb_druid_100`
GROUP BY CAST(`ssb_druid_100`.`__time` AS TIMESTAMP);
{code} 
Exception
{code} 
Error: java.io.IOException: java.lang.NumberFormatException: For input string: 
"1991-12-31 19:00:00" (state=,code=0)
{code}
[~jcamachorodriguez] maybe this is fixed by your upcoming patches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19674) Group by Decimal Constants push down to Druid tables.

2018-05-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19674:
-

 Summary: Group by Decimal Constants push down to Druid tables.
 Key: HIVE-19674
 URL: https://issues.apache.org/jira/browse/HIVE-19674
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Queries like the following get generated by Tableau:
{code}
SELECT SUM(`ssb_druid_100`.`lo_revenue`) AS `sum_lo_revenue_ok`
 FROM `druid_ssb`.`ssb_druid_100` `ssb_druid_100`
GROUP BY 1.1001;
{code}

The group key is pushed down to Druid as a constant column; this leads to an 
exception while parsing back the results, since the Druid input format does not 
allow decimals.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19672) Column Names mismatch between native Druid Tables and Hive External map

2018-05-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19672:
-

 Summary: Column Names mismatch between native Druid Tables and 
Hive External map
 Key: HIVE-19672
 URL: https://issues.apache.org/jira/browse/HIVE-19672
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 3.0.0
Reporter: slim bouguerra
 Fix For: 4.0.0


Druid column names are case sensitive while Hive is case insensitive.
This implies that any Druid datasource that has upper-case characters in some of 
its column names will not return the expected results.
One possible fix is to remap the column names before issuing the JSON query 
to Druid; a rough sketch follows.
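An illustrative sketch of that remapping (not the committed fix): build a lookup from lower-cased Hive names to the exact Druid column names and consult it before the JSON query is serialized.
{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class DruidColumnRemap {
  /** Maps lower-cased (Hive-style) names to the case-sensitive Druid names. */
  static Map<String, String> buildLookup(List<String> druidColumns) {
    Map<String, String> lookup = new HashMap<>();
    for (String druidName : druidColumns) {
      lookup.put(druidName.toLowerCase(Locale.ROOT), druidName);
    }
    return lookup;
  }

  /** Resolve a Hive column reference to the exact Druid column name. */
  static String resolve(Map<String, String> lookup, String hiveName) {
    return lookup.getOrDefault(hiveName.toLowerCase(Locale.ROOT), hiveName);
  }

  public static void main(String[] args) {
    Map<String, String> lookup =
        buildLookup(Arrays.asList("__time", "pageViews", "userId"));
    System.out.println(resolve(lookup, "pageviews"));   // prints pageViews
  }
}
{code}
If a datasource had two columns differing only by case, the lookup above would silently keep the last one, so a real fix would also have to detect that collision.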



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19615) Proper handling of is null and not is null predicate when pushed to Druid

2018-05-18 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19615:
-

 Summary: Proper handling of is null and not is null predicate when 
pushed to Druid
 Key: HIVE-19615
 URL: https://issues.apache.org/jira/browse/HIVE-19615
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


Recent development in Druid introduced new null-handling semantics 
[here|https://github.com/b-slim/druid/commit/219e77aeac9b07dc20dd9ab2dd537f3f17498346]

Based on those changes, we need to honor push down of expressions with is 
null / is not null predicates.
The proposed fix overrides the mapping of Calcite functions to Druid expressions to 
match the correct semantics.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19607) Pushing Aggregates on Top of Aggregates

2018-05-18 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19607:
-

 Summary: Pushing Aggregates on Top of Aggregates
 Key: HIVE-19607
 URL: https://issues.apache.org/jira/browse/HIVE-19607
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
 Fix For: 3.1.0


This plan shows an instance where the count aggregates can be pushed to Druid, 
which would eliminate the last-stage reducer.

{code}
+PREHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM 
druid_table
+PREHOOK: type: QUERY
+POSTHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM 
druid_table
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+Tez
+ A masked pattern was here 
+  Edges:
+Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
+ A masked pattern was here 
+  Vertices:
+Map 1
+Map Operator Tree:
+TableScan
+  alias: druid_table
+  properties:
+druid.fieldNames cstring2,$f1
+druid.fieldTypes string,double
+druid.query.json 
{"queryType":"groupBy","dataSource":"default.druid_table","granularity":"all","dimensions":[{"type":"default","dimension":"cstring2","outputName":"cstring2","outputType":"STRING"}],"limitSpec":{"type":"default"},"aggregations":[{"type":"doubleSum","name":"$f1","fieldName":"cdouble"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]}
+druid.query.type groupBy
+  Statistics: Num rows: 9173 Data size: 1673472 Basic stats: 
COMPLETE Column stats: NONE
+  Select Operator
+expressions: cstring2 (type: string), $f1 (type: double)
+outputColumnNames: cstring2, $f1
+Statistics: Num rows: 9173 Data size: 1673472 Basic stats: 
COMPLETE Column stats: NONE
+Group By Operator
+  aggregations: count(cstring2), sum($f1)
+  mode: hash
+  outputColumnNames: _col0, _col1
+  Statistics: Num rows: 1 Data size: 208 Basic stats: 
COMPLETE Column stats: NONE
+  Reduce Output Operator
+sort order:
+Statistics: Num rows: 1 Data size: 208 Basic stats: 
COMPLETE Column stats: NONE
+value expressions: _col0 (type: bigint), _col1 (type: 
double)
+Reducer 2
+Reduce Operator Tree:
+  Group By Operator
+aggregations: count(VALUE._col0), sum(VALUE._col1)
+mode: mergepartial
+outputColumnNames: _col0, _col1
+Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE 
Column stats: NONE
+File Output Operator
+  compressed: false
+  Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE 
Column stats: NONE
+  table:
+  input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
+  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19601) Unsupported Post join function

2018-05-17 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19601:
-

 Summary: Unsupported Post join function
 Key: HIVE-19601
 URL: https://issues.apache.org/jira/browse/HIVE-19601
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra


h1. As part of trying to use the Calcite rule {code}org.apache.calcite.rel.rules.AggregateExpandDistinctAggregatesRule#JOIN{code}
I got the following Calcite plan:
{code}
2018-05-17T09:26:02,781 DEBUG [80d6d405-ed78-4f60-bd93-b3e08e424f73 main] 
translator.PlanModifierForASTConv: Final plan after modifier
 HiveProject(_c0=[$1], _c1=[$2])
  HiveProject(zone=[$0], $f1=[$1], $f2=[$3])
HiveJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner], 
algorithm=[none], cost=[not available])
  HiveProject(zone=[$0], $f1=[$1])
HiveAggregate(group=[{0}], agg#0=[count($1)])
  HiveProject(zone=[$0], interval_marker=[$1])
HiveAggregate(group=[{0, 1}])
  HiveProject(zone=[$3], interval_marker=[$1])
HiveTableScan(table=[[druid_test_dst.test_base_table]], 
table:alias=[test_base_table])
  HiveProject(zone=[$0], $f1=[$1])
HiveAggregate(group=[{0}], agg#0=[count($1)])
  HiveProject(zone=[$0], dim=[$1])
HiveAggregate(group=[{0, 1}])
  HiveProject(zone=[$3], dim=[$4])
HiveTableScan(table=[[druid_test_dst.test_base_table]], 
table:alias=[test_base_table])
{code}
I ran into this issue:
{code} 
2018-05-17T09:26:02,876 ERROR [80d6d405-ed78-4f60-bd93-b3e08e424f73 main] 
parse.CalcitePlanner: CBO failed, skipping CBO.
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid function 
'IS NOT DISTINCT FROM'
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1069)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1464)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19600) Hive and Calcite have different semantics for Grouping sets

2018-05-17 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19600:
-

 Summary: Hive and Calcite have different semantics for Grouping 
sets
 Key: HIVE-19600
 URL: https://issues.apache.org/jira/browse/HIVE-19600
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
 Fix For: 3.1.0


h1. Issue:
I tried to use the Calcite rule {code}org.apache.calcite.rel.rules.AggregateExpandDistinctAggregatesRule#AggregateExpandDistinctAggregatesRule(java.lang.Class, boolean, org.apache.calcite.tools.RelBuilderFactory){code} 
to replace the current rule used by Hive, 
{code}org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveExpandDistinctAggregatesRule#HiveExpandDistinctAggregatesRule{code},
but I got an exception when generating the operator tree out of the Calcite plan.
This is the Calcite plan 
{code} 
HiveProject.HIVE.[](input=rel#50:HiveAggregate.HIVE.[](input=rel#48:HiveProject.HIVE.[](input=rel#44:HiveAggregate.HIVE.[](input=rel#38:HiveProject.HIVE.[](input=rel#0:HiveTableScan.HIVE.[]
(table=[druid_test_dst.test_base_table],table:alias=test_base_table)[false],$f0=$3,$f1=$1,$f2=$4),group={0,
 1, 2},groups=[{0, 1}, {0, 2}],$g=GROUPING($0, $1, 
$2)),$f0=$0,$f1=$1,$f2=$2,$g_1==($3, 1),$g_2==($3, 
2)),group={0},agg#0=count($1) FILTER $3,agg#1=count($2) FILTER 
$4),_o__c0=$1,_o__c1=$2)
{code}

This is the exception stack 
{code} 
2018-05-17T08:46:48,604 ERROR [649a61b0-d8c7-45d8-962d-b1d38397feb4 main] 
ql.Driver: FAILED: SemanticException Line 0:-1 Argument type mismatch 'zone': 
The first argument to grouping() must be an int/long. Got: STRING
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Argument type 
mismatch 'zone': The first argument to grouping() must be an int/long. Got: 
STRING
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1467)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at 
org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:239)
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:185)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12566)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12521)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4525)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4298)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10487)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10426)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11339)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11196)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11223)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11209)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:517)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12074)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:164)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:643)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1686)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1633)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1628)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at 

[jira] [Created] (HIVE-19586) Optimize Count(distinct X) pushdown based on the storage capabilities

2018-05-17 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19586:
-

 Summary: Optimize Count(distinct X) pushdown based on the storage 
capabilities 
 Key: HIVE-19586
 URL: https://issues.apache.org/jira/browse/HIVE-19586
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration, Logical Optimizer
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


h1. Goal
Provide a way to rewrite queries with a combination of COUNT(DISTINCT) and 
aggregates like SUM as a series of GROUP BYs.
This can be useful to push down to Druid queries like:
{code}
 select count(DISTINCT interval_marker), count (distinct dim), sum(num_l) FROM 
druid_test_table GROUP  BY `__time`, `zone` ;
{code}
In general this can be useful in cases where a storage handler cannot perform 
count(distinct column).

h1. How to do it.
Use the Calcite rule 
{code}org.apache.calcite.rel.rules.AggregateExpandDistinctAggregatesRule{code}, which 
breaks a count distinct down into either a single GROUP BY with grouping sets, or a 
series of GROUP BYs that may be linked with joins if multiple counts are present.
FYI, today Hive does have a similar rule, 
{code}org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveExpandDistinctAggregatesRule{code},
but it only provides a rewrite to a grouping-sets-based plan.
I am planning to use the actual Calcite rule; [~ashutoshc], any concerns or 
caveats to be aware of?

h2. Concerns/questions
We need a way to switch between grouping sets and a simple chained group by 
based on the plan cost. For instance, for a Druid-based scan it always makes sense 
(at least today) to push down a series of GROUP BYs and stitch the result sets 
together in Hive later (as opposed to scanning everything). 
But this might not be true for other storage handlers: for one that can handle 
grouping sets, it is better to push down the grouping sets as one table scan.
I am still unsure how I can lean on the cost optimizer to select the best plan; 
[~ashutoshc]/[~jcamachorodriguez], any inputs?






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19490) Locking on Insert into for non native and managed tables.

2018-05-10 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19490:
-

 Summary: Locking on Insert into for non native and managed tables.
 Key: HIVE-19490
 URL: https://issues.apache.org/jira/browse/HIVE-19490
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra


Current state of the art: 

Managed non-native tables, like Druid tables, need to acquire a lock on insert 
into or insert overwrite. The nature of this lock is set to exclusive by 
default for any non-native table.

This implies that inserts into a Druid table will also lock any read query 
during the execution of the insert into. IMO this lock (on insert into) is not 
needed, since the insert statement is appending data and the state of loading it 
is managed partially by the Hive storage handler hook and partially by Druid. 

What I am proposing is to relax the lock level to shared for all non-native 
tables on insert into operations, and keep it as exclusive write for insert 
overwrite for now.

 

Any feedback is welcome.

cc [~ekoifman] / [~ashutoshc] / [~jdere] / [~hagleitn]

Also, I am not sure what is the best way to unit test this; currently I am using 
the debugger to check whether the locks are what I expect. Please let me know if 
there is a better way to do this. 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19474) Decimal type should be casted as part of the CTAS or INSERT Clause.

2018-05-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19474:
-

 Summary: Decimal type should be casted as part of the CTAS or 
INSERT Clause.
 Key: HIVE-19474
 URL: https://issues.apache.org/jira/browse/HIVE-19474
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


HIVE-18569 introduced a runtime config variable to allow the indexing of 
Decimal as Double. This leads to a kind of messy state: the Hive metadata thinks the 
column is still decimal while it is stored as double. Since the Hive metadata 
of the column is Decimal, the logical optimizer will not push down aggregates. I 
tried to fix this by adding some logic to the application, but it makes the code 
very clumsy with a lot of branches. Instead, I propose to revert this patch and 
let the user introduce an explicit cast. This is better since the metadata 
reflects the actual storage type, push down of aggregates will kick in, and there is 
no config needed.

cc [~ashutoshc] and [~nishantbangarwa]

You can see the difference with the following DDL

{code}

create table test_base_table(`timecolumn` timestamp, `interval_marker` string, 
`num_l` DECIMAL(10,2));
insert into test_base_table values ('2015-03-08 00:00:00', 'i1-start', 4.5);
set hive.druid.approx.result=true;
CREATE TABLE druid_test_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`, 
`interval_marker`, cast(`num_l` as double)
FROM test_base_table;
describe druid_test_table;
explain select sum(num_l), min(num_l) FROM druid_test_table;
CREATE TABLE druid_test_table_2
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`, 
`interval_marker`, `num_l`
FROM test_base_table;
describe druid_test_table_2;
explain select sum(num_l), min(num_l) FROM druid_test_table_2;

{code}

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19462) Fix mapping for char_length function to enable pushdown to Druid.

2018-05-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19462:
-

 Summary: Fix mapping for char_length function to enable pushdown 
to Druid. 
 Key: HIVE-19462
 URL: https://issues.apache.org/jira/browse/HIVE-19462
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently char_length is not pushed down to Druid because of a missing mapping 
from/to Calcite.

This patch adds that mapping.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19443) Issue with Druid timestamp with timezone handling

2018-05-07 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19443:
-

 Summary: Issue with Druid timestamp with timezone handling
 Key: HIVE-19443
 URL: https://issues.apache.org/jira/browse/HIVE-19443
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
 Attachments: test_resutls.out, test_timestamp.q

As you can see in the attached file [^test_resutls.out], when switching the current 
timezone to UTC the insert of values from a Hive table into a Druid table misses 
some rows.

You can use this to reproduce it.

[^test_timestamp.q]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19441) Add support for float aggregator and use LLAP test Driver

2018-05-07 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19441:
-

 Summary: Add support for float aggregator and use LLAP test Driver
 Key: HIVE-19441
 URL: https://issues.apache.org/jira/browse/HIVE-19441
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Adding support for the float aggregator.

Use LLAP as the test driver to reduce the execution time of the tests from about 2 
hours to 15 min.

Although this patch unveils an issue with timezones; maybe it is fixed by 
[~jcamachorodriguez]'s upcoming set of patches.

 

Before

{code}

[INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @ 
hive-it-qfile ---
[INFO] Compiling 21 source files to 
/Users/sbouguerra/Hdev/hive/itests/qtest/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hive-it-qfile ---
[INFO]
[INFO] ---
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hive.cli.TestMiniDruidCliDriver
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
6,654.117 s - in org.apache.hadoop.hive.cli.TestMiniDruidCliDriver
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 01:51 h
[INFO] Finished at: 2018-05-04T12:43:19-07:00
[INFO] 

{code}

After

{code}

INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @ 
hive-it-qfile ---
[INFO] Compiling 22 source files to 
/Users/sbouguerra/Hdev/hive/itests/qtest/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hive-it-qfile ---
[INFO]
[INFO] ---
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hive.cli.TestMiniDruidCliDriver
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 907.167 
s - in org.apache.hadoop.hive.cli.TestMiniDruidCliDriver
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 15:31 min
[INFO] Finished at: 2018-05-04T13:15:11-07:00
[INFO] 

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19298) Fix operator tree of CTAS for Druid Storage Handler

2018-04-25 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19298:
-

 Summary: Fix operator tree of CTAS for Druid Storage Handler
 Key: HIVE-19298
 URL: https://issues.apache.org/jira/browse/HIVE-19298
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.1.0


The current operator plan of CTAS for the Druid storage handler is broken when the 
user enables the property {code}hive.exec.parallel{code} as {code}true{code}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19239) Check for possible null timestamp fields during SerDe from Druid events

2018-04-18 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19239:
-

 Summary: Check for possible null timestamp fields during SerDe 
from Druid events
 Key: HIVE-19239
 URL: https://issues.apache.org/jira/browse/HIVE-19239
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently we do not check for possible null timestamp events.

This might lead to an NPE.

This patch adds an additional check for such cases; a generic illustration of the 
guard follows.
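A tiny, generic sketch of that kind of guard (the __time field name and the return-null behavior are assumptions for illustration, not the actual SerDe code):
{code}
import java.util.HashMap;
import java.util.Map;

public class NullTimestampGuard {
  /** Returns the event timestamp in millis, or null instead of throwing an NPE. */
  static Long extractTimestampMillis(Map<String, Object> druidEvent, String timeField) {
    Object raw = druidEvent.get(timeField);
    if (raw == null) {
      return null;            // previously this would surface as a NullPointerException
    }
    return ((Number) raw).longValue();
  }

  public static void main(String[] args) {
    Map<String, Object> event = new HashMap<>();
    event.put("page", "Main_Page");   // no __time key at all
    System.out.println(extractTimestampMillis(event, "__time"));   // null, not an NPE
  }
}
{code}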



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19187) Update Druid Storage Handler to Druid 0.12.0

2018-04-11 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19187:
-

 Summary: Update Druid Storage Handler to Druid 0.12.0
 Key: HIVE-19187
 URL: https://issues.apache.org/jira/browse/HIVE-19187
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.1.0


Current used Druid Version is 0.11.0
This Patch updates the Druid version to the most recent version 0.12.0




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19157) Assert that Insert into Druid Table it fails.

2018-04-10 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19157:
-

 Summary: Assert that Insert into Druid Table it fails. 
 Key: HIVE-19157
 URL: https://issues.apache.org/jira/browse/HIVE-19157
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


The usual workflow of loading data into Druid relies on the fact that HS2 is 
able to load the segment metadata from HDFS that is produced by the LLAP/Tez workers.
In cases where HS2 is not able to perform `ls` on the HDFS path, the insert 
into query will return success and will not insert any data.
This bug was introduced in the function 
{code}org.apache.hadoop.hive.druid.DruidStorageHandlerUtils#getCreatedSegments{code} 
when we added the feature to allow creating empty tables.
{code}
 try {
  fss = fs.listStatus(taskDir);
} catch (FileNotFoundException e) {
  // This is a CREATE TABLE statement or query executed for CTAS/INSERT
  // did not produce any result. We do not need to do anything, this is
  // expected behavior.
  return publishedSegmentsBuilder.build();
}
{code}

I am still looking for the way to fix this. [~jcamachorodriguez]/[~ashutoshc], any 
idea what is the best way to detect that it is an empty create table statement? 
One illustrative direction is sketched below.
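A purely illustrative direction for that question (an assumption, not the committed fix): treat a missing task directory as an empty result only when its parent staging directory is still visible, so a broken `ls` (permissions, wrong path) surfaces as an error instead of a silent empty insert.
{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SegmentDirProbe {
  /**
   * Returns the task directory listing, or null when the directory is legitimately
   * absent (plain CREATE TABLE / empty CTAS). A missing staging directory is treated
   * as an error so a broken `ls` no longer masquerades as an empty insert.
   */
  static FileStatus[] listTaskDirOrNull(FileSystem fs, Path taskDir) throws IOException {
    try {
      return fs.listStatus(taskDir);
    } catch (FileNotFoundException e) {
      if (fs.exists(taskDir.getParent())) {
        return null;   // staging area is reachable, the query really produced nothing
      }
      throw new IOException("Cannot list segment descriptors under " + taskDir, e);
    }
  }
}
{code}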

 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19155) Day time saving cause Druid inserts to fail with org.apache.hive.druid.io.druid.java.util.common.UOE: Cannot add overlapping segments

2018-04-10 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19155:
-

 Summary: Day time saving cause Druid inserts to fail with 
org.apache.hive.druid.io.druid.java.util.common.UOE: Cannot add overlapping 
segments
 Key: HIVE-19155
 URL: https://issues.apache.org/jira/browse/HIVE-19155
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


If you try to insert data around the daylight saving time hour, the query fails 
with the following exception:
{code}
2018-04-10T11:24:58,836 ERROR [065fdaa2-85f9-4e49-adaf-3dc14d51be90 main] 
exec.DDLTask: Failed
org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hive.druid.io.druid.java.util.common.UOE: Cannot add overlapping 
segments [2015-03-08T05:00:00.000Z/2015-03-09T05:00:00.000Z and 
2015-03-09T04:00:00.000Z/2015-03-10T04:00:00.000Z] with the same version 
[2018-04-10T11:24:48.388-07:00]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:914) 
~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:919) 
~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4831) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:394) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2443) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2114) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1797) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1538) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1532) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:204) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) 
[hive-cli-3.1.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) 
[hive-cli-3.1.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) 
[hive-cli-3.1.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) 
[hive-cli-3.1.0-SNAPSHOT.jar:?]
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1455) 
[hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1429) 
[hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:177)
 [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) 
[hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59)
 [test-classes/:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_92]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_92]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_92]
{code}

You can reproduce this using the following DDL 
{code}
create database druid_test;
use druid_test;

create table test_table(`timecolumn` timestamp, `userid` string, `num_l` float);

insert into test_table values ('2015-03-08 00:00:00', 'i1-start', 4);
insert into test_table values ('2015-03-08 23:59:59', 'i1-end', 1);

insert into test_table values ('2015-03-09 00:00:00', 'i2-start', 4);
insert into test_table values ('2015-03-09 23:59:59', 'i2-end', 1);

insert into test_table values ('2015-03-10 00:00:00', 'i3-start', 2);
insert into test_table values ('2015-03-10 23:59:59', 'i3-end', 2);

CREATE TABLE druid_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`, 
`userid`, `num_l` FROM test_table;
{code}

The fix is to always adjust the Druid segment identifiers to UTC (see the sketch below).
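A standalone java.time sketch of the UTC adjustment (not the DruidStorageHandler code itself): derive the DAY segment boundaries in UTC so that consecutive days cannot overlap across a DST transition.
{code}
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class UtcSegmentInterval {
  /** Builds a DAY-granularity interval [start, end) anchored in UTC. */
  static String utcDayInterval(Instant anyInstantInDay) {
    LocalDate day = anyInstantInDay.atZone(ZoneOffset.UTC).toLocalDate();
    Instant start = day.atStartOfDay(ZoneOffset.UTC).toInstant();
    Instant end = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant();
    return start + "/" + end;   // e.g. 2015-03-08T00:00:00Z/2015-03-09T00:00:00Z
  }

  public static void main(String[] args) {
    // The same rows as in the reproducer, interpreted in a DST-observing zone
    ZoneId losAngeles = ZoneId.of("America/Los_Angeles");
    Instant row1 = ZonedDateTime.of(2015, 3, 8, 0, 0, 0, 0, losAngeles).toInstant();
    Instant row2 = ZonedDateTime.of(2015, 3, 9, 0, 0, 0, 0, losAngeles).toInstant();
    System.out.println(utcDayInterval(row1));   // non-overlapping
    System.out.println(utcDayInterval(row2));   // non-overlapping
  }
}
{code}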





[jira] [Created] (HIVE-19070) Add More Test To Druid Mini Cluster 200 Tableau kind queries.

2018-03-28 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19070:
-

 Summary: Add More Test To Druid Mini Cluster 200 Tableau kind 
queries.
 Key: HIVE-19070
 URL: https://issues.apache.org/jira/browse/HIVE-19070
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


In this patch I am adding 200 new Tableau queries that run over a new data set 
called calcs.
The data set is very small.
I have also consolidated 3 different tests to run as one test; this will help 
keep the execution time low.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19044) Duplicate field names within Druid Query Generated by Calcite plan

2018-03-25 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19044:
-

 Summary: Duplicate field names within Druid Query Generated by 
Calcite plan
 Key: HIVE-19044
 URL: https://issues.apache.org/jira/browse/HIVE-19044
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


This is the query plan; as you can see, "$f4" is duplicated.
{code}
PREHOOK: query: EXPLAIN SELECT Calcs.key AS none_key_nk,   SUM(Calcs.num0) AS 
temp_z_stdevp_num0___1723718801__0_,   COUNT(Calcs.num0) AS 
temp_z_stdevp_num0___2730138885__0_,   SUM((Calcs.num0 * Calcs.num0)) AS 
temp_z_stdevp_num0___4071133194__0_,   STDDEV_POP(Calcs.num0) AS stp_num0_ok 
FROM druid_tableau.calcs Calcs GROUP BY Calcs.key
PREHOOK: type: QUERY
POSTHOOK: query: EXPLAIN SELECT Calcs.key AS none_key_nk,   SUM(Calcs.num0) AS 
temp_z_stdevp_num0___1723718801__0_,   COUNT(Calcs.num0) AS 
temp_z_stdevp_num0___2730138885__0_,   SUM((Calcs.num0 * Calcs.num0)) AS 
temp_z_stdevp_num0___4071133194__0_,   STDDEV_POP(Calcs.num0) AS stp_num0_ok 
FROM druid_tableau.calcs Calcs GROUP BY Calcs.key
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: calcs
  properties:
druid.fieldNames key,$f1,$f2,$f3,$f4
druid.fieldTypes string,double,bigint,double,double
druid.query.json 
{"queryType":"groupBy","dataSource":"druid_tableau.calcs","granularity":"all","dimensions":[{"type":"default","dimension":"key","outputName":"key","outputType":"STRING"}],"limitSpec":{"type":"default"},"aggregations":[{"type":"doubleSum","name":"$f1","fieldName":"num0"},{"type":"filtered","filter":{"type":"not","field":{"type":"selector","dimension":"num0","value":null}},"aggregator":{"type":"count","name":"$f2","fieldName":"num0"}},{"type":"doubleSum","name":"$f3","expression":"(\"num0\"
 * \"num0\")"},{"type":"doubleSum","name":"$f4","expression":"(\"num0\" * 
\"num0\")"}],"postAggregations":[{"type":"expression","name":"$f4","expression":"pow(((\"$f4\"
 - ((\"$f1\" * \"$f1\") / \"$f2\")) / 
\"$f2\"),0.5)"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]}
druid.query.type groupBy
  Select Operator
expressions: key (type: string), $f1 (type: double), $f2 (type: 
bigint), $f3 (type: double), $f4 (type: double)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
ListSink
{code}
Table DDL 
{code}
create database druid_tableau;
use druid_tableau;
drop table if exists calcs;
create table calcs
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "MONTH",
  "druid.query.granularity" = "DAY")
AS SELECT
  cast(datetime0 as timestamp with local time zone) `__time`,
  key,
  str0, str1, str2, str3,
  date0, date1, date2, date3,
  time0, time1,
  datetime1,
  zzz,
  cast(bool0 as string) bool0,
  cast(bool1 as string) bool1,
  cast(bool2 as string) bool2,
  cast(bool3 as string) bool3,
  int0, int1, int2, int3,
  num0, num1, num2, num3, num4
from default.calcs_orc;
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19023) Druid storage Handler still using old select query when the CBO fails

2018-03-22 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19023:
-

 Summary: Druid storage Handler still using old select query when 
the CBO fails
 Key: HIVE-19023
 URL: https://issues.apache.org/jira/browse/HIVE-19023
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


See the usage of the function 
{code}org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat#createSelectStarQuery{code};
this can be replaced by a Scan query, which is more efficient.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19011) Druid Storage Handler returns conflicting results for Qtest druidmini_dynamic_partition.q

2018-03-21 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19011:
-

 Summary: Druid Storage Handler returns conflicting results for 
Qtest druidmini_dynamic_partition.q
 Key: HIVE-19011
 URL: https://issues.apache.org/jira/browse/HIVE-19011
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


This git diff shows the conflicting results
{code}
diff --git 
a/ql/src/test/results/clientpositive/druid/druidmini_dynamic_partition.q.out 
b/ql/src/test/results/clientpositive/druid/druidmini_dynamic_partition.q.out
index 714778ebfc..cea9b7535c 100644
--- a/ql/src/test/results/clientpositive/druid/druidmini_dynamic_partition.q.out
+++ b/ql/src/test/results/clientpositive/druid/druidmini_dynamic_partition.q.out
@@ -243,7 +243,7 @@ POSTHOOK: query: SELECT  sum(cint), max(cbigint),  
sum(cbigint), max(cint) FROM
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@druid_partitioned_table
 POSTHOOK: Output: hdfs://### HDFS PATH ###
-1408069801800  4139540644  10992545287 165393120
+1408069801800  3272553822  10992545287 -648527473
 PREHOOK: query: SELECT  sum(cint), max(cbigint),  sum(cbigint), max(cint) FROM 
druid_partitioned_table_0
 PREHOOK: type: QUERY
 PREHOOK: Input: default@druid_partitioned_table_0
@@ -429,7 +429,7 @@ POSTHOOK: query: SELECT sum(cint), max(cbigint),  
sum(cbigint), max(cint) FROM d
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@druid_partitioned_table
 POSTHOOK: Output: hdfs://### HDFS PATH ###
-2857395071862  4139540644  -1661313883124  885815256
+2857395071862  3728054572  -1661313883124  71894663
 PREHOOK: query: EXPLAIN INSERT OVERWRITE TABLE druid_partitioned_table
   SELECT cast (`ctimestamp1` as timestamp with local time zone) as `__time`,
 cstring1,
@@ -566,7 +566,7 @@ POSTHOOK: query: SELECT sum(cint), max(cbigint),  
sum(cbigint), max(cint) FROM d
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@druid_partitioned_table
 POSTHOOK: Output: hdfs://### HDFS PATH ###
-1408069801800  7115092987  10992545287 1232243564
+1408069801800  4584782821  10992545287 -1808876374
 PREHOOK: query: SELECT  sum(cint), max(cbigint),  sum(cbigint), max(cint) FROM 
druid_partitioned_table_0
 PREHOOK: type: QUERY
 PREHOOK: Input: default@druid_partitioned_table_0
@@ -659,7 +659,7 @@ POSTHOOK: query: SELECT sum(cint), max(cbigint),  
sum(cbigint), max(cint) FROM d
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@druid_partitioned_table
 POSTHOOK: Output: hdfs://### HDFS PATH ###
-1408069801800  7115092987  10992545287 1232243564
+1408069801800  4584782821  10992545287 -1808876374
 PREHOOK: query: EXPLAIN SELECT  sum(cint), max(cbigint),  sum(cbigint), 
max(cint)  FROM druid_max_size_partition
 PREHOOK: type: QUERY
 POSTHOOK: query: EXPLAIN SELECT  sum(cint), max(cbigint),  sum(cbigint), 
max(cint)  FROM druid_max_size_partition
@@ -758,7 +758,7 @@ POSTHOOK: query: SELECT sum(cint), max(cbigint),  
sum(cbigint), max(cint) FROM d
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@druid_partitioned_table
 POSTHOOK: Output: hdfs://### HDFS PATH ###
-1408069801800  7115092987  10992545287 1232243564
+1408069801800  4584782821  10992545287 -1808876374
 PREHOOK: query: DROP TABLE druid_partitioned_table_0
 PREHOOK: type: DROPTABLE
 PREHOOK: Input: default@druid_partitioned_table_0
{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18996) SubString Druid convertor assuming that index is always constant literal value

2018-03-19 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18996:
-

 Summary: SubString Druid convertor assuming that index is always 
constant literal value
 Key: HIVE-18996
 URL: https://issues.apache.org/jira/browse/HIVE-18996
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


A query like the following 
{code}
SELECT substring(namespace, CAST(deleted AS INT), 4)
FROM druid_table_1;
{code}
will fail with 
{code}
java.lang.AssertionError: not a literal: $13
at org.apache.calcite.rex.RexLiteral.findValue(RexLiteral.java:963)
at org.apache.calcite.rex.RexLiteral.findValue(RexLiteral.java:955)
at org.apache.calcite.rex.RexLiteral.intValue(RexLiteral.java:938)
at 
org.apache.calcite.adapter.druid.SubstringOperatorConversion.toDruidExpression(SubstringOperatorConversion.java:46)
at 
org.apache.calcite.adapter.druid.DruidExpressions.toDruidExpression(DruidExpressions.java:120)
at 
org.apache.calcite.adapter.druid.DruidQuery.computeProjectAsScan(DruidQuery.java:746)
at 
org.apache.calcite.adapter.druid.DruidRules$DruidProjectRule.onMatch(DruidRules.java:308)
at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
{code}

because it assumes that the index is always a constant literal. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18993) Use Druid Expressions

2018-03-19 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18993:
-

 Summary: Use Druid Expressions
 Key: HIVE-18993
 URL: https://issues.apache.org/jira/browse/HIVE-18993
 Project: Hive
  Issue Type: Task
Reporter: slim bouguerra






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18959) Avoid creating extra pool of threads within LLAP

2018-03-14 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18959:
-

 Summary: Avoid creating extra pool of threads within LLAP
 Key: HIVE-18959
 URL: https://issues.apache.org/jira/browse/HIVE-18959
 Project: Hive
  Issue Type: Task
  Components: Druid integration
 Environment: Kerberos Cluster
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


The current Druid Kerberos HTTP client is using an external single-threaded 
pool to handle retrying auth calls (e.g. when a cookie expires or on other transient 
auth issues). 

First, this is not buying us anything, since the whole Druid task is executed as 
one synchronous task.

Second, this can cause a major issue if an exception occurs that leads to 
shutting down the LLAP main thread.

Thus, to fix this we should avoid using an external thread pool and handle 
retrying in a synchronous way (a rough sketch follows).
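A rough, generic sketch of the "retry synchronously on the calling thread" idea (not the actual Druid HTTP client change):
{code}
import java.io.IOException;
import java.util.concurrent.Callable;

public class SynchronousRetry {
  /** Runs the call on the current thread, retrying transient failures inline. */
  static <T> T callWithRetry(Callable<T> call, int maxAttempts, long backoffMillis)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (IOException transientFailure) {     // e.g. an expired auth cookie
        last = transientFailure;
        Thread.sleep(backoffMillis * attempt);     // no extra pool, no extra thread
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    String result = callWithRetry(() -> "ok", 3, 100L);
    System.out.println(result);
  }
}
{code}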



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18780) Improve schema discovery For Druid Storage Handler

2018-02-22 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18780:
-

 Summary: Improve schema discovery For Druid Storage Handler
 Key: HIVE-18780
 URL: https://issues.apache.org/jira/browse/HIVE-18780
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently, the Druid storage adapter issues a segment metadata query every time 
the query is of type Select or Scan. Worse, every input split (map task) then 
does the same, since it goes through the same SerDe; this is very expensive and 
puts a lot of pressure on the Druid cluster. The way to fix this is to take the 
schema from the Calcite plan and pass it as part of the Hive query context, 
instead of serializing the query itself and having every split re-discover the 
schema (see the sketch below).
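
One way to do this, sketched below, is to serialize the planner-known column 
names and types into the job properties so each split can read them back 
instead of issuing its own segment metadata query. The property keys used here 
are illustrative placeholders, not necessarily the ones the storage handler 
would end up using.

{code}
import java.util.List;
import java.util.Properties;

final class DruidSchemaPropagation {
  // Planning time: stash the schema derived from the Calcite plan.
  static void attachSchema(Properties jobProperties, List<String> columnNames,
      List<String> columnTypes) {
    jobProperties.setProperty("druid.query.field.names", String.join(",", columnNames));
    jobProperties.setProperty("druid.query.field.types", String.join(",", columnTypes));
  }

  // Split/SerDe time: read the schema back; no segment metadata query needed.
  static String[] readColumnNames(Properties jobProperties) {
    return jobProperties.getProperty("druid.query.field.names", "").split(",");
  }
}
{code}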



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18732) Push order/limit to Druid historical when approximate results are allowed

2018-02-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18732:
-

 Summary: Push order/limit to Druid historical when approximate 
results are allowed 
 Key: HIVE-18732
 URL: https://issues.apache.org/jira/browse/HIVE-18732
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra


Druid 0.11 allows forcing push-down of ORDER BY / LIMIT to historicals using the 
query context flag {code}forcePushDownLimit{code}. 
[http://druid.io/docs/latest/querying/groupbyquery.html]

As per the docs above, this is a great optimization that can be used when 
approximate results are allowed (see the sketch below).
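
A minimal sketch of setting the flag on the generated groupBy query context; 
the exact context key name should be confirmed against the Druid docs for the 
targeted version, and the surrounding helper is illustrative only.

{code}
import java.util.HashMap;
import java.util.Map;

final class DruidContextFlags {
  // Builds a query context that asks historicals to apply ORDER BY / LIMIT
  // locally before merging, trading exactness for speed.
  static Map<String, Object> limitPushDownContext(boolean approximateResultsAllowed) {
    Map<String, Object> context = new HashMap<>();
    if (approximateResultsAllowed) {
      // Only safe when the caller accepts approximate results, since the
      // per-historical limit may not match the exact global ordering.
      context.put("forcePushDownLimit", true);
    }
    return context;
  }
}
{code}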



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18731) Add Documentations about this feature.

2018-02-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18731:
-

 Summary: Add Documentations about this feature. 
 Key: HIVE-18731
 URL: https://issues.apache.org/jira/browse/HIVE-18731
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra


We need to add basic docs about the new table properties and what they mean in 
practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18730) Use LLAP as execution engine for Druid mini Cluster Tests

2018-02-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18730:
-

 Summary: Use LLAP as execution engine for Druid mini Cluster Tests
 Key: HIVE-18730
 URL: https://issues.apache.org/jira/browse/HIVE-18730
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


Currently, we are using local MR to run the Druid mini cluster tests. It would 
be better to use an LLAP cluster or Tez.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18729) Druid Time column type

2018-02-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18729:
-

 Summary: Druid Time column type
 Key: HIVE-18729
 URL: https://issues.apache.org/jira/browse/HIVE-18729
 Project: Hive
  Issue Type: Task
  Components: Druid integration
Reporter: slim bouguerra
Assignee: Jesus Camacho Rodriguez


I have talked offline with [~jcamachorodriguez] about this, and we agreed that 
the best way forward is to support both cases, i.e. the Druid time column can be 
TIMESTAMP or TIMESTAMP WITH LOCAL TIME ZONE.

In fact, for Hive-Druid internal tables this makes perfect sense: we have Hive 
metadata about the time column during the CTAS statement, so we can handle both 
cases just as we do for other storage formats, e.g. ORC.

For Druid external tables, we can have a default type and allow the user to 
override it via table properties (see the sketch below).

CC [~ashutoshc] and [~nishantbangarwa]. 
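
A minimal sketch, for the external-table case, of resolving the time column type 
from table properties; the property name druid.time.column.type and its default 
are hypothetical examples, not settled names.

{code}
import java.util.Properties;

final class DruidTimeColumnTypeResolver {
  enum TimeType { TIMESTAMP, TIMESTAMP_WITH_LOCAL_TIME_ZONE }

  // Reads the (hypothetical) table property and falls back to a default when
  // the user did not override it.
  static TimeType resolve(Properties tableProperties) {
    String value = tableProperties.getProperty(
        "druid.time.column.type", "timestamp with local time zone");
    return "timestamp".equalsIgnoreCase(value.trim())
        ? TimeType.TIMESTAMP
        : TimeType.TIMESTAMP_WITH_LOCAL_TIME_ZONE;
  }
}
{code}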



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18595) UNIX_TIMESTAMP UDF fails when type is Timestamp with local timezone

2018-01-31 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18595:
-

 Summary: UNIX_TIMESTAMP  UDF fails when type is Timestamp with 
local timezone
 Key: HIVE-18595
 URL: https://issues.apache.org/jira/browse/HIVE-18595
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


{code}

2018-01-31T12:59:45,464 ERROR [10e97c86-7f90-406b-a8fa-38be5d3529cc main] 
ql.Driver: FAILED: SemanticException [Error 10014]: Line 3:456 Wrong arguments 
''-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only 
string/date/timestamp types
org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:456 Wrong arguments 
''-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only 
string/date/timestamp types
 at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1394)
 at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 at 
org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
 at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:235)
 at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:181)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11847)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11780)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genGBLogicalPlan(CalcitePlanner.java:3140)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4330)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1407)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1354)
 at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
 at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
 at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
 at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1159)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1175)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:422)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11393)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:304)
 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
 at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:163)
 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:639)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1504)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1632)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1382)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:240)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:343)
 at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1331)
 at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1305)
 at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173)
 at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
 at 
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at 

[jira] [Created] (HIVE-18594) DATEDIFF UDF fails when type is timestamp with Local timezone.

2018-01-31 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18594:
-

 Summary: DATEDIFF UDF fails when type is timestamp with Local 
timezone.
 Key: HIVE-18594
 URL: https://issues.apache.org/jira/browse/HIVE-18594
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: slim bouguerra


{code}

2018-01-31T12:45:08,488 ERROR [9b5c5020-b1f5-4703-8c2e-bac4aa01a578 main] 
ql.Driver: FAILED: SemanticException [Error 10014]: Line 3:88 Wrong arguments 
''2004-07-04'': DATEDIFF() o
nly takes STRING/TIMESTAMP/DATEWRITABLE types as 1-th argument, got 
TIMESTAMPLOCALTZ
org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:88 Wrong arguments 
''2004-07-04'': DATEDIFF() only takes STRING/TIMESTAMP/DATEWRITABLE types as 
1-th argument, got TIMESTA
MPLOCALTZ
 at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1394)
 at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 at 
org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
 at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:235)
 at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:181)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11847)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11802)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSelectLogicalPlan(CalcitePlanner.java:4005)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4336)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1407)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1354)
 at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
 at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
 at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
 at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1159)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1175)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:422)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11393)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:304)
 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
 at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:163)
 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:639)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1504)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1632)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1382)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:240)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:343)
 at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1331)
 at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1305)
 at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173)
 at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
 at 
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
 at 
