[jira] [Created] (HIVE-18573) Use proper Calcite operator instead of UDFs

2018-01-29 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18573:
-

 Summary: Use proper Calcite operator instead of UDFs
 Key: HIVE-18573
 URL: https://issues.apache.org/jira/browse/HIVE-18573
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: slim bouguerra


Currently, Hive mostly uses user-defined black-box SQL operators during query 
planning. It would be more beneficial to use proper Calcite operators.

Also, use a single name for the EXTRACT operator instead of a different name for 
every unit, and do the same for the FLOOR function. This will allow unifying the 
treatment per operator, as sketched below.
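
As an illustration only (not the actual patch), a minimal sketch of what "one 
Calcite operator parameterized by unit" looks like when building a row 
expression; the RexBuilder and the input reference are assumed to come from the 
planner context:
{code}
import org.apache.calcite.avatica.util.TimeUnitRange;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;

public class ExtractSketch {
  // Build EXTRACT(YEAR FROM ts) with the single standard EXTRACT operator,
  // the unit traveling as an operand, instead of a black-box UDF per unit
  // (extract_year, extract_month, ...).
  static RexNode extractYear(RexBuilder rexBuilder, RexNode tsRef) {
    return rexBuilder.makeCall(
        SqlStdOperatorTable.EXTRACT,
        rexBuilder.makeFlag(TimeUnitRange.YEAR),
        tsRef);
  }
}
{code}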

 





[jira] [Created] (HIVE-18331) Renew the Kerberos ticket used by Druid Query runner

2017-12-22 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18331:
-

 Summary: Renew the Kerberos ticket used by Druid Query runner
 Key: HIVE-18331
 URL: https://issues.apache.org/jira/browse/HIVE-18331
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


The Druid HTTP client has to renew the current user's Kerberos ticket when it is 
close to expiring.
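
A hedged sketch of the kind of renewal check involved, using Hadoop's 
UserGroupInformation API (the wrapping method is hypothetical; where exactly the 
Druid query runner would call it is not shown):
{code}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosTicketSketch {
  // Call before each Druid HTTP request: re-login from the keytab if the
  // TGT is close to expiring; this is a no-op while the ticket is fresh.
  static void ensureFreshTicket() throws IOException {
    UserGroupInformation ugi = UserGroupInformation.getLoginUser();
    if (ugi.isFromKeytab()) {
      ugi.checkTGTAndReloginFromKeytab();
    }
  }
}
{code}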






[jira] [Created] (HIVE-18254) Use proper AVG Calcite primitive instead of Other_FUNCTION

2017-12-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18254:
-

 Summary: Use proper AVG Calcite primitive instead of Other_FUNCTION
 Key: HIVE-18254
 URL: https://issues.apache.org/jira/browse/HIVE-18254
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


Currently the Hive-Calcite operator tree treats the AVG function as an unknown 
function with a Calcite SqlKind of OTHER_FUNCTION. This can get in the way of 
rules like {{org.apache.calcite.rel.rules.AggregateReduceFunctionsRule}}.
This patch adds the AVG function to the list of known aggregate functions.
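
For context, a small sketch of why the SqlKind matters: 
AggregateReduceFunctionsRule recognizes aggregate calls by kind (e.g. rewriting 
AVG into SUM/COUNT), so a call that reports OTHER_FUNCTION is invisible to it. 
Illustrative only:
{code}
import org.apache.calcite.sql.SqlKind;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;

public class AvgKindSketch {
  public static void main(String[] args) {
    // The proper Calcite primitive carries SqlKind.AVG, which rules match on:
    System.out.println(SqlStdOperatorTable.AVG.getKind()); // AVG
    // A black-box user-defined aggregate instead reports:
    System.out.println(SqlKind.OTHER_FUNCTION);            // OTHER_FUNCTION
  }
}
{code}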





[jira] [Created] (HIVE-18226) handle UDF to double/int over aggregate

2017-12-05 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18226:
-

 Summary: handle UDF to double/int over aggregate
 Key: HIVE-18226
 URL: https://issues.apache.org/jira/browse/HIVE-18226
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra


In cases like the following query, the Hive planner adds an extra UDFToDouble 
over integer columns.
This kind of UDF can be pushed to Druid as a doubleSum instead of a longSum, and 
vice versa.
{code}
PREHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*)
FROM druid_table GROUP BY floor_year(`__time`)
PREHOOK: type: QUERY
POSTHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*)
FROM druid_table GROUP BY floor_year(`__time`)
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: druid_table
properties:
  druid.query.json 
{"queryType":"timeseries","dataSource":"default.druid_table","descending":false,"granularity":"year","aggregations":[{"type":"longSum","name":"$f1","fieldName":"ctinyint"},{"type":"count","name":"$f2"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],"context":{"skipEmptyBuckets":true}}
  druid.query.type timeseries
Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column 
stats: NONE
Select Operator
  expressions: __time (type: timestamp with local time zone), 
(UDFToDouble($f1) / UDFToDouble($f2)) (type: double)
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL 
Column stats: NONE
  File Output Operator
compressed: false
Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL 
Column stats: NONE
table:
input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
ListSink
{code}
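
For reference, the difference amounts to which Druid aggregator factory the sum 
is planned as. A hedged sketch against the Druid 0.10.x classes (the helper 
itself is hypothetical):
{code}
import io.druid.query.aggregation.AggregatorFactory;
import io.druid.query.aggregation.DoubleSumAggregatorFactory;
import io.druid.query.aggregation.LongSumAggregatorFactory;

public class SumAggregatorSketch {
  // When the plan wraps the summed column in UDFToDouble, the aggregation
  // could be pushed as a doubleSum up front rather than a longSum plus a
  // post-hoc cast on the Hive side.
  static AggregatorFactory sumFor(String name, String field, boolean asDouble) {
    return asDouble
        ? new DoubleSumAggregatorFactory(name, field) // {"type":"doubleSum",...}
        : new LongSumAggregatorFactory(name, field);  // {"type":"longSum",...}
  }
}
{code}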





[jira] [Created] (HIVE-18197) Fix issue with wrong segment identifier usage.

2017-12-01 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18197:
-

 Summary: Fix issue with wrong segment identifier usage.
 Key: HIVE-18197
 URL: https://issues.apache.org/jira/browse/HIVE-18197
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


We have two different issues that can make the checking of load status fail for 
Druid segments.
Both are due to the usage of the wrong segment identifier at a couple of 
locations.

# We are constructing the segment identifier with the UTC timezone, which can be 
wrong if the segments were built in a different timezone. The way to fix this is 
to use the segment's own identifier instead of re-making it on the client side 
(see the sketch below).
# We are using outdated segment identifiers for the INSERT INTO case. The way to 
fix this is to use the segment metadata produced by the metadata commit phase.
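
A hedged illustration of the first point against Druid's DataSegment API (the 
surrounding status check is not shown):
{code}
import io.druid.timeline.DataSegment;

public class SegmentIdSketch {
  // Use the identifier the segment was actually built with, rather than
  // re-deriving it client-side (which bakes in UTC and breaks when the
  // segment was built in a different timezone).
  static String identifierOf(DataSegment segment) {
    return segment.getIdentifier();
  }
}
{code}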







[jira] [Created] (HIVE-18196) Druid Mini Cluster to run QTest integration tests.

2017-12-01 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18196:
-

 Summary: Druid Mini Cluster to run QTest integration tests.
 Key: HIVE-18196
 URL: https://issues.apache.org/jira/browse/HIVE-18196
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: Ashutosh Chauhan


The overall goal is to add a new module that can fork a Druid cluster to run 
integration tests as part of the Mini Clusters QTest suite.






[jira] [Created] (HIVE-18156) Provide smooth migration path for CTAS when time column is not with timezone

2017-11-27 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18156:
-

 Summary: Provide smooth migration path for CTAS when time column 
is not with timezone 
 Key: HIVE-18156
 URL: https://issues.apache.org/jira/browse/HIVE-18156
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra
Assignee: Jesus Camacho Rodriguez


Currently, the default recommended CTAS and most legacy documentation do not 
specify that the __time column needs to be of a timestamp-with-timezone type. 
Thus the CTAS will fail with:
{code} 
2017-11-27T17:13:10,241 ERROR [e5f708c8-df4e-41a4-b8a1-d18ac13123d2 main] 
ql.Driver: FAILED: SemanticException No column with timestamp with local 
time-zone type on query result; one column should be of timestamp with local 
time-zone type
org.apache.hadoop.hive.ql.parse.SemanticException: No column with timestamp 
with local time-zone type on query result; one column should be of timestamp 
with local time-zone type
at 
org.apache.hadoop.hive.ql.optimizer.SortedDynPartitionTimeGranularityOptimizer$SortedDynamicPartitionProc.getGranularitySelOp(SortedDynPartitionTimeGranularityOptimizer.java:242)
at 
org.apache.hadoop.hive.ql.optimizer.SortedDynPartitionTimeGranularityOptimizer$SortedDynamicPartitionProc.process(SortedDynPartitionTimeGranularityOptimizer.java:163)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at 
org.apache.hadoop.hive.ql.optimizer.SortedDynPartitionTimeGranularityOptimizer.transform(SortedDynPartitionTimeGranularityOptimizer.java:103)
at 
org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:250)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11683)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:298)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:592)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1589)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1356)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1346)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:342)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1300)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1274)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
at 
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:92)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunne
{code}

[jira] [Created] (HIVE-17871) Add non nullability flag to druid time column

2017-10-20 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17871:
-

 Summary: Add non nullability flag to druid time column
 Key: HIVE-17871
 URL: https://issues.apache.org/jira/browse/HIVE-17871
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra


The Druid time column is non-null all the time.
Adding the non-nullability flag will enable extra Calcite goodness, like 
transforming {code}select count(`__time`) from table{code} into 
{code}select count(*) from table{code}, as sketched below.
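
A minimal sketch, assuming the fix goes through Calcite's type factory (where 
Hive wires this in is not shown):
{code}
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;

public class NonNullTimeSketch {
  // Flag the __time column's type as NOT NULL so that rewrites such as
  // count(__time) -> count(*) become legal for the planner.
  static RelDataType nonNull(RelDataTypeFactory typeFactory, RelDataType timeType) {
    return typeFactory.createTypeWithNullability(timeType, false);
  }
}
{code}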





[jira] [Created] (HIVE-17653) Druid storage handler CTAS with boolean type columns fails.

2017-09-29 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17653:
-

 Summary: Druid storage handler CTAS with boolean type columns 
fails. 
 Key: HIVE-17653
 URL: https://issues.apache.org/jira/browse/HIVE-17653
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
 Fix For: 3.0.0


Druid storage handler CTAS fails with the exception below when a boolean column 
is included.
A simple workaround is to add a cast to string over the boolean column; this 
will index the column as a Druid dimension with the values `true` or `false`.

{code}
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Reducer 3, 
vertexId=vertex_1506230948023_0005_9_02, diagnostics=[Task failed, 
taskId=task_1506230948023_0005_9_02_03, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Error while running task ( failure ) : 
attempt_1506230948023_0005_9_02_03_0:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 2)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing vector batch (tag=0) (vectorizedVertexNum 2)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:406)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:248)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:319)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:189)
... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing vector batch (tag=0) (vectorizedVertexNum 2)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:492)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:397)
... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
Dimension bo does not have STRING type: BOOLEAN
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:564)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:664)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:101)
at 
org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:955)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:903)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:479)
... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: Dimension bo does not have STRING type: BOOLEAN
at 
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:272)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:609)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:55
{code}

[jira] [Created] (HIVE-17627) Use druid scan query instead of the select query.

2017-09-27 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17627:
-

 Summary: Use druid scan query instead of the select query.
 Key: HIVE-17627
 URL: https://issues.apache.org/jira/browse/HIVE-17627
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


The biggest difference between the select query and the scan query is that the 
scan query doesn't retain all rows in memory before they can be returned to the 
client.
The select query causes memory pressure when it has to return too many rows; 
the scan query doesn't have this issue.
The scan query can also return all rows without issuing another pagination 
query, which is extremely useful when querying a historical or realtime node 
directly.





[jira] [Created] (HIVE-17623) Fix Select query: fix Double column serde and some refactoring

2017-09-27 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17623:
-

 Summary: Fix Select query: fix Double column serde and some refactoring
 Key: HIVE-17623
 URL: https://issues.apache.org/jira/browse/HIVE-17623
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 3.0.0
Reporter: slim bouguerra


This PR has two fixes.
First, it fixes the limit on results returned by the select query, which used to 
be capped at 16K rows.
Second, it fixes the type inference for the double type newly added to Druid.
It uses Jackson polymorphism to infer types and parse results from Druid nodes, 
and removes duplicate code from the RecordReaders.
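
A hedged sketch of the Jackson-polymorphism idea; the type names and the 
discriminator property here are illustrative, not the actual Hive serde classes:
{code}
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;

public class PolymorphismSketch {
  // Jackson selects the concrete subtype from a discriminator field in the
  // JSON, so a single reader can handle long and double results uniformly.
  @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
  @JsonSubTypes({
      @JsonSubTypes.Type(value = LongResult.class, name = "long"),
      @JsonSubTypes.Type(value = DoubleResult.class, name = "double")
  })
  interface NumericResult {}

  static class LongResult implements NumericResult { public long value; }
  static class DoubleResult implements NumericResult { public double value; }
}
{code}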






[jira] [Created] (HIVE-17582) Followup of HIVE-15708

2017-09-22 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17582:
-

 Summary: Followup of HIVE-15708
 Key: HIVE-17582
 URL: https://issues.apache.org/jira/browse/HIVE-17582
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 3.0.0
Reporter: slim bouguerra
Assignee: slim bouguerra


HIVE-15708 (commit be59e024420ed5ca970e87a6dec402fecee21f06) introduced an 
unwanted bug: it changed the following code at 
org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat line 169
{code}
  builder.intervals(Arrays.asList(DruidTable.DEFAULT_INTERVAL));
{code}
to
{code}
final List<Interval> intervals = Arrays.asList();
builder.intervals(intervals);
{code}
Note that {{Arrays.asList()}} with no arguments returns an empty list, so the 
query ends up with no intervals instead of the default interval.






[jira] [Created] (HIVE-17581) Replace some calcite dependencies with native ones

2017-09-22 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17581:
-

 Summary: Replace some calcite dependencies with native ones
 Key: HIVE-17581
 URL: https://issues.apache.org/jira/browse/HIVE-17581
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Affects Versions: 3.0.0
Reporter: slim bouguerra
Assignee: slim bouguerra


This is a followup of HIVE-17468. This patch excludes some unwanted 
druid-calcite dependencies.





[jira] [Created] (HIVE-17523) Insert into druid table hangs Hive server2 in an infinite loop

2017-09-12 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17523:
-

 Summary: Insert into druid table hangs Hive server2 in an infinite loop
 Key: HIVE-17523
 URL: https://issues.apache.org/jira/browse/HIVE-17523
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


Inserting data via INSERT INTO a table backed by Druid can lead to a Hive server 
hang.
This is due to a bug in the naming of Druid segment partitions.
To reproduce the issue:
{code}
drop table login_hive;
create table login_hive(`timecolumn` timestamp, `userid` string, `num_l` 
double);
insert into login_hive values ('2015-01-01 00:00:00', 'user1', 5);
insert into login_hive values ('2015-01-01 01:00:00', 'user2', 4);
insert into login_hive values ('2015-01-01 02:00:00', 'user3', 2);

insert into login_hive values ('2015-01-02 00:00:00', 'user1', 1);
insert into login_hive values ('2015-01-02 01:00:00', 'user2', 2);
insert into login_hive values ('2015-01-02 02:00:00', 'user3', 8);

insert into login_hive values ('2015-01-03 00:00:00', 'user1', 5);
insert into login_hive values ('2015-01-03 01:00:00', 'user2', 9);
insert into login_hive values ('2015-01-03 04:00:00', 'user3', 2);

insert into login_hive values ('2015-03-09 00:00:00', 'user3', 5);
insert into login_hive values ('2015-03-09 01:00:00', 'user1', 0);
insert into login_hive values ('2015-03-09 05:00:00', 'user2', 0);


drop table login_druid;
CREATE TABLE login_druid
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "druid_login_test_tmp", 
"druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
AS
select `timecolumn` as `__time`, `userid`, `num_l` FROM login_hive;
select * FROM login_druid;

insert into login_druid values ('2015-03-09 05:00:00', 'user4', 0); 
{code}

This patch unifies the logic of segment pushing and naming by using the Druid 
data segment pusher as much as possible.
This patch also has some minor code refactoring and test enhancements.
 





[jira] [Created] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler

2017-09-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17468:
-

 Summary: Shade and package appropriate jackson version for druid 
storage handler
 Key: HIVE-17468
 URL: https://issues.apache.org/jira/browse/HIVE-17468
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
 Fix For: 3.0.0


Currently we are excluding all the Jackson core dependencies coming from Druid.
This is wrong in my opinion, since it leads to packaging unwanted Jackson 
libraries from other projects.
As you can see in the file hive-druid-deps.txt, Jackson core currently comes 
from Calcite at version 2.6.3, which is very different from the 2.4.6 used by 
Druid. This patch excludes the unwanted jars and makes sure to bring in the 
Jackson dependency from Druid itself.






[jira] [Created] (HIVE-17372) update druid dependency to druid 0.10.1

2017-08-22 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17372:
-

 Summary: update druid dependency to druid 0.10.1
 Key: HIVE-17372
 URL: https://issues.apache.org/jira/browse/HIVE-17372
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Update to the most recent Druid version, to be released August 23.





[jira] [Created] (HIVE-17303) Mismatch between roaring bitmap library used by druid and the one coming from tez

2017-08-11 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17303:
-

 Summary: Mismatch between roaring bitmap library used by druid and the one coming from tez
 Key: HIVE-17303
 URL: https://issues.apache.org/jira/browse/HIVE-17303
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


{code} 


 
Caused by: java.util.concurrent.ExecutionException: 
java.lang.NoSuchMethodError: 
org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z
  at 
org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
  at 
org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
  at 
org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
  at 
org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:165)
  ... 25 more
Caused by: java.lang.NoSuchMethodError: 
org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z
  at 
org.apache.hive.druid.com.metamx.collections.bitmap.WrappedRoaringBitmap.toImmutableBitmap(WrappedRoaringBitmap.java:65)
  at 
org.apache.hive.druid.com.metamx.collections.bitmap.RoaringBitmapFactory.makeImmutableBitmap(RoaringBitmapFactory.java:88)
  at 
org.apache.hive.druid.io.druid.segment.StringDimensionMergerV9.writeIndexes(StringDimensionMergerV9.java:348)
  at 
org.apache.hive.druid.io.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:218)
  at 
org.apache.hive.druid.io.druid.segment.IndexMerger.merge(IndexMerger.java:438)
  at 
org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:186)
  at 
org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:152)
  at 
org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:996)
  at 
org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.access$200(AppenderatorImpl.java:93)
  at 
org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl$2.doCall(AppenderatorImpl.java:385)
  at 
org.apache.hive.druid.io.druid.common.guava.ThreadRenamingCallable.call(ThreadRenamingCallable.java:44)
  ... 4 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 
killedTasks:89, Vertex vertex_1502470020457_0005_12_05 [Reducer 2] 
killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to 
VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)

{code}





[jira] [Created] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message

2017-08-11 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17302:
-

 Summary: ReduceRecordSource should not add batch string to 
Exception message
 Key: HIVE-17302
 URL: https://issues.apache.org/jira/browse/HIVE-17302
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


ReduceRecordSource is adding the batch data as a string to the exception stack; 
this can lead to an OOM of the query AM when the query fails due to some other 
issue.
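
A hedged before/after sketch of the pattern (names are illustrative; the real 
code lives in ReduceRecordSource):
{code}
public class ExceptionMessageSketch {
  static RuntimeException wrap(Object batch, Exception cause) {
    // Bad: stringifying the whole vector batch into the message; for wide
    // batches this string alone can exhaust the query AM's heap.
    //   return new RuntimeException("failed on batch " + batch, cause);

    // Better: keep the message bounded and let the cause carry the details.
    return new RuntimeException("failed while processing a vector batch", cause);
  }
}
{code}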





[jira] [Created] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-07-24 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17160:
-

 Summary: Adding kerberos Authorization to the Druid hive 
integration
 Key: HIVE-17160
 URL: https://issues.apache.org/jira/browse/HIVE-17160
 Project: Hive
  Issue Type: New Feature
  Components: Druid integration
Reporter: slim bouguerra


The goal of this feature is to allow Hive to query a secured Druid cluster 
using Kerberos credentials.






[jira] [Created] (HIVE-16816) Chained Group by support for druid.

2017-06-02 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16816:
-

 Summary: Chained Group by support for druid.
 Key: HIVE-16816
 URL: https://issues.apache.org/jira/browse/HIVE-16816
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra


This is more likely to be a Calcite enhancement, but I am logging it here to 
track it anyway.
Currently, queries like {code}select count(distinct dim) from table{code} are 
pushed only partially to Druid: a group-by on dim, followed by a count executed 
by the Hive QE. This can be enhanced by using a nested (i.e. chained-execution) 
group-by query, where the first (inner) GB query groups by the key and the 
second (outer) one does the count.





[jira] [Created] (HIVE-16588) Resource leak by druid http client

2017-05-04 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16588:
-

 Summary: Resource leak by druid http client
 Key: HIVE-16588
 URL: https://issues.apache.org/jira/browse/HIVE-16588
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
 Fix For: 3.0.0


The current implementation of the Druid storage handler leaks some resources if 
the creation of the HTTP client fails due to a too-many-open-files exception.
The reason it leaks is that the cleaning hook is registered after the client 
starts.
To fix this, we will extract the creation of the HTTP client to make it static 
and reusable, instead of creating one per query (see the sketch below).
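
A hedged sketch of the ordering fix plus the static, reusable client; the 
Lifecycle interface here stands in for whatever start/stop handle the real 
client exposes:
{code}
public class HttpClientLifecycleSketch {
  interface Lifecycle { void start() throws Exception; void stop(); }

  // Shared, lazily created client: one per process instead of one per query.
  private static volatile Lifecycle client;

  static synchronized Lifecycle getClient(Lifecycle fresh) throws Exception {
    if (client == null) {
      // Register the cleanup hook BEFORE start(): if start() fails halfway
      // (e.g. "too many open files"), the hook still releases what was opened.
      Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        if (client != null) client.stop();
      }));
      fresh.start();
      client = fresh;
    }
    return client;
  }
}
{code}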
 





[jira] [Created] (HIVE-16522) Hive's query timer is not keeping track of the fetch task execution

2017-04-24 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16522:
-

 Summary: Hive's query timer is not keeping track of the fetch task execution
 Key: HIVE-16522
 URL: https://issues.apache.org/jira/browse/HIVE-16522
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently, the Hive CLI query execution time does not include the fetch task 
execution time.





[jira] [Created] (HIVE-16519) Fix exception thrown by checkOutputSpecs

2017-04-24 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16519:
-

 Summary: Fix exception thrown by checkOutputSpecs
 Key: HIVE-16519
 URL: https://issues.apache.org/jira/browse/HIVE-16519
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


checkOutputSpecs should not throw an exception.





[jira] [Created] (HIVE-16482) Druid Ser/De needs to use the dimension output name in order to work with extraction functions

2017-04-19 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16482:
-

 Summary: Druid Ser/De needs to use the dimension output name in order to work with extraction functions
 Key: HIVE-16482
 URL: https://issues.apache.org/jira/browse/HIVE-16482
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


The Druid Ser/De needs to use the dimension output name in order to work with 
extraction functions.
Some parts of the Ser/De code use the method 
{code}DimensionSpec.getDimension(){code}, although when extraction functions are 
in play the name of the dimension is defined by 
{code}DimensionSpec.getOutputName(){code}, as sketched below.
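
A hedged illustration using Druid's DimensionSpec (the result-row map is 
hypothetical):
{code}
import io.druid.query.dimension.DimensionSpec;
import java.util.Map;

public class OutputNameSketch {
  // With an extraction function, result rows are keyed by the OUTPUT name,
  // not the underlying dimension, so reading by getDimension() comes back
  // empty.
  static Object readColumn(Map<String, Object> row, DimensionSpec spec) {
    return row.get(spec.getOutputName()); // not spec.getDimension()
  }
}
{code}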





[jira] [Created] (HIVE-16404) Renaming of public classes in Calcite 12 breaking druid integration

2017-04-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16404:
-

 Summary: Renaming of public classes in Calcite 12 breaking druid integration
 Key: HIVE-16404
 URL: https://issues.apache.org/jira/browse/HIVE-16404
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: slim bouguerra
 Fix For: 3.0.0


The renaming in the Druid rules is backward incompatible with the current 
implementation.
https://github.com/apache/calcite/commit/a89c62cd6d6cc181c90881afa0bf099746739a91





[jira] [Created] (HIVE-16371) Add bitmap selection strategy for druid storage handler

2017-04-04 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16371:
-

 Summary: Add bitmap selection strategy for druid storage handler
 Key: HIVE-16371
 URL: https://issues.apache.org/jira/browse/HIVE-16371
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently only the Concise bitmap strategy is supported.
This PR makes Roaring bitmap encoding the default, with Concise optional if 
needed.





[jira] [Created] (HIVE-16210) Use jvm temporary tmp dir by default

2017-03-14 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16210:
-

 Summary: Use jvm temporary tmp dir by default
 Key: HIVE-16210
 URL: https://issues.apache.org/jira/browse/HIVE-16210
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Instead of using "/tmp" by default, it makes more sense to use the JVM's default 
tmp dir. This can have dramatic consequences if the indexed files are huge: for 
instance, applications run inside containers can be provisioned with a dedicated 
tmp dir.
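
A minimal sketch of the default in question (the subdirectory name is 
illustrative):
{code}
import java.io.File;

public class TmpDirSketch {
  // Honors -Djava.io.tmpdir, which containerized deployments typically point
  // at a dedicated, adequately sized volume, instead of hard-coding "/tmp".
  static File basePersistDir() {
    return new File(System.getProperty("java.io.tmpdir"), "druid-persist");
  }
}
{code}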





[jira] [Created] (HIVE-16149) Druid query path fails when using LLAP mode

2017-03-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16149:
-

 Summary: Druid query path fails when using LLAP mode
 Key: HIVE-16149
 URL: https://issues.apache.org/jira/browse/HIVE-16149
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra
Assignee: Ashutosh Chauhan


{code}
hive> select i_item_desc ,i_category ,i_class ,i_current_price ,i_item_id 
,sum(ss_ext_sales_price)
> as itemrevenue ,sum(ss_ext_sales_price)*100/sum(sum(ss_ext_sales_price)) 
over (partition by i_class) as revenueratio
> from tpcds_store_sales_sold_time_1000_day_all
> where  (i_category ='Jewelry' or  i_category = 'Sports' or i_category 
='Books') and `__time` >= cast('2001-01-12' as date) and `__time` <= 
cast('2001-02-11' as date)
> group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price 
order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio limit 10;
Query ID = sbouguerra_20170308131436_225330b7-1142-4e4e-a05a-46ef544c8ee8
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id 
application_1488231257387_1862)

--------------------------------------------------------------------------------
VERTICES      MODE      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1         llap      INITED      1          0        0        1       0       0
Reducer 2     llap      INITED      2          0        0        2       0       0
Reducer 3     llap      INITED      1          0        0        1       0       0
--------------------------------------------------------------------------------
VERTICES: 00/03  [>>--]  0%  ELAPSED TIME: 59.68 s
--------------------------------------------------------------------------------
Status: Failed
Dag received [DAG_TERMINATE, SERVICE_PLUGIN_ERROR] in RUNNING state.
Error reported by TaskScheduler [[2:LLAP]][SERVICE_UNAVAILABLE] No LLAP Daemons 
are running
Vertex killed, vertexName=Reducer 3, vertexId=vertex_1488231257387_1862_3_02, 
diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not 
succeed due to DAG_TERMINATED, failedTasks:0 killedTasks:1, Vertex 
vertex_1488231257387_1862_3_02 [Reducer 3] killed/failed due to:DAG_TERMINATED]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1488231257387_1862_3_01, 
diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not 
succeed due to DAG_TERMINATED, failedTasks:0 killedTasks:2, Vertex 
vertex_1488231257387_1862_3_01 [Reducer 2] killed/failed due to:DAG_TERMINATED]
Vertex killed, vertexName=Map 1, vertexId=vertex_1488231257387_1862_3_00, 
diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not 
succeed due to DAG_TERMINATED, failedTasks:0 killedTasks:1, Vertex 
vertex_1488231257387_1862_3_00 [Map 1] killed/failed due to:DAG_TERMINATED]
DAG did not succeed due to SERVICE_PLUGIN_ERROR. failedVertices:0 
killedVertices:3
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Dag received [DAG_TERMINATE, 
SERVICE_PLUGIN_ERROR] in RUNNING state.Error reported by TaskScheduler 
[[2:LLAP]][SERVICE_UNAVAILABLE] No LLAP Daemons are runningVertex killed, 
vertexName=Reducer 3, vertexId=vertex_1488231257387_1862_3_02, 
diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not 
succeed due to DAG_TERMINATED, failedTasks:0 killedTasks:1, Vertex 
vertex_1488231257387_1862_3_02 [Reducer 3] killed/failed due 
to:DAG_TERMINATED]Vertex killed, vertexName=Reducer 2, 
vertexId=vertex_1488231257387_1862_3_01, diagnostics=[Vertex received Kill 
while in RUNNING state., Vertex did not succeed due to DAG_TERMINATED, 
failedTasks:0 killedTasks:2, Vertex vertex_1488231257387_1862_3_01 [Reducer 2] 
killed/failed due to:DAG_TERMINATED]Vertex killed, vertexName=Map 1, 
vertexId=vertex_1488231257387_1862_3_00, diagnostics=[Vertex received Kill 
while in RUNNING state., Vertex did not succeed due to DAG_TERMINATED, 
failedTasks:0 killedTasks:1, Vertex vertex_1488231257387_1862_3_00 [Map 1] 
killed/failed due to:DAG_TERMINATED]DAG did not succeed due to 
SERVICE_PLUGIN_ERROR. failedVertices:0 killedVertices:3
{code}





[jira] [Created] (HIVE-16126) push all the time extraction to druid

2017-03-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16126:
-

 Summary: push all the time extraction to druid
 Key: HIVE-16126
 URL: https://issues.apache.org/jira/browse/HIVE-16126
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


Currently we don't push most of the time extractions down to Druid, which leads 
to selecting all the data.





[jira] [Created] (HIVE-16125) Split work between reducers.

2017-03-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16125:
-

 Summary: Split work between reducers.
 Key: HIVE-16125
 URL: https://issues.apache.org/jira/browse/HIVE-16125
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


Split the work between reducers.
Currently we have one reducer per segment granularity, even if the interval will 
be partitioned over multiple partitions.





[jira] [Created] (HIVE-16124) Drop the segment data as soon as it is pushed to HDFS

2017-03-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16124:
-

 Summary: Drop the segment data as soon as it is pushed to HDFS
 Key: HIVE-16124
 URL: https://issues.apache.org/jira/browse/HIVE-16124
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


Drop the pushed segments from the indexer as soon as the HDFS push is done.





[jira] [Created] (HIVE-16123) Let user choose the granularity of bucketing.

2017-03-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16123:
-

 Summary: Let user choose the granularity of bucketing.
 Key: HIVE-16123
 URL: https://issues.apache.org/jira/browse/HIVE-16123
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


Currently we index the data with a granularity of NONE, which puts a lot of 
pressure on the indexer.





[jira] [Created] (HIVE-16122) NPE Hive Druid split introduced by HIVE-15928

2017-03-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16122:
-

 Summary: NPE Hive Druid split introduced by HIVE-15928
 Key: HIVE-16122
 URL: https://issues.apache.org/jira/browse/HIVE-16122
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra








[jira] [Created] (HIVE-16096) Predicate `__time` IN ("date", "date") is not pushed

2017-03-02 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16096:
-

 Summary: Predicate `__time` IN ("date", "date") is not pushed
 Key: HIVE-16096
 URL: https://issues.apache.org/jira/browse/HIVE-16096
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


{code}
 explain select * from login_druid where `__time` in ("2003-1-1", "2004-1-1" );
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
limit:-1
Select Operator [SEL_2]
  Output:["_col0","_col1","_col2"]
  Filter Operator [FIL_4]
predicate:(__time) IN ('2003-1-1', '2004-1-1')
TableScan [TS_0]
  
Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}

{code}





[jira] [Created] (HIVE-16095) Filter generation is not taking into account the column type.

2017-03-02 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16095:
-

 Summary: Filter generation is not taking into account the column 
type.
 Key: HIVE-16095
 URL: https://issues.apache.org/jira/browse/HIVE-16095
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


We are supposed to get an alphanumeric comparison when we have a cast to a 
numeric type. This looks like a Calcite issue.
{code}
hive> explain select * from login_druid where userid < 2
> ;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
limit:-1
Select Operator [SEL_1]
  Output:["_col0","_col1","_col2"]
  TableScan [TS_0]

Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"filter\":{\"type\":\"bound\",\"dimension\":\"userid\",\"upper\":\"2\",\"upperStrict\":true,\"alphaNumeric\":false},\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}

Time taken: 1.548 seconds, Fetched: 10 row(s)
hive> explain select * from login_druid where cast (userid as int) < 2;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
limit:-1
Select Operator [SEL_1]
  Output:["_col0","_col1","_col2"]
  TableScan [TS_0]

Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"filter\":{\"type\":\"bound\",\"dimension\":\"userid\",\"upper\":\"2\",\"upperStrict\":true,\"alphaNumeric\":false},\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}

Time taken: 0.27 seconds, Fetched: 10 row(s)
{code}





[jira] [Created] (HIVE-16026) Generated query will time out and/or kill the druid cluster.

2017-02-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16026:
-

 Summary: Generated query will time out and/or kill the druid cluster.
 Key: HIVE-16026
 URL: https://issues.apache.org/jira/browse/HIVE-16026
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


Grouping by `__time` and another dimension generates a query with granularity 
NONE over an interval from 1900 to 3000. This will kill the Druid cluster, 
because the Druid group-by strategy creates a cursor for every millisecond, and 
there are a lot of milliseconds between 1900 and 3000 (about 1,100 years, i.e. 
roughly 3.5e13 ms). Hence such a query can be turned into a select, with the 
group-by done within Hive. This should only happen when we don't know the 
`__time` granularity.
{code}
explain select `__time`, userid from login_druid group by `__time`, userid
> ;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
limit:-1
Select Operator [SEL_1]
  Output:["_col0","_col1"]
  TableScan [TS_0]

Output:["__time","userid"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_user_login\",\"granularity\":\"NONE\",\"dimensions\":[\"userid\"],\"limitSpec\":{\"type\":\"default\"},\"aggregations\":[{\"type\":\"longSum\",\"name\":\"dummy_agg\",\"fieldName\":\"dummy_agg\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"}
{code}  





[jira] [Created] (HIVE-16025) Where IN clause throws exception

2017-02-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16025:
-

 Summary: Where IN clause throws exception
 Key: HIVE-16025
 URL: https://issues.apache.org/jira/browse/HIVE-16025
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Priority: Critical


{code}
select * from login_druid where userid IN ("user1", "user2");
Exception in thread "main" java.lang.AssertionError: cannot translate filter: 
IN($1, _UTF-16LE'user1', _UTF-16LE'user2')
at 
org.apache.calcite.adapter.druid.DruidQuery$Translator.translateFilter(DruidQuery.java:886)
at 
org.apache.calcite.adapter.druid.DruidQuery$Translator.access$000(DruidQuery.java:786)
at 
org.apache.calcite.adapter.druid.DruidQuery.getQuery(DruidQuery.java:424)
at 
org.apache.calcite.adapter.druid.DruidQuery.deriveQuerySpec(DruidQuery.java:402)
at 
org.apache.calcite.adapter.druid.DruidQuery.getQuerySpec(DruidQuery.java:351)
at 
org.apache.calcite.adapter.druid.DruidQuery.deriveRowType(DruidQuery.java:271)
at 
org.apache.calcite.rel.AbstractRelNode.getRowType(AbstractRelNode.java:219)
at 
org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:343)
at 
org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
at 
org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:225)
at 
org.apache.calcite.adapter.druid.DruidRules$DruidFilterRule.onMatch(DruidRules.java:142)
at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:314)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:502)
at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:381)
at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:247)
at 
org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:125)
at 
org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:206)
at 
org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:193)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:1775)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1504)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1260)
at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:113)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:997)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1068)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1084)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:363)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11026)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:285)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:511)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
{code}





[jira] [Created] (HIVE-15951) Make sure base persist directory is unique and deleted

2017-02-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15951:
-

 Summary: Make sure base persist directory is unique and deleted
 Key: HIVE-15951
 URL: https://issues.apache.org/jira/browse/HIVE-15951
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: slim bouguerra
Priority: Critical
 Fix For: 2.2.0


In some cases the base persist directory will contain old data, or will be 
shared between reducers on the same physical VM.
That will lead to the failure of the job until the directory is cleaned.
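
A hedged sketch of making the directory unique per attempt and cleaned up (the 
naming scheme is illustrative):
{code}
import java.io.File;
import java.util.UUID;

public class UniquePersistDirSketch {
  // A random per-attempt suffix prevents two reducers on the same host, or a
  // retry of the same reducer, from sharing or inheriting stale persist data.
  static File freshPersistDir(File base) {
    File dir = new File(base, "persist-" + UUID.randomUUID());
    dir.mkdirs();
    dir.deleteOnExit(); // best-effort; the job should also delete it explicitly
    return dir;
  }
}
{code}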





[jira] [Created] (HIVE-15877) Upload dependency jars for druid storage handler

2017-02-10 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15877:
-

 Summary: Upload dependency jars for druid storage handler
 Key: HIVE-15877
 URL: https://issues.apache.org/jira/browse/HIVE-15877
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


Upload dependency jars for druid storage handler





[jira] [Created] (HIVE-15809) Typo in the PostgreSQL database name for druid service

2017-02-03 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15809:
-

 Summary: Typo in the PostgreSQL database name for druid service
 Key: HIVE-15809
 URL: https://issues.apache.org/jira/browse/HIVE-15809
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: slim bouguerra
Assignee: slim bouguerra
Priority: Trivial
 Fix For: 2.2.0








[jira] [Created] (HIVE-15785) Add S3 support for druid storage handler

2017-02-01 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15785:
-

 Summary: Add S3 support for druid storage handler
 Key: HIVE-15785
 URL: https://issues.apache.org/jira/browse/HIVE-15785
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra
 Fix For: 2.2.0


Add S3 support for druid storage handler





[jira] [Created] (HIVE-15727) Add pre insert work to give storage handler the possibility to perform pre insert checking

2017-01-25 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15727:
-

 Summary: Add pre insert work to give storage handler the 
possibility to perform pre insert checking
 Key: HIVE-15727
 URL: https://issues.apache.org/jira/browse/HIVE-15727
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 2.2.0


Add a pre-insert work stage to give the storage handler the possibility to 
perform pre-insert checking. For instance, for the Druid storage handler this 
will block the INSERT INTO statement.





[jira] [Created] (HIVE-15586) Make Insert and Create statement Transactional

2017-01-11 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15586:
-

 Summary: Make Insert and Create statement Transactional
 Key: HIVE-15586
 URL: https://issues.apache.org/jira/browse/HIVE-15586
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently, insert/create returns the handle to the user without waiting for the 
data to be loaded by the Druid cluster. To avoid that, we will add a passive 
wait until the segments are loaded by the historicals, in case the coordinator 
is up.
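
A hedged sketch of the passive wait; the actual load-status check (presumably 
against the coordinator's API) is abstracted behind a supplier:
{code}
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class SegmentWaitSketch {
  // Poll until the coordinator reports the segments as loaded by the
  // historicals, or give up after a deadline and return anyway.
  static boolean waitForHandoff(BooleanSupplier segmentsLoaded, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (segmentsLoaded.getAsBoolean()) {
        return true;
      }
      TimeUnit.SECONDS.sleep(5); // passive wait between status checks
    }
    return false;
  }
}
{code}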






[jira] [Created] (HIVE-15571) Support Insert into for druid storage handler

2017-01-10 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15571:
-

 Summary: Support Insert into for druid storage handler
 Key: HIVE-15571
 URL: https://issues.apache.org/jira/browse/HIVE-15571
 Project: Hive
  Issue Type: New Feature
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra








[jira] [Created] (HIVE-15439) Support INSERT OVERWRITE for internal druid datasources.

2016-12-15 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15439:
-

 Summary: Support INSERT OVERWRITE for internal druid datasources.
 Key: HIVE-15439
 URL: https://issues.apache.org/jira/browse/HIVE-15439
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: slim bouguerra
Assignee: slim bouguerra


Add support for the SQL statement INSERT OVERWRITE TABLE druid_internal_table.
In order to add this support we will need to add a new post-insert hook to 
update the Druid metadata. Creation of the segments will be the same as for 
CTAS.





[jira] [Created] (HIVE-15393) Update Guava version

2016-12-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15393:
-

 Summary: Update Guava version
 Key: HIVE-15393
 URL: https://issues.apache.org/jira/browse/HIVE-15393
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: slim bouguerra
Priority: Blocker


The Druid code base is using a newer version of Guava (16.0.1) that is not 
compatible with the current version used by Hive.
FYI, the Hadoop project is moving to Guava 18; not sure if it is better to move 
to Guava 18 or even 19.
https://issues.apache.org/jira/browse/HADOOP-10101



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-11-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15277:
-

 Summary: Teach Hive how to create/delete Druid segments 
 Key: HIVE-15277
 URL: https://issues.apache.org/jira/browse/HIVE-15277
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: slim bouguerra
Assignee: slim bouguerra




We want to extend the DruidStorageHandler to support CTAS queries.
In this implementation Hive will generate the Druid segment files and insert the 
metadata to signal the handoff to Druid.

The syntax will be as follows:

CREATE TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "datasourcename")
AS <select query>;

This statement stores the results of the query in a Druid datasource named 
'datasourcename'. One of the columns of the query needs to be the time 
dimension, which is mandatory in Druid. In particular, we use the same 
convention that is used for Druid: there needs to be a column named '__time' in 
the result of the executed query, which will act as the time dimension column in 
Druid. Currently, the time dimension column needs to be of 'timestamp' type.
Metrics can be of type long, double, or float, while dimensions are strings. 
Keep in mind that Druid has a clear separation between dimensions and metrics; 
therefore, if you have a column in Hive that contains numbers but needs to be 
presented as a dimension, use the cast operator to cast it to string.
This initial implementation interacts with the Druid metadata storage to 
add/remove the table in Druid. Users need to supply the metadata config as 
--hiveconf hive.druid.metadata.password=XXX --hiveconf 
hive.druid.metadata.username=druid --hiveconf 
hive.druid.metadata.uri=jdbc:mysql://host/druid





[jira] [Created] (HIVE-15274) wrong results on the column __time

2016-11-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15274:
-

 Summary: wrong results on the column __time
 Key: HIVE-15274
 URL: https://issues.apache.org/jira/browse/HIVE-15274
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: Jesus Camacho Rodriguez
Priority: Minor


Issuing select * from table will return a wrong time column.
Expected results:

┌─────────────────────────────────────────┬────────────┬─────────┐
│ __time                                  │ dimension1 │ metric1 │
├─────────────────────────────────────────┼────────────┼─────────┤
│ Wed Dec 31 2014 16:00:00 GMT-0800 (PST) │ value1     │ 1       │
│ Wed Dec 31 2014 16:00:00 GMT-0800 (PST) │ value1.1   │ 1       │
│ Sun May 31 2015 19:00:00 GMT-0700 (PDT) │ value2     │ 20.5    │
│ Sun May 31 2015 19:00:00 GMT-0700 (PDT) │ value2.1   │ 32      │
└─────────────────────────────────────────┴────────────┴─────────┘

Returned results:

2014-12-31 19:00:00  value1    1.0
2014-12-31 19:00:00  value1.1  1.0
2014-12-31 19:00:00  value2    20.5
2014-12-31 19:00:00  value2.1  32.0







[jira] [Created] (HIVE-15273) Http Client not configured correctly

2016-11-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15273:
-

 Summary: Http Client not configured correctly
 Key: HIVE-15273
 URL: https://issues.apache.org/jira/browse/HIVE-15273
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
Priority: Minor


The HTTP client currently used by the Druid-Hive record reader is constructed 
with default values. The default values of numConnection and ReadTimeout are 
very small, which can lead to the following exception: "ERROR 
[2ee34a2b-c8a5-4748-ab91-db3621d2aa5c main] CliDriver: Failed with exception 
java.io.IOException:java.io.IOException: java.io.IOException: 
org.apache.hive.druid.org.jboss.netty.channel.ChannelException: Channel 
disconnected"
The full stack can be found here: 
https://gist.github.com/b-slim/384ca6a96698f5b51ad9b171cff556a2
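
A hedged sketch of configuring the client explicitly, using the com.metamx 
http-client builder bundled with Druid (the exact values are illustrative, not 
the ones from the patch):
{code}
import com.metamx.http.client.HttpClientConfig;
import org.joda.time.Duration;

public class HttpClientConfigSketch {
  static HttpClientConfig config() {
    return HttpClientConfig.builder()
        .withNumConnections(20)                        // default is very small
        .withReadTimeout(new Duration(4L * 60 * 1000)) // 4 minutes
        .build();
  }
}
{code}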




