[ https://issues.apache.org/jira/browse/HIVE-19155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437974#comment-16437974 ]

Ashutosh Chauhan commented on HIVE-19155:
-----------------------------------------

The change to normalize segment intervals to UTC looks correct to me, since at 
query time we send queries from Hive to Druid in UTC. However, you also had a 
change to compute the bucket start time from the timestamp, which was 
previously taken from the granularity column. What does this granularity 
column contain? If it just has the truncated timestamp, do we ever need this 
column? We can always compute it from the timestamp column.
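
To make that concrete, here is a minimal java.time sketch (an illustration 
only, not the patch code) of computing the bucket start directly from the 
timestamp for DAY granularity; if the granularity column only holds this 
truncated value, it can always be derived on demand:
{code}
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class BucketFromTimestamp {
    public static void main(String[] args) {
        // For DAY granularity the bucket start is a pure function of the
        // timestamp: truncate to the start of the day in UTC.
        Instant ts = Instant.parse("2015-03-09T13:45:00Z");
        Instant bucketStart = ts.truncatedTo(ChronoUnit.DAYS);
        System.out.println(bucketStart); // 2015-03-09T00:00:00Z
    }
}
{code}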

> Daylight saving time causes Druid inserts to fail with 
> org.apache.hive.druid.io.druid.java.util.common.UOE: Cannot add overlapping 
> segments
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19155
>                 URL: https://issues.apache.org/jira/browse/HIVE-19155
>             Project: Hive
>          Issue Type: Bug
>          Components: Druid integration
>            Reporter: slim bouguerra
>            Assignee: slim bouguerra
>            Priority: Major
>         Attachments: HIVE-19155.patch
>
>
> If you try to insert data around the daylight saving time change, the query 
> fails with the following exception:
> {code}
> 2018-04-10T11:24:58,836 ERROR [065fdaa2-85f9-4e49-adaf-3dc14d51be90 main] exec.DDLTask: Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hive.druid.io.druid.java.util.common.UOE: Cannot add overlapping segments [2015-03-08T05:00:00.000Z/2015-03-09T05:00:00.000Z and 2015-03-09T04:00:00.000Z/2015-03-10T04:00:00.000Z] with the same version [2018-04-10T11:24:48.388-07:00]
>         at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:914) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:919) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4831) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:394) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2443) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2114) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1797) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1538) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1532) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:204) [hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) [hive-cli-3.1.0-SNAPSHOT.jar:?]
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) [hive-cli-3.1.0-SNAPSHOT.jar:?]
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) [hive-cli-3.1.0-SNAPSHOT.jar:?]
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) [hive-cli-3.1.0-SNAPSHOT.jar:?]
>         at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1455) [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1429) [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:177) [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59) [test-classes/:?]
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_92]
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_92]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_92]
> {code}
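> The one-hour overlap in the error above is the daylight saving gap: US DST 
> began on 2015-03-08, so that local day is only 23 hours long. A minimal 
> java.time sketch of the failure mode (an illustration only, not the actual 
> Hive/Druid code path; the America/New_York zone is assumed from the 
> 05:00Z/04:00Z offsets in the error):
> {code}
> import java.time.Duration;
> import java.time.Instant;
> import java.time.LocalDate;
> import java.time.ZoneId;
> 
> public class DstOverlapDemo {
>     public static void main(String[] args) {
>         // Segment starts taken from local midnight in a DST-observing zone,
>         // while each DAY segment is extended by a fixed 24 hours.
>         ZoneId zone = ZoneId.of("America/New_York"); // assumed zone
>         Instant start1 = LocalDate.of(2015, 3, 8).atStartOfDay(zone).toInstant();
>         Instant end1 = start1.plus(Duration.ofHours(24));
>         Instant start2 = LocalDate.of(2015, 3, 9).atStartOfDay(zone).toInstant();
>         System.out.println(start1 + "/" + end1); // 2015-03-08T05:00:00Z/2015-03-09T05:00:00Z
>         System.out.println(start2);              // 2015-03-09T04:00:00Z, inside the first interval
>         System.out.println(start2.isBefore(end1)); // true: the segments overlap
>     }
> }
> {code}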
> You can reproduce this using the following DDL 
> {code}
> create database druid_test;
> use druid_test;
> create table test_table(`timecolumn` timestamp, `userid` string, `num_l` float);
> insert into test_table values ('2015-03-08 00:00:00', 'i1-start', 4);
> insert into test_table values ('2015-03-08 23:59:59', 'i1-end', 1);
> insert into test_table values ('2015-03-09 00:00:00', 'i2-start', 4);
> insert into test_table values ('2015-03-09 23:59:59', 'i2-end', 1);
> insert into test_table values ('2015-03-10 00:00:00', 'i3-start', 2);
> insert into test_table values ('2015-03-10 23:59:59', 'i3-end', 2);
> CREATE TABLE druid_table
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.segment.granularity" = "DAY")
> AS
> select cast(`timecolumn` as timestamp with local time zone) as `__time`, `userid`, `num_l` FROM test_table;
> {code}
> The fix is to always adjust the Druid segment identifiers to UTC.
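> For illustration, a sketch of the behavior this fix aims for (assumed 
> semantics, not the patch itself): with boundaries truncated in UTC, every 
> DAY segment is exactly 24 hours and consecutive segments meet without 
> overlapping.
> {code}
> import java.time.Duration;
> import java.time.Instant;
> import java.time.temporal.ChronoUnit;
> 
> public class UtcSegmentDemo {
>     public static void main(String[] args) {
>         // Derive each DAY segment boundary by truncating the event time in UTC.
>         Instant day1 = Instant.parse("2015-03-08T12:34:56Z").truncatedTo(ChronoUnit.DAYS);
>         Instant day2 = Instant.parse("2015-03-09T12:34:56Z").truncatedTo(ChronoUnit.DAYS);
>         System.out.println(day1 + "/" + day1.plus(Duration.ofDays(1))); // 2015-03-08T00:00:00Z/2015-03-09T00:00:00Z
>         System.out.println(day2 + "/" + day2.plus(Duration.ofDays(1))); // 2015-03-09T00:00:00Z/2015-03-10T00:00:00Z
>         // The first interval ends exactly where the second begins: no overlap.
>     }
> }
> {code}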


