slim bouguerra created HIVE-16026:
-------------------------------------

             Summary: Generated query will timeout and/or kill the druid cluster.
                 Key: HIVE-16026
                 URL: https://issues.apache.org/jira/browse/HIVE-16026
             Project: Hive
          Issue Type: Bug
          Components: Druid integration
            Reporter: slim bouguerra

Grouping by `__time` and another dimension generates a groupBy query with granularity NONE over an interval from 1900 to 3000 (see the plan below). This will kill the Druid cluster, because the Druid groupBy strategy creates a cursor for every millisecond, and there are a lot of milliseconds between 1900 and 3000. Such a query can instead be turned into a select query, with the group by then performed within Hive. This should only happen when we don't know the `__time` granularity.

{code}
explain select `__time`, userid from login_druid group by `__time`, userid
> ;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
    limit:-1
    Select Operator [SEL_1]
      Output:["_col0","_col1"]
      TableScan [TS_0]
        Output:["__time","userid"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_user_login\",\"granularity\":\"NONE\",\"dimensions\":[\"userid\"],\"limitSpec\":{\"type\":\"default\"},\"aggregations\":[{\"type\":\"longSum\",\"name\":\"dummy_agg\",\"fieldName\":\"dummy_agg\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"}
{code}
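
To give a rough sense of scale, the sketch below counts the one-millisecond buckets in the interval that appears in the generated query (`1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z`). It is only a back-of-the-envelope illustration: the helper name `millis_in_interval` is hypothetical, and the assumption that the work grows with the number of millisecond buckets comes from the description above, not from inspecting Druid's code.

```python
from datetime import datetime, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000


def millis_in_interval(start, end):
    """Return the number of whole milliseconds in [start, end)."""
    delta = end - start
    return (delta.days * MS_PER_DAY
            + delta.seconds * 1000
            + delta.microseconds // 1000)


# Interval taken from the "intervals" field of the generated Druid query.
start = datetime(1900, 1, 1, tzinfo=timezone.utc)
end = datetime(3000, 1, 1, tzinfo=timezone.utc)

total_ms = millis_in_interval(start, end)
total_days = (end - start).days

# Granularity NONE implies millisecond-sized time buckets (per the report);
# a coarser, known granularity such as one day would need far fewer.
print(f"millisecond buckets: {total_ms:,}")
print(f"day buckets:         {total_days:,}")
```

The millisecond count is on the order of 10^13, while the same interval at day granularity has only a few hundred thousand buckets, which is why knowing the `__time` granularity matters here.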