A question about a potential bug in Druid Joins

Jason Chen Thu, 24 Jun 2021 13:27:26 -0700

Hello, Druid community,

Ben Krug from Imply pointed me to this mailing list for my question about Druid 
joins. We have the following Druid join query, which may trigger a bug in Druid:
> WITH DIM AS (
>   SELECT api_client_id, title
>   FROM inline_dimension_api_clients_1 AS API_CLIENTS
> ),
> FACTS AS (
>   SELECT api_client_id, COUNT(*) as api_client_count
>   FROM inline_data AS ORDERS
>   WHERE ORDERS.__time >= TIMESTAMP '2021-06-10 00:00:00'
>     AND ORDERS.__time < TIMESTAMP '2021-06-18 00:00:00'
>     AND ORDERS.shop_id = 25248974
>   GROUP BY 1
> )
> SELECT DIM.title, FACTS.api_client_id, FACTS.api_client_count
> FROM FACTS
> LEFT JOIN DIM ON FACTS.api_client_id = DIM.api_client_id


The “api_client_id” field is of type `long` in both the “inline_data” and 
“inline_dimension_api_clients_1” datasources. However, when the join runs, the 
makeLongProcessor method is called and throws an 
“UnsupportedOperationException” because “index.keyType()” is string in MapIndex.
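
To make the failure mode concrete, here is a minimal sketch of what I believe 
is happening. It is not Druid's code: KeyTypeMismatchSketch, SketchIndex, 
KeyType and lookupByLong are hypothetical names that only mimic the behaviour 
described above, where a long-typed lookup refuses an index whose keys are 
typed as string.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are illustrative and are NOT Druid's classes.
class KeyTypeMismatchSketch {
    enum KeyType { STRING, LONG }

    // Stand-in for an index built over the right-hand side of the join.
    static final class SketchIndex {
        final KeyType keyType;
        final Map<Object, List<String>> rows = new HashMap<>();

        SketchIndex(KeyType keyType) {
            this.keyType = keyType;
        }
    }

    // Stand-in for a long-typed join lookup: it only accepts LONG-keyed indexes.
    static List<String> lookupByLong(SketchIndex index, long key) {
        if (index.keyType != KeyType.LONG) {
            throw new UnsupportedOperationException(
                "long lookup against an index keyed by " + index.keyType);
        }
        return index.rows.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        // Index keyed by long: the lookup succeeds.
        SketchIndex longIndex = new SketchIndex(KeyType.LONG);
        longIndex.rows.put(42L, Arrays.asList("Some Title"));
        System.out.println(lookupByLong(longIndex, 42L));

        // Index keyed by string: the same lookup is rejected.
        SketchIndex stringIndex = new SketchIndex(KeyType.STRING);
        stringIndex.rows.put("42", Arrays.asList("Some Title"));
        try {
            lookupByLong(stringIndex, 42L);
        } catch (UnsupportedOperationException e) {
            System.out.println("Join lookup failed: " + e.getMessage());
        }
    }
}
```

In this sketch the data is fine in both cases; only the declared key type of 
the index differs, which is why I suspect the key type chosen for the subquery 
result rather than the column type in the datasources.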

Then I found that Gian Merlino has a PR that fixes the issue. I have validated 
that this fix works for our case on my local Druid cluster. The fix is not 
included in Druid v0.21.1.

I have the following questions:

1. Why is the index key type `string` rather than `long` for my subquery? Is it 
implicitly converted to `string` for a performance benefit?
2. When will you publish a new Druid release? Will the fix be part of the next 
release?


Thank you
Jason Chen



Jason (Jianbin) Chen
Senior Data Developer
p: +1 2066608351 | e: jason.c...@shopify.com
a: 234 Laurier Ave W Ottawa, ON K1N 5X8
