Chun Chang created DRILL-1507:
---------------------------------
Summary: Potential hash insert issue
Key: DRILL-1507
URL: https://issues.apache.org/jira/browse/DRILL-1507
Project: Apache Drill
Issue Type: Bug
Components: Functions - Drill
Affects Versions: 0.6.0
Reporter: Chun Chang
#Thu Oct 02 17:49:48 PDT 2014
git.commit.id.abbrev=29dde76
Running the following "case, group by, and order by" query against the json file
type, I saw the hash insert errors below repeatedly. The query eventually
finishes after a little over 30 minutes, and the data returned is correct. The
same query running against parquet files finishes in about a minute. Here is the
query
(/root/drillATS/incubator-drill/testing/framework/resources/aggregate1/json/testcases/aggregate26.q):
select
  cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int) as soldd,
  cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint) as soldt,
  cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float) as itemsk,
  cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)) as custsk,
  cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)) as cdemo,
  ss_hdemo_sk as hdemo,
  ss_addr_sk as addrsk,
  ss_store_sk as storesk,
  ss_promo_sk as promo,
  ss_ticket_number as tickn,
  sum(ss_quantity) as quantities
from store_sales
group by
  cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int),
  cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint),
  cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float),
  cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)),
  cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)),
  ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number
order by
  cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int),
  cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint),
  cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float),
  cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)),
  cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)),
  ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number
limit 100
Here is the error I saw:
11:46:46.836 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0 - Producer Thread]
DEBUG o.apache.drill.exec.memory.Accountor - Fragment:0:0 Reserved 32768 bytes.
Total Allocated: 778240
11:46:46.848 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG
o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with
new batch holder...
.....
11:48:49.936 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0 - Producer Thread]
DEBUG o.apache.drill.exec.memory.Accountor - Fragment:0:0 Reserved 32768 bytes.
Total Allocated: 778240
11:48:49.947 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG
o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with
new batch holder...
The data is TPC-DS, converted into json using Drill's json writer. Since the
query eventually completes and passes data verification, the json writer is
probably converting parquet to json correctly.
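For context, the repeated DEBUG pair (a failed put followed by "Retrying with
new batch holder") suggests a loop in which each failed insert allocates a fresh
batch holder and tries again. The sketch below is a hypothetical simplification
of that pattern, not Drill's actual HashTable code; all class and method names
here are invented for illustration. It shows why a workload that keeps hitting
full holders pays an allocation on every retry, which could account for the
30-minute runtime:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the allocate-and-retry pattern implied by the log.
// Names (RetryingTable, tryPut, put) are invented; BATCH_SIZE is tiny so the
// retry behavior is visible in a small example.
class RetryingTable {
    static final int BATCH_SIZE = 4;          // slots per batch holder
    private final List<int[]> holders = new ArrayList<>();
    private int used = 0;                     // slots used in the newest holder
    int retries = 0;                          // puts that failed at least once

    // Fails when there is no holder yet or the newest holder is full.
    boolean tryPut(int key) {
        if (holders.isEmpty() || used == BATCH_SIZE) {
            return false;                     // "Put into hash table failed .."
        }
        holders.get(holders.size() - 1)[used++] = key;
        return true;
    }

    // On failure, allocate a new batch holder and retry the insert.
    void put(int key) {
        while (!tryPut(key)) {
            holders.add(new int[BATCH_SIZE]); // ".. Retrying with new batch holder"
            used = 0;
            retries++;
        }
    }
}
```

In this toy version every retry succeeds after one allocation; the concern in
the report is that the real operator logged this pair continuously for over two
minutes per excerpt, i.e. the retry path was being taken far more often against
the json source than against parquet.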
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)