Chun Chang created DRILL-1507:
---------------------------------

             Summary: Potential hash insert issue 
                 Key: DRILL-1507
                 URL: https://issues.apache.org/jira/browse/DRILL-1507
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 0.6.0
            Reporter: Chun Chang


#Thu Oct 02 17:49:48 PDT 2014
git.commit.id.abbrev=29dde76

Running the following "case, group by, and order by" query against a JSON file, I saw the hash insert errors below repeatedly. The query eventually finishes after a little over 30 minutes, and the data returned is correct. The same query running against a parquet file finishes in about a minute. Here is the query:

/root/drillATS/incubator-drill/testing/framework/resources/aggregate1/json/testcases/aggregate26.q:

select
  cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int) as soldd,
  cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint) as soldt,
  cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float) as itemsk,
  cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)) as custsk,
  cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)) as cdemo,
  ss_hdemo_sk as hdemo, ss_addr_sk as addrsk, ss_store_sk as storesk,
  ss_promo_sk as promo, ss_ticket_number as tickn,
  sum(ss_quantity) as quantities
from store_sales
group by
  cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int),
  cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint),
  cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float),
  cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)),
  cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)),
  ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number
order by
  cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int),
  cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint),
  cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float),
  cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)),
  cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)),
  ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number
limit 100

Here is the error I saw:

11:46:46.836 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0 - Producer Thread] DEBUG o.apache.drill.exec.memory.Accountor - Fragment:0:0 Reserved 32768 bytes. Total Allocated: 778240
11:46:46.848 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with new batch holder...
.....

11:48:49.936 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0 - Producer Thread] DEBUG o.apache.drill.exec.memory.Accountor - Fragment:0:0 Reserved 32768 bytes. Total Allocated: 778240
11:48:49.947 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with new batch holder...

The data is TPC-DS, converted from parquet into JSON using Drill's JSON writer. Since the query eventually completes and passes data verification, the JSON writer is probably converting parquet to JSON correctly.


