[
https://issues.apache.org/jira/browse/DRILL-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165444#comment-14165444
]
Aman Sinha commented on DRILL-1507:
-----------------------------------
I am looking into this, but I want to note that the message you see in the log
regarding retrying the hash table insertion is an informational (DEBUG-level)
message, not a real error. However, I do want to determine whether we are
retrying excessively.
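To illustrate the pattern behind that log line: inserts land in fixed-capacity
batch holders, and when a put hits a full holder the code allocates a fresh
holder and retries, which is expected behavior rather than a failure. Below is
a minimal Java sketch of that idea; all names (RetryingTable, BatchHolder, put)
are illustrative and are not Drill's actual o.a.d.e.p.impl.common.HashTable API.

import java.util.ArrayList;
import java.util.List;

class RetryingTable {
  // Assumed fixed holder capacity; the real value in Drill may differ.
  static final int BATCH_CAPACITY = 65536;

  // Each holder stores keys in a fixed-size array, standing in for a
  // value-vector batch.
  static class BatchHolder {
    final long[] keys = new long[BATCH_CAPACITY];
    int used = 0;

    boolean put(long key) {
      if (used == keys.length) return false; // holder full: the put "fails"
      keys[used++] = key;
      return true;
    }
  }

  final List<BatchHolder> holders = new ArrayList<>();

  void insert(long key) {
    if (holders.isEmpty()) holders.add(new BatchHolder());
    BatchHolder current = holders.get(holders.size() - 1);
    if (!current.put(key)) {
      // Corresponds to the DEBUG message "Put into hash table failed ..
      // Retrying with new batch holder...": allocate a new holder and
      // retry the insert; no error is raised.
      BatchHolder fresh = new BatchHolder();
      holders.add(fresh);
      fresh.put(key);
    }
  }
}

A single retry is cheap; the question is only whether it happens so often that
the repeated allocations dominate the runtime.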
For comparison purposes, can you please run a query with plain aggregation (no
group-by) on both the json and parquet data and post the timings:

select min(ss_quantity) from store_sales;

This query will not use the hash aggregate operator, so we can compare the
timings with hash aggregation out of the picture.
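If the two copies of the table live in separate workspaces, the two runs might
look like the following; the workspace names dfs.json and dfs.parquet are
hypothetical, so substitute whatever your storage plugin configuration defines:

use dfs.json;
select min(ss_quantity) from store_sales;

use dfs.parquet;
select min(ss_quantity) from store_sales;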
> Potential hash insert issue
> ----------------------------
>
> Key: DRILL-1507
> URL: https://issues.apache.org/jira/browse/DRILL-1507
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 0.6.0
> Reporter: Chun Chang
>
> #Thu Oct 02 17:49:48 PDT 2014
> git.commit.id.abbrev=29dde76
> Running the following "case, group by, and order by" query against the json
> file type, I saw the hash insert error below repeatedly. The query eventually
> finishes after a little over 30 minutes, and the data returned is correct.
> The same query running against the parquet files finishes in about a minute.
> Here is the query:
> /root/drillATS/incubator-drill/testing/framework/resources/aggregate1/json/testcases/aggregate26.q:
> select
>   cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int) as soldd,
>   cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint) as soldt,
>   cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float) as itemsk,
>   cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)) as custsk,
>   cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)) as cdemo,
>   ss_hdemo_sk as hdemo, ss_addr_sk as addrsk, ss_store_sk as storesk,
>   ss_promo_sk as promo, ss_ticket_number as tickn,
>   sum(ss_quantity) as quantities
> from store_sales
> group by
>   cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int),
>   cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint),
>   cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float),
>   cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)),
>   cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)),
>   ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number
> order by
>   cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int),
>   cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint),
>   cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float),
>   cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)),
>   cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)),
>   ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number
> limit 100
> Here is the error I saw:
> 11:46:46.836 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0 - Producer Thread] DEBUG o.apache.drill.exec.memory.Accountor - Fragment:0:0 Reserved 32768 bytes. Total Allocated: 778240
> 11:46:46.848 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with new batch holder...
> .....
> 11:48:49.936 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0 - Producer Thread] DEBUG o.apache.drill.exec.memory.Accountor - Fragment:0:0 Reserved 32768 bytes. Total Allocated: 778240
> 11:48:49.947 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with new batch holder...
> The data is TPC-DS, converted into json using Drill's json writer. Since the
> query eventually completes and passes data verification, the json writer is
> probably converting the parquet data to json correctly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)