Rahul Challapalli created DRILL-1482:
----------------------------------------
Summary: Tpch 3 over text files for SF100 causes some of the
drillbits(JVM) to crash
Key: DRILL-1482
URL: https://issues.apache.org/jira/browse/DRILL-1482
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Reporter: Rahul Challapalli
git.commit.id.abbrev=5c220e3
We created views on top of the TPC-H text data (SF 100) so that the original
TPC-H queries can run unmodified. The views and the query are below.
{code}
create view nation as select cast(columns[0] as int) n_nationkey, columns[1]
n_name, cast(columns[2] as int) n_regionkey, columns[3] n_comment from
`nation_text`;
create view region as select cast(columns[0] as int) r_regionkey, columns[1]
r_name, columns[2] r_comment from `region_text`;
create view part as select cast(columns[0] as int) p_partkey, columns[1]
p_name, columns[2] p_mfgr, columns[3] p_brand, columns[4] p_type,
cast(columns[5] as int) p_size, columns[6] p_container, cast(columns[7] as
double) p_retailprice, columns[8] p_comment from `part_text`;
create view supplier as select cast(columns[0] as int) s_suppkey, columns[1]
s_name, columns[2] s_address, cast(columns[3] as int) s_nationkey, columns[4]
s_phone, cast(columns[5] as double) s_acctbal, columns[6] s_comment from
`supplier_text`;
create view partsupp as select cast(columns[0] as int) ps_partkey,
cast(columns[1] as int) ps_suppkey, cast(columns[2] as int) ps_availqty,
cast(columns[3] as double) ps_supplycost, columns[4] ps_comment from
`partsupp_text`;
create view customer as select cast(columns[0] as int) c_custkey, columns[1]
c_name, columns[2] c_address, cast(columns[3] as int) c_nationkey, columns[4]
c_phone, cast(columns[5] as double) c_acctbal, columns[6] c_mktsegment,
columns[7] c_comment from `customer_text`;
create view orders as select cast(columns[0] as int) o_orderkey,
cast(columns[1] as int) o_custkey, columns[2] o_orderstatus, cast(columns[3] as
double) o_totalprice, cast(columns[4] as date) o_orderdate, columns[5]
o_orderpriority, columns[6] o_clerk, cast(columns[7] as int) o_shippriority,
columns[8] o_comment from `orders_text`;
create view lineitem as select cast(columns[0] as int) l_orderkey,
cast(columns[1] as int) l_partkey, cast(columns[2] as int) l_suppkey,
cast(columns[3] as int) l_linenumber, cast(columns[4] as double) l_quantity,
cast(columns[5] as double) l_extendedprice, cast(columns[6] as double)
l_discount, cast(columns[7] as double) l_tax, columns[8] l_returnflag,
columns[9] l_linestatus, cast(columns[10] as date) l_shipdate, cast(columns[11]
as date) l_commitdate, cast(columns[12] as date) l_receiptdate, columns[13]
l_shipinstruct, columns[14] l_shipmode, columns[15] l_comment from
`lineitem_text`;
-- TPCH Query 3
select
  l.l_orderkey,
  sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
  o.o_orderdate,
  o.o_shippriority
from
  customer c,
  orders o,
  lineitem l
where
  c.c_mktsegment = 'HOUSEHOLD'
  and c.c_custkey = o.o_custkey
  and l.l_orderkey = o.o_orderkey
  and o.o_orderdate < date '1995-03-25'
  and l.l_shipdate > date '1995-03-25'
group by
  l.l_orderkey,
  o.o_orderdate,
  o.o_shippriority
order by
  revenue desc,
  o.o_orderdate
limit 10;
{code}
The cluster has 8 drillbits running with DRILL_MAX_DIRECT_MEMORY="32G". After
the query runs for around 45 seconds, 3 of the 8 drillbits go down with a JVM
crash. I have attached the log files. Let me know if you need more information.