Victoria Markman created DRILL-3673:
---------------------------------------

             Summary: Memory leak in parquet writer on CTAS
                 Key: DRILL-3673
                 URL: https://issues.apache.org/jira/browse/DRILL-3673
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Writer
    Affects Versions: 1.2.0
            Reporter: Victoria Markman
            Assignee: Steven Phillips
            Priority: Critical

First CTAS executes successfully, second runs out of memory.
If I change storage.format to 'csv' this problem goes away.

{code}
0: jdbc:drill:schema=dfs> create table lineitem as select
. . . . . . . . . . . . > cast(columns[0] as int) l_orderkey,
. . . . . . . . . . . . > cast(columns[1] as int) l_partkey,
. . . . . . . . . . . . > cast(columns[2] as int) l_suppkey,
. . . . . . . . . . . . > cast(columns[3] as int) l_linenumber,
. . . . . . . . . . . . > cast(columns[4] as double) l_quantity,
. . . . . . . . . . . . > cast(columns[5] as double) l_extendedprice,
. . . . . . . . . . . . > cast(columns[6] as double) l_discount,
. . . . . . . . . . . . > cast(columns[7] as double) l_tax,
. . . . . . . . . . . . > cast(columns[8] as varchar(200)) l_returnflag,
. . . . . . . . . . . . > cast(columns[9] as varchar(200)) l_linestatus,
. . . . . . . . . . . . > cast(columns[10] as date) l_shipdate,
. . . . . . . . . . . . > cast(columns[11] as date) l_commitdate,
. . . . . . . . . . . . > cast(columns[12] as date) l_receiptdate,
. . . . . . . . . . . . > cast(columns[13] as varchar(200)) l_shipinstruct,
. . . . . . . . . . . . > cast(columns[14] as varchar(200)) l_shipmode,
. . . . . . . . . . . . > cast(columns[15] as varchar(200)) l_comment
. . . . . . . . . . . . > from `lineitem.dat`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 1_9       | 2084034                    |
| 1_18      | 2083936                    |
| 1_7       | 2083619                    |
| 1_6       | 2083933                    |
| 1_8       | 2084177                    |
| 1_21      | 2084148                    |
| 1_17      | 2084039                    |
| 1_16      | 2083863                    |
| 1_13      | 2083740                    |
| 1_20      | 2083774                    |
| 1_22      | 2083954                    |
| 1_10      | 2083929                    |
| 1_19      | 2083804                    |
| 1_11      | 2084107                    |
| 1_12      | 2083968                    |
| 1_14      | 2084002                    |
| 1_15      | 2083988                    |
| 1_5       | 3633178                    |
| 1_1       | 4184330                    |
| 1_3       | 4184246                    |
| 1_0       | 4192872                    |
| 1_2       | 4184342                    |
| 1_4       | 4180069                    |
+-----------+----------------------------+
23 rows selected (89.147 seconds)

0: jdbc:drill:schema=dfs> select * from sys.memory;
+--------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
| hostname           | user_port  | heap_current  | heap_max    | direct_current  | jvm_direct_current  | direct_max  |
+--------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
| atsqa4-133.qa.lab  | 31010      | 305725032     | 4294967296  | 9799113         | 5570050038          | 8589934592  |
+--------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
1 row selected (0.225 seconds)

*****************************
*** Delete line item file ***
*****************************

0: jdbc:drill:schema=dfs> create table lineitem as select
. . . . . . . . . . . . > cast(columns[0] as int) l_orderkey,
. . . . . . . . . . . . > cast(columns[1] as int) l_partkey,
. . . . . . . . . . . . > cast(columns[2] as int) l_suppkey,
. . . . . . . . . . . . > cast(columns[3] as int) l_linenumber,
. . . . . . . . . . . . > cast(columns[4] as double) l_quantity,
. . . . . . . . . . . . > cast(columns[5] as double) l_extendedprice,
. . . . . . . . . . . . > cast(columns[6] as double) l_discount,
. . . . . . . . . . . . > cast(columns[7] as double) l_tax,
. . . . . . . . . . . . > cast(columns[8] as varchar(200)) l_returnflag,
. . . . . . . . . . . . > cast(columns[9] as varchar(200)) l_linestatus,
. . . . . . . . . . . . > cast(columns[10] as date) l_shipdate,
. . . . . . . . . . . . > cast(columns[11] as date) l_commitdate,
. . . . . . . . . . . . > cast(columns[12] as date) l_receiptdate,
. . . . . . . . . . . . > cast(columns[13] as varchar(200)) l_shipinstruct,
. . . . . . . . . . . . > cast(columns[14] as varchar(200)) l_shipmode,
. . . . . . . . . . . . > cast(columns[15] as varchar(200)) l_comment
. . . . . . . . . . . . > from `lineitem.dat`;
java.lang.RuntimeException: java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.

Fragment 1:1

[Error Id: 18befee1-e0e9-4e76-b72a-f8180d5f190a on atsqa4-133.qa.lab:31010]
        at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
        at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
        at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
        at sqlline.SqlLine.print(SqlLine.java:1583)
        at sqlline.Commands.execute(Commands.java:852)
        at sqlline.Commands.sql(Commands.java:751)
        at sqlline.SqlLine.dispatch(SqlLine.java:738)
        at sqlline.SqlLine.begin(SqlLine.java:612)
        at sqlline.SqlLine.start(SqlLine.java:366)
        at sqlline.SqlLine.main(SqlLine.java:259)

0: jdbc:drill:schema=dfs> select * from sys.memory;
+--------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
| hostname           | user_port  | heap_current  | heap_max    | direct_current  | jvm_direct_current  | direct_max  |
+--------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
| atsqa4-133.qa.lab  | 31010      | 772476800     | 4294967296  | 483060536       | 7113553910          | 8589934592  |
+--------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
1 row selected (0.179 seconds)
{code}

To reproduce:

1. Vanilla single node drill
2. DRILL_MAX_DIRECT_MEMORY="8G"
   DRILL_HEAP="4G"
3. To create lineitem.dat:
   Download attached 1000_rows.dat
   Download attached ctas.sh
   chmod +x ctas.sh
   ./ctas.sh
4. Run following SQL statement:

{code}
create table lineitem as select
cast(columns[0] as int) l_orderkey,
cast(columns[1] as int) l_partkey,
cast(columns[2] as int) l_suppkey,
cast(columns[3] as int) l_linenumber,
cast(columns[4] as double) l_quantity,
cast(columns[5] as double) l_extendedprice,
cast(columns[6] as double) l_discount,
cast(columns[7] as double) l_tax,
cast(columns[8] as varchar(200)) l_returnflag,
cast(columns[9] as varchar(200)) l_linestatus,
cast(columns[10] as date) l_shipdate,
cast(columns[11] as date) l_commitdate,
cast(columns[12] as date) l_receiptdate,
cast(columns[13] as varchar(200)) l_shipinstruct,
cast(columns[14] as varchar(200)) l_shipmode,
cast(columns[15] as varchar(200)) l_comment
from `lineitem.dat`;
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
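
For anyone comparing the two sys.memory snapshots in this report, the growth in direct memory can be worked out with plain shell arithmetic. This is only a sketch: the byte counts are copied from the output above, the variable names are illustrative and not part of Drill, and it assumes a shell with 64-bit integer arithmetic.

```shell
#!/bin/sh
# Byte counts copied from the two sys.memory snapshots in this report.
# Variable names are illustrative only; they are not Drill settings.
direct_before=9799113      # direct_current after the first (successful) CTAS
direct_after=483060536     # direct_current after the second (failed) CTAS
jvm_before=5570050038      # jvm_direct_current after the first CTAS
jvm_after=7113553910       # jvm_direct_current after the second CTAS
direct_max=8589934592      # direct_max (DRILL_MAX_DIRECT_MEMORY="8G")

# Growth between the two snapshots, in bytes.
direct_growth=$((direct_after - direct_before))
jvm_growth=$((jvm_after - jvm_before))

# Share of direct_max reported as jvm_direct_current after the failure
# (integer division, so the result is rounded down).
jvm_pct=$((jvm_after * 100 / direct_max))

echo "direct_current growth:     $direct_growth bytes"
echo "jvm_direct_current growth: $jvm_growth bytes"
echo "jvm_direct_current after failure: ${jvm_pct}% of direct_max"
```

If direct_current remains elevated with no query running, that would be consistent with buffers not being released, but the figures alone do not show which operator holds them.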