Victoria Markman created DRILL-3673:
---------------------------------------

             Summary: Memory leak in parquet writer on CTAS
                 Key: DRILL-3673
                 URL: https://issues.apache.org/jira/browse/DRILL-3673
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Writer
    Affects Versions: 1.2.0
            Reporter: Victoria Markman
            Assignee: Steven Phillips
            Priority: Critical


First CTAS executes successfully, second runs out of memory.
If I change storage.format to 'csv' this problem goes away.

{code}
0: jdbc:drill:schema=dfs> create table lineitem as select
. . . . . . . . . . . . >     cast(columns[0] as int) l_orderkey,
. . . . . . . . . . . . >     cast(columns[1] as int) l_partkey,
. . . . . . . . . . . . >     cast(columns[2] as int) l_suppkey,
. . . . . . . . . . . . >     cast(columns[3] as int) l_linenumber,
. . . . . . . . . . . . >     cast(columns[4] as double) l_quantity,
. . . . . . . . . . . . >     cast(columns[5] as double) l_extendedprice,
. . . . . . . . . . . . >     cast(columns[6] as double) l_discount,
. . . . . . . . . . . . >     cast(columns[7] as double) l_tax,
. . . . . . . . . . . . >     cast(columns[8] as varchar(200)) l_returnflag,
. . . . . . . . . . . . >     cast(columns[9] as varchar(200)) l_linestatus,
. . . . . . . . . . . . >     cast(columns[10] as date) l_shipdate,
. . . . . . . . . . . . >     cast(columns[11] as date) l_commitdate,
. . . . . . . . . . . . >     cast(columns[12] as date) l_receiptdate,
. . . . . . . . . . . . >     cast(columns[13] as varchar(200)) l_shipinstruct,
. . . . . . . . . . . . >     cast(columns[14] as varchar(200)) l_shipmode,
. . . . . . . . . . . . >     cast(columns[15] as varchar(200)) l_comment
. . . . . . . . . . . . > from `lineitem.dat`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 1_9       | 2084034                    |
| 1_18      | 2083936                    |
| 1_7       | 2083619                    |
| 1_6       | 2083933                    |
| 1_8       | 2084177                    |
| 1_21      | 2084148                    |
| 1_17      | 2084039                    |
| 1_16      | 2083863                    |
| 1_13      | 2083740                    |
| 1_20      | 2083774                    |
| 1_22      | 2083954                    |
| 1_10      | 2083929                    |
| 1_19      | 2083804                    |
| 1_11      | 2084107                    |
| 1_12      | 2083968                    |
| 1_14      | 2084002                    |
| 1_15      | 2083988                    |
| 1_5       | 3633178                    |
| 1_1       | 4184330                    |
| 1_3       | 4184246                    |
| 1_0       | 4192872                    |
| 1_2       | 4184342                    |
| 1_4       | 4180069                    |
+-----------+----------------------------+
23 rows selected (89.147 seconds)

0: jdbc:drill:schema=dfs> select * from sys.memory;
+--------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
|      hostname      | user_port  | heap_current  |  heap_max   | 
direct_current  | jvm_direct_current  | direct_max  |
+--------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
| atsqa4-133.qa.lab  | 31010      | 305725032     | 4294967296  | 9799113       
  | 5570050038          | 8589934592  |
+--------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
1 row selected (0.225 seconds)

*****************************
*** Delete line item file ***
*****************************
0: jdbc:drill:schema=dfs> create table lineitem as select
. . . . . . . . . . . . >     cast(columns[0] as int) l_orderkey,
. . . . . . . . . . . . >     cast(columns[1] as int) l_partkey,
. . . . . . . . . . . . >     cast(columns[2] as int) l_suppkey,
. . . . . . . . . . . . >     cast(columns[3] as int) l_linenumber,
. . . . . . . . . . . . >     cast(columns[4] as double) l_quantity,
. . . . . . . . . . . . >     cast(columns[5] as double) l_extendedprice,
. . . . . . . . . . . . >     cast(columns[6] as double) l_discount,
. . . . . . . . . . . . >     cast(columns[7] as double) l_tax,
. . . . . . . . . . . . >     cast(columns[8] as varchar(200)) l_returnflag,
. . . . . . . . . . . . >     cast(columns[9] as varchar(200)) l_linestatus,
. . . . . . . . . . . . >     cast(columns[10] as date) l_shipdate,
. . . . . . . . . . . . >     cast(columns[11] as date) l_commitdate,
. . . . . . . . . . . . >     cast(columns[12] as date) l_receiptdate,
. . . . . . . . . . . . >     cast(columns[13] as varchar(200)) l_shipinstruct,
. . . . . . . . . . . . >     cast(columns[14] as varchar(200)) l_shipmode,
. . . . . . . . . . . . >     cast(columns[15] as varchar(200)) l_comment
. . . . . . . . . . . . > from `lineitem.dat`;
java.lang.RuntimeException: java.sql.SQLException: RESOURCE ERROR: One or more 
nodes ran out of memory while executing the query.

Fragment 1:1

[Error Id: 18befee1-e0e9-4e76-b72a-f8180d5f190a on atsqa4-133.qa.lab:31010]
        at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
        at 
sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
        at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
        at sqlline.SqlLine.print(SqlLine.java:1583)
        at sqlline.Commands.execute(Commands.java:852)
        at sqlline.Commands.sql(Commands.java:751)
        at sqlline.SqlLine.dispatch(SqlLine.java:738)
        at sqlline.SqlLine.begin(SqlLine.java:612)
        at sqlline.SqlLine.start(SqlLine.java:366)
        at sqlline.SqlLine.main(SqlLine.java:259)


0: jdbc:drill:schema=dfs> select * from sys.memory;
+--------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
|      hostname      | user_port  | heap_current  |  heap_max   | 
direct_current  | jvm_direct_current  | direct_max  |
+--------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
| atsqa4-133.qa.lab  | 31010      | 772476800     | 4294967296  | 483060536     
  | 7113553910          | 8589934592  |
+--------------------+------------+---------------+-------------+-----------------+---------------------+-------------+
1 row selected (0.179 seconds)


{code}
To reproduce:

1. Vanilla single node drill

2.  DRILL_MAX_DIRECT_MEMORY="8G"
    DRILL_HEAP="4G"

3. To create lineitem.dat:
        Download attached 1000_rows.dat
        Download attached ctas.sh
        chmod +x ctas.sh
        ./ctas.sh

4. Run following SQL statement:

{code}
create table lineitem as select
    cast(columns[0] as int) l_orderkey,
    cast(columns[1] as int) l_partkey,
    cast(columns[2] as int) l_suppkey,
    cast(columns[3] as int) l_linenumber,
    cast(columns[4] as double) l_quantity,
    cast(columns[5] as double) l_extendedprice,
    cast(columns[6] as double) l_discount,
    cast(columns[7] as double) l_tax,
    cast(columns[8] as varchar(200)) l_returnflag,
    cast(columns[9] as varchar(200)) l_linestatus,
    cast(columns[10] as date) l_shipdate,
    cast(columns[11] as date) l_commitdate,
    cast(columns[12] as date) l_receiptdate,
    cast(columns[13] as varchar(200)) l_shipinstruct,
    cast(columns[14] as varchar(200)) l_shipmode,
    cast(columns[15] as varchar(200)) l_comment
from `lineitem.dat`;
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to