[ https://issues.apache.org/jira/browse/DRILL-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030202#comment-16030202 ]
ASF GitHub Bot commented on DRILL-5544:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/846#discussion_r119223453

    --- Diff: exec/java-exec/src/main/java/org/apache/parquet/hadoop/ParquetColumnChunkPageWriteStore.java ---
    @@ -0,0 +1,269 @@
    +/*
    --- End diff --

    Compared this file to the Parquet original (thanks for providing the file name) using [this tool](https://www.diffchecker.com/diff), referencing the diagram [here](https://parquet.apache.org/documentation/latest/).

    The changes made in the Drill copy seem reasonable. I do have questions, however, about the approach to writing. (See below.) Seems overly memory intensive. But, this is an issue with the Parquet original, not about this "port" of the file.


> Out of heap running CTAS against text delimited
> -----------------------------------------------
>
>                 Key: DRILL-5544
>                 URL: https://issues.apache.org/jira/browse/DRILL-5544
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.10.0
>         Environment: - 2 or 4 nodes cluster
> - 4G or 8G of Java heap and more than 8G of direct memory
> - planner.width.max_per_node = 40
> - store.parquet.compression = none
> To generate lineitem.tbl file unzip dbgen.tgz archive and run:
> {code}dbgen -TL -s 500{code}
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>             Fix For: 1.11.0
>
>         Attachments: dbgen.tgz
>
>
> This query causes the drillbit to hang:
> {code}
> create table xyz as
> select
>   cast(columns[0] as bigint) l_orderkey,
>   cast(columns[1] as integer) l_poartkey,
>   cast(columns[2] as integer) l_suppkey,
>   cast(columns[3] as integer) l_linenumber,
>   cast(columns[4] as double) l_quantity,
>   cast(columns[5] as double) l_extendedprice,
>   cast(columns[6] as double) l_discount,
>   cast(columns[7] as double) l_tax,
>   cast(columns[8] as char(1)) l_returnflag,
>   cast(columns[9] as char(1)) l_linestatus,
>   cast(columns[10] as date) l_shipdate,
>   cast(columns[11] as date) l_commitdate,
>   cast(columns[12] as date) l_receiptdate,
>   cast(columns[13] as char(25)) l_shipinstruct,
>   cast(columns[14] as char(10)) l_shipmode,
>   cast(columns[15] as varchar(44)) l_comment
> from
>   `lineitem.tbl`;
> {code}
> OOM "Java heap space" from the drillbit.log:
> {code:title=drillbit.log|borderStyle=solid}
> ...
> 2017-02-07 22:38:11,031 [2765b496-0b5b-a3df-c252-a8bb9cd2e52f:frag:1:53]
> DEBUG o.a.d.e.s.p.ParquetDirectByteBufferAllocator -
> ParquetDirectByteBufferAllocator: Allocated 209715 bytes. Allocated
> ByteBuffer id: 1563631814
> 2017-02-07 22:38:16,478 [2765b496-0b5b-a3df-c252-a8bb9cd2e52f:frag:1:1] ERROR
> o.a.d.exec.server.BootStrapContext -
> org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
> java.lang.OutOfMemoryError: Java heap space
> 2017-02-07 22:38:17,391 [2765b496-0b5b-a3df-c252-a8bb9cd2e52f:frag:1:13]
> ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred,
> exiting. Information message: Unable to handle out of memory condition in
> FragmentExecutor.
> ...
> {code}
> To reproduce the issue please see environment details.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)