[ https://issues.apache.org/jira/browse/DRILL-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043255#comment-16043255 ]
ASF GitHub Bot commented on DRILL-5544:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/846

    Chatted with Parth, who mentioned that Parquet page sizes are typically on the order of 1 MB, maybe 8 MB, but 16 MB is too large. The concern expressed in earlier comments was that if we buffer, say, 256 MB of data per file and we are doing many parallel writes, we will use up too much memory. But if we buffer only one page at a time, and we control the page size to be on the order of 1-2 MB, then even with 100 threads we are still using only about 200 MB, which is fine. In that case, the direct memory solution is fine. (But please check performance.)

    However, if we are running out of memory, I wonder whether we are failing to control the page size and letting pages grow too large. Did you happen to check the size of the pages we are writing? If the pages are too big, let's file another JIRA ticket to fix that problem so that we have a complete solution.

    Once we confirm that we are writing small pages (or file that JIRA if not), I'll change my vote from +0 to +1.

> Out of heap running CTAS against text delimited
> -----------------------------------------------
>
>                 Key: DRILL-5544
>                 URL: https://issues.apache.org/jira/browse/DRILL-5544
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.10.0
>        Environment: - 2- or 4-node cluster
>                      - 4 GB or 8 GB of Java heap and more than 8 GB of direct memory
>                      - planner.width.max_per_node = 40
>                      - store.parquet.compression = none
>                      To generate the lineitem.tbl file, unzip the dbgen.tgz archive and run:
>                      {code}dbgen -TL -s 500{code}
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>             Fix For: 1.11.0
>
>         Attachments: dbgen.tgz
>
>
> This query causes the drillbit to hang:
> {code}
> create table xyz as
> select
>   cast(columns[0] as bigint) l_orderkey,
>   cast(columns[1] as integer) l_partkey,
>   cast(columns[2] as integer) l_suppkey,
>   cast(columns[3] as integer) l_linenumber,
>   cast(columns[4] as double) l_quantity,
>   cast(columns[5] as double) l_extendedprice,
>   cast(columns[6] as double) l_discount,
>   cast(columns[7] as double) l_tax,
>   cast(columns[8] as char(1)) l_returnflag,
>   cast(columns[9] as char(1)) l_linestatus,
>   cast(columns[10] as date) l_shipdate,
>   cast(columns[11] as date) l_commitdate,
>   cast(columns[12] as date) l_receiptdate,
>   cast(columns[13] as char(25)) l_shipinstruct,
>   cast(columns[14] as char(10)) l_shipmode,
>   cast(columns[15] as varchar(44)) l_comment
> from `lineitem.tbl`;
> {code}
> OOM "Java heap space" entries from drillbit.log:
> {code:title=drillbit.log|borderStyle=solid}
> ...
> 2017-02-07 22:38:11,031 [2765b496-0b5b-a3df-c252-a8bb9cd2e52f:frag:1:53] DEBUG o.a.d.e.s.p.ParquetDirectByteBufferAllocator - ParquetDirectByteBufferAllocator: Allocated 209715 bytes. Allocated ByteBuffer id: 1563631814
> 2017-02-07 22:38:16,478 [2765b496-0b5b-a3df-c252-a8bb9cd2e52f:frag:1:1] ERROR o.a.d.exec.server.BootStrapContext - org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
> java.lang.OutOfMemoryError: Java heap space
> 2017-02-07 22:38:17,391 [2765b496-0b5b-a3df-c252-a8bb9cd2e52f:frag:1:13] ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. Information message: Unable to handle out of memory condition in FragmentExecutor.
> ...
> {code}
> To reproduce the issue, please see the environment details above.
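
For convenience, the session settings from the Environment field above can be applied before re-running the CTAS. This is a minimal sketch using only the option names given in this ticket; adjust the values to match your cluster:

{code}
-- Match the reproduction environment described above (session scope).
ALTER SESSION SET `planner.width.max_per_node` = 40;
ALTER SESSION SET `store.parquet.compression` = 'none';
{code}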
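As a rough illustration of the page-size control discussed in the comment, the writer's target page size can be lowered per session. This is a hedged sketch: it assumes the `store.parquet.page-size` option (value in bytes) is available in this Drill line; verify the exact name in sys.options before relying on it:

{code}
-- Inspect the current Parquet writer settings (option names may vary by version).
SELECT name, num_val, string_val
FROM sys.options
WHERE name LIKE 'store.parquet.%';

-- Cap pages at ~1 MB so per-writer buffering stays small:
-- with ~1 MB pages, even 100 concurrent writers buffer only ~100 MB.
ALTER SESSION SET `store.parquet.page-size` = 1048576;
{code}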