[jira] [Commented] (DRILL-5275) Sort spill serialization is slow due to repeated buffer allocations

Kunal Khatua (JIRA) Tue, 28 Mar 2017 16:30:03 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946196#comment-15946196
 ]


Kunal Khatua commented on DRILL-5275:
-------------------------------------

This involves primarily testing the performance of the Sort Spill feature. 
Since there is work in progress in this area and tests being written for that, 
those tests should be sufficient to quantify the benefits. 
Closing this JIRA as no specific QA verification is required for it.

> Sort spill serialization is slow due to repeated buffer allocations
> -------------------------------------------------------------------
>
>                 Key: DRILL-5275
>                 URL: https://issues.apache.org/jira/browse/DRILL-5275
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
>
> Drill provides a sort operator that spills to disk. The spill and read 
> operations use the serialization code in the 
> {{VectorAccessibleSerializable}}. This code, in turn, uses the 
> {{DrillBuf.getBytes()}} method to write to an output stream. (Yes, the "get" 
> method writes, and the "write" method reads...)
> The DrillBuf method turns around and calls the UDLE method that does:
> {code}
>             byte[] tmp = new byte[length];
>             PlatformDependent.copyMemory(addr(index), tmp, 0, length);
>             out.write(tmp);
> {code}
> That is, for each write the code allocates a heap buffer. Since Drill buffers 
> can be quite large (4, 8, 16 MB or larger), the above rapidly fills the heap 
> and causes GC.
> The result is slow performance. On a Mac, with an SSD that can do 700 MB/s of 
> I/O, we get only about 40 MB/s. Very likely because of excessive CPU cost and 
> GC.
> The solution is to allocate a single read or write buffer, then use that same 
> buffer over and over when reading or writing. This must be done in 
> {{VectorAccessibleSerializable}} as it is a per-thread class that has 
> visibility to all the buffers to be written.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (DRILL-5275) Sort spill serialization is slow due to repeated buffer allocations

Reply via email to