[ 
https://issues.apache.org/jira/browse/HTRACE-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15026077#comment-15026077
 ] 

Colin Patrick McCabe commented on HTRACE-308:
---------------------------------------------

In order to support efficient streaming deserialization, this patch changes the 
format of WriteSpans from being:
{code}
{"DefaultTrid":"defaultTracerId", "Spans":[<span-msgpack><span-msgpack>...]}
{code}

to being:
{code}
{"DefaultTrid":"defaultTracerId", "NumSpans":<num-spans>}<span-msgpack><span-msgpack>...
{code}

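Basically, the Spans array has been replaced by a NumSpans count, and the spans 
themselves now follow the header one after another, as a footer.  The reader 
can therefore decode one span at a time instead of materializing the whole 
array first.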

The maximum RPC size was reduced from 64 MB to 32 MB, and the default size was 
reduced from 48 MB to 16 MB on the Java client.  A smaller size means that we 
can more easily allocate a buffer per worker goroutine on htraced without using 
too much memory.

> Deserialize WriteSpans requests incrementally rather than all at once to 
> optimize GC
> ------------------------------------------------------------------------------------
>
>                 Key: HTRACE-308
>                 URL: https://issues.apache.org/jira/browse/HTRACE-308
>             Project: HTrace
>          Issue Type: Improvement
>          Components: htraced
>    Affects Versions: 4.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HTRACE-308.001.patch
>
>
> We should deserialize WriteSpans requests incrementally rather than all at 
> once.  Currently, we can deserialize 63 MB of spans all at once, which 
> immediately creates somewhere between 60k and 600k spans, depending on span 
> size.  This is hard on the garbage collector because it's a lot of 
> allocations all at once, and because it allocates a very large array to hold 
> it all.
> It would be better to deserialize spans one at a time and feed them into the 
> datastore via the BatchIngestor. This will ensure that we don't have to 
> allocate giant arrays of spans all at once.  If the datastore lags behind the 
> rate of span ingestion, this will avoid us needing to allocate a bunch of 
> memory "up front" which can lead to further slowdowns due to GC.
> Also, we should reuse buffers for the RPC handlers, and use buffering while 
> deserializing to avoid making lots of small reads from the socket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
