[jira] [Commented] (JENA-2173) Add asynchronous parsing

Andy Seaborne (Jira) Tue, 19 Oct 2021 04:16:16 -0700


    [ https://issues.apache.org/jira/browse/JENA-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430478#comment-17430478 ]


Andy Seaborne commented on JENA-2173:
-------------------------------------

I did look at {{PipedRDFIterator}} and concluded that it would need changes and 
support code.

It sends {{T}} objects one at a time, so there is a synchronization cost on 
every item added to the queue. The buffer size is the blocking queue's capacity.
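To make the per-item cost concrete, here is a minimal sketch of a one-item-at-a-time handoff over a bounded {{java.util.concurrent.ArrayBlockingQueue}}. It is not the {{PipedRDFIterator}} code: the class {{PerItemPipe}} and its {{END}} sentinel are hypothetical, and strings stand in for parsed triples. Every {{put}} and {{take}} acquires the queue's lock, so the synchronization cost is paid once per item, and the queue capacity is the only buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: hands over one item per queue operation.
// Strings stand in for parsed triples; this is not Jena's PipedRDFIterator code.
class PerItemPipe {
    // Sentinel marking end of stream, compared by identity.
    private static final Object END = new Object();

    static List<String> run(List<String> input, int bufferSize) throws InterruptedException {
        // The only buffering is the bounded queue's capacity.
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(bufferSize);
        Thread producer = new Thread(() -> {
            try {
                for (String item : input) {
                    queue.put(item); // lock (and possibly wait) once per item
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<String> out = new ArrayList<>();
        for (Object x = queue.take(); x != END; x = queue.take()) {
            out.add((String) x); // the consumer also locks once per item
        }
        producer.join();
        return out;
    }
}
```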

{{AsyncParser}} sends large blocks (100k items) onto a small queue (capacity 
10). It runs at close to the raw parser speed. Small overheads matter, 
especially for binary formats, which parse at up to 1e6 triples/s. Pure 
N-Triples parses at 245k triples/s on my several-year-old machine. Jena 
parsing speed has also been going up due to better JDKs.
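The numbers show why batching matters. At 1e6 triples/s the parser spends about 1 µs per triple, so any fixed per-item handoff cost is paid a million times a second, whereas with 100k-item blocks there are only about 10 handoffs per second. Below is a minimal sketch of chunked handoff, with strings standing in for triples. {{ChunkedPipe}}, {{chunkSize}} and {{queueSize}} are hypothetical names modeled on the description above, not taken from the {{AsyncParser}} source.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of chunked handoff: large chunks onto a small bounded queue.
// Modeled on the described AsyncParser behaviour; not Jena's code.
class ChunkedPipe {
    // An empty chunk marks end of stream (real chunks are never empty).
    private static final List<String> END = new ArrayList<>();

    static List<String> run(Iterator<String> source, int chunkSize, int queueSize)
            throws InterruptedException {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(queueSize);
        Thread producer = new Thread(() -> {
            try {
                List<String> chunk = new ArrayList<>(chunkSize);
                while (source.hasNext()) {
                    chunk.add(source.next());
                    if (chunk.size() == chunkSize) {
                        queue.put(chunk);               // one synchronization per chunk
                        chunk = new ArrayList<>(chunkSize);
                    }
                }
                if (!chunk.isEmpty()) queue.put(chunk); // final partial chunk
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<String> out = new ArrayList<>();
        for (List<String> chunk = queue.take(); !chunk.isEmpty(); chunk = queue.take()) {
            out.addAll(chunk); // per-item work here is plain list access, no locking
        }
        producer.join();
        return out;
    }
}
```

With {{chunkSize = 100_000}} and {{queueSize = 10}}, at most about a dozen chunks exist at once (the 10 queued, one being filled and one being drained), so memory stays bounded while locking happens once per chunk rather than once per triple.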

{{PipedRDFIterator}} allows for multiple senders. For the xloader case, that is 
not necessary; in fact, keeping stream order is desirable.

So we could have {{T}} as {{EltStreamRDF}} or {{List<EltStreamRDF>}}, with a 
loop on the caller thread to drive a {{StreamRDF}}, but then what's left?
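A minimal sketch of that caller-side loop, with {{T}} as a batch of elements. {{Elt}} and {{Sink}} are hypothetical stand-ins for {{EltStreamRDF}} and {{StreamRDF}} (only a prefix case and a triple case, with strings for RDF terms), not the Jena types. Because there is a single producer and the queue is FIFO, elements reach the sink in stream order.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for Jena's StreamRDF: only the calls this sketch needs.
interface Sink {
    void start();
    void prefix(String prefix, String iri);
    void triple(String s, String p, String o);
    void finish();
}

// Hypothetical stand-in for EltStreamRDF: either a prefix or a triple.
final class Elt {
    enum Kind { PREFIX, TRIPLE }

    final Kind kind;
    final String[] parts; // PREFIX: {prefix, iri}; TRIPLE: {s, p, o}

    private Elt(Kind kind, String... parts) {
        this.kind = kind;
        this.parts = parts;
    }

    static Elt prefix(String prefix, String iri) { return new Elt(Kind.PREFIX, prefix, iri); }
    static Elt triple(String s, String p, String o) { return new Elt(Kind.TRIPLE, s, p, o); }
}

final class CallerLoop {
    // Runs on the caller thread: takes batches (T = List<Elt>) and replays
    // each element onto the sink. An empty batch marks end of stream.
    static void drive(BlockingQueue<List<Elt>> queue, Sink sink) throws InterruptedException {
        sink.start();
        for (List<Elt> batch = queue.take(); !batch.isEmpty(); batch = queue.take()) {
            for (Elt e : batch) {
                switch (e.kind) {
                    case PREFIX: sink.prefix(e.parts[0], e.parts[1]); break;
                    case TRIPLE: sink.triple(e.parts[0], e.parts[1], e.parts[2]); break;
                }
            }
        }
        sink.finish();
    }
}
```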

Looking at the current uses in {{RDFDataMgr}}, maybe they should be 
deprecated prior to removal, in favour of {{AsyncParser}}, which works for any 
language? And {{Piped*}} as well? It would be good to steadily simplify the 
code base.


> Add asynchronous parsing
> ------------------------
>
>                 Key: JENA-2173
>                 URL: https://issues.apache.org/jira/browse/JENA-2173
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: RIOT
>    Affects Versions: Jena 4.2.0
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 4.3.0
>
>
> Add code to parse on a separate thread and send batches of parsed items to 
> the caller thread for further processing.
> This is only beneficial in certain circumstances because there is overhead in 
> setup and in the passing of data between threads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
