[
https://issues.apache.org/jira/browse/JENA-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430478#comment-17430478
]
Andy Seaborne commented on JENA-2173:
-------------------------------------
I did look at {{PipedRDFIterator}} and concluded that it would need changes and
support code.
It sends {{T}} objects one at a time, so there is a sync cost on every item
added to the queue. The buffer size is the blocking queue size.
{{AsyncParser}} sends large blocks (100k) onto a small queue (10). It runs at
close to the raw parser speed. Small overheads matter, especially for binary
formats (which parse at up to 1e6 triples/s). Pure N-Triples is 245kTPS on my
several-year-old machine. Jena parsing speed has been going up due to better
JDKs as well.
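
To illustrate the batching idea, here is a minimal sketch in plain Java. It is not the {{AsyncParser}} code: the class name {{BatchingSketch}}, the {{run}} method and the use of integers in place of parsed triples are all made up for the example. A single producer thread puts whole batches onto a small bounded queue, so the synchronization cost is paid once per batch rather than once per item, and the consumer sees items in stream order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration only; integers stand in for parsed items.
final class BatchingSketch {
    // End-of-stream marker, recognised by identity rather than by content.
    private static final List<Integer> END = new ArrayList<>();

    /**
     * Produce 0..totalItems-1 on a separate thread in batches of batchSize,
     * pass them through a bounded queue, and collect them on the calling thread.
     * Assumes batchSize > 0 and queueCapacity > 0.
     */
    static List<Integer> run(int totalItems, int batchSize, int queueCapacity) {
        BlockingQueue<List<Integer>> queue = new ArrayBlockingQueue<>(queueCapacity);

        Thread producer = new Thread(() -> {
            try {
                List<Integer> batch = new ArrayList<>(batchSize);
                for (int i = 0; i < totalItems; i++) {
                    batch.add(i);
                    if (batch.size() == batchSize) {
                        queue.put(batch);   // one synchronized hand-off per batch
                        batch = new ArrayList<>(batchSize);
                    }
                }
                if (!batch.isEmpty()) {
                    queue.put(batch);       // final partial batch
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "producer");
        producer.setDaemon(true);
        producer.start();

        List<Integer> received = new ArrayList<>(totalItems);
        try {
            // Single producer plus FIFO queue means batches arrive in stream order.
            for (List<Integer> batch = queue.take(); batch != END; batch = queue.take()) {
                received.addAll(batch);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
        return received;
    }
}
```

With a batch size of 100k and a queue capacity of 10, as in the figures above, the queue is touched only once per 100k items; sending items one at a time would touch it on every item.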
{{PipedRDFIterator}} does not preclude multiple senders. For the xloader case,
that's not necessary. In fact stream-order is desirable.
So we could have {{T}} as {{EltStreamRDF}} or {{List<EltStreamRDF>}}, and have
a loop to drive the caller thread's {{StreamRDF}}, but now what's left?
Looking at the current uses in {{RDFDataMgr}}, maybe they should be
deprecated prior to removal in favour of {{AsyncParser}}, which works for any
language? And Piped*? It would be good to steadily simplify the code base.
> Add asynchronous parsing
> ------------------------
>
> Key: JENA-2173
> URL: https://issues.apache.org/jira/browse/JENA-2173
> Project: Apache Jena
> Issue Type: Improvement
> Components: RIOT
> Affects Versions: Jena 4.2.0
> Reporter: Andy Seaborne
> Assignee: Andy Seaborne
> Priority: Major
> Fix For: Jena 4.3.0
>
>
> Add code to parse on a separate thread and send batches of parsed items to
> the caller thread for further processing.
> This is only beneficial in certain circumstances because there is overhead in
> setup and in the passing of data between threads.