[ 
https://issues.apache.org/jira/browse/JENA-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499427#comment-16499427
 ] 

Andy Seaborne edited comment on JENA-1552 at 6/3/18 1:57 PM:
-------------------------------------------------------------

Verification that the new, general loader is as good as the JENA-1550 code.

These are figures for [PR#426|https://github.com/apache/jena/pull/426] where 
"parallel" and "phased" are the same loader engine with different execution 
plans.
  
||Loader||Data size||Storage||Time (s)||Rate (Triples per second)||
| TDB2 parallel   | 200m | SSD  | 1,147 | 174,343 |
| TDB2 phased   | 200m | SSD  | 1,646 | 121,395 |
| TDB2 phased   | 200m | Disk  | 2,899 |  69,010 |
  
It is not particularly meaningful that "parallel" is slightly faster than the 
JENA-1550 code. These results are the average of a few runs of 
{{tdb2.tdbloader}}, so they are more trustworthy. Each run starts a fresh, 
un-JITted JVM. This set of numbers was taken with no other applications 
running, which increases the amount of free RAM and avoids interrupting the 
CPUs (the main impact of such interruptions is probably the loss of memory 
cache contents, not CPU cycles).
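
For reading the table, the rate column is simply triples loaded divided by elapsed seconds. The Java sketch below redoes that arithmetic. It assumes "200m" means exactly 200,000,000 triples, which is an assumption made here and not stated in the comment, so the computed rates differ slightly from the reported ones. The class and method names are illustrative only.

```java
final class LoadRate {
    /** Triples per second for a load of the given size and duration. */
    public static double rate(long triples, double seconds) {
        return triples / seconds;
    }

    /** How many times faster run A is than run B, from their elapsed times. */
    public static double speedup(double secondsA, double secondsB) {
        return secondsB / secondsA;
    }

    public static void main(String[] args) {
        // Assumption: "200m" is treated as exactly 200,000,000 triples.
        long nominalTriples = 200_000_000L;

        // Elapsed times (seconds) taken from the table.
        double parallelSsd = 1147;
        double phasedSsd = 1646;
        double phasedDisk = 2899;

        System.out.printf("parallel, SSD : %.0f triples/s%n", rate(nominalTriples, parallelSsd));
        System.out.printf("phased, SSD   : %.0f triples/s%n", rate(nominalTriples, phasedSsd));
        System.out.printf("phased, disk  : %.0f triples/s%n", rate(nominalTriples, phasedDisk));
        System.out.printf("parallel vs phased on SSD: %.2fx%n", speedup(parallelSsd, phasedSsd));
    }
}
```

On the SSD figures, this puts "parallel" at a little over 1.4 times the speed of "phased" at this data size.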





> Bulk loader for TDB2 (phased loading)
> -------------------------------------
>
>                 Key: JENA-1552
>                 URL: https://issues.apache.org/jira/browse/JENA-1552
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: TDB2
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>            Priority: Major
>
> Following on from JENA-1550, this ticket is for phased loading, which combines 
> features of the sequential loader and the parallel loader.
> When building all the persistent data structures at once (parallel loader), the 
> work on different indexes at the same time competes for hardware resources, 
> RAM and I/O bandwidth.  As the size to load grows, this becomes a noticeable 
> slowdown.
> The sequential loader is the other extreme of the design spectrum. It does 
> work on one index at a time so as to maximize caching efficiency.
> Phased loading has parallel operation per phase and splits work into subsets 
> of indexes.
> At 200m triples and loading to rotational disk, an experimental phased loader 
> working with 2 indexes at a time starts to become faster than parallel on the 
> same hardware as used for the [figures in 
> JENA-1550|https://issues.apache.org/jira/browse/JENA-1550#comment-16484269] 
> (57K triples per second parallel, 70K phased).
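
As a rough illustration of the phased plan described in the issue text above, the following Java sketch splits a list of index names into phases of two, builds the indexes within a phase concurrently, and runs the phases one after another. It is not the code from the pull request: the class `PhasedPlanSketch`, its methods, and the placeholder `buildIndex` are hypothetical, and the names SPO, POS and OSP are used only as labels for the work items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PhasedPlanSketch {
    /** Split index names into consecutive phases of at most perPhase indexes each. */
    public static List<List<String>> phases(List<String> indexes, int perPhase) {
        if (perPhase < 1) {
            throw new IllegalArgumentException("perPhase must be >= 1");
        }
        List<List<String>> plan = new ArrayList<>();
        for (int i = 0; i < indexes.size(); i += perPhase) {
            int end = Math.min(i + perPhase, indexes.size());
            plan.add(new ArrayList<>(indexes.subList(i, end)));
        }
        return plan;
    }

    /** Run phases sequentially; indexes inside one phase are built in parallel. */
    public static List<String> execute(List<List<String>> plan) {
        List<String> completed = new ArrayList<>();
        for (List<String> phase : plan) {
            // One thread per index in this phase (at least one thread).
            ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, phase.size()));
            try {
                List<Future<String>> futures = new ArrayList<>();
                for (String index : phase) {
                    Callable<String> task = () -> buildIndex(index);
                    futures.add(pool.submit(task));
                }
                // Wait for the whole phase before starting the next one.
                for (Future<String> f : futures) {
                    completed.add(f.get());
                }
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            } finally {
                pool.shutdown();
            }
        }
        return completed;
    }

    /** Hypothetical placeholder: a real loader would write the index here. */
    static String buildIndex(String name) {
        return name;
    }

    public static void main(String[] args) {
        List<List<String>> plan = phases(Arrays.asList("SPO", "POS", "OSP"), 2);
        System.out.println("plan: " + plan);
        System.out.println("built: " + execute(plan));
    }
}
```

With `perPhase` equal to the number of indexes, this degenerates to the fully parallel plan, and with `perPhase` set to 1 it matches the sequential extreme, which is the design spectrum the issue describes.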



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
