[ https://issues.apache.org/jira/browse/JENA-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne updated JENA-1552:
--------------------------------
    Description: 
Following on from JENA-1550, this ticket is for phased loading, which combines 
features of the sequential loader and the parallel loader.

When building all the persistent data structures at once (parallel loader), the 
work on different indexes competes for hardware resources: RAM and I/O 
bandwidth.  As the size of the load grows, this becomes a noticeable slowdown.

The sequential loader is the other extreme of the design spectrum. It works on 
one index at a time so as to maximize caching efficiency.

Phased loading splits the work into phases, each covering a subset of the 
indexes, and operates in parallel within each phase.
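
As a rough illustration of the phasing idea (not the TDB2 loader code), the 
sketch below groups index names into phases of a fixed size, runs the phases 
one after another, and builds the indexes of a phase in parallel. The class 
name PhasedLoadSketch, the methods phases and load, the perPhase parameter and 
the buildIndex callback are all hypothetical names introduced here; the index 
names are placeholders. In this simplified model, perPhase = 1 resembles the 
sequential extreme and perPhase = number of indexes resembles the parallel 
extreme.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch of phased work scheduling; not Jena's implementation.
public class PhasedLoadSketch {

    /** Split the index names into consecutive phases of at most perPhase indexes. */
    static List<List<String>> phases(List<String> indexes, int perPhase) {
        if (perPhase < 1) {
            throw new IllegalArgumentException("perPhase must be >= 1");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < indexes.size(); i += perPhase) {
            int end = Math.min(i + perPhase, indexes.size());
            result.add(new ArrayList<>(indexes.subList(i, end)));
        }
        return result;
    }

    /** Run the phases in order; within a phase, build its indexes in parallel. */
    static void load(List<String> indexes, int perPhase, Consumer<String> buildIndex)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(perPhase);
        try {
            for (List<String> phase : phases(indexes, perPhase)) {
                List<Future<?>> running = new ArrayList<>();
                for (String index : phase) {
                    running.add(pool.submit(() -> buildIndex.accept(index)));
                }
                // Wait for the whole phase to finish before starting the next one,
                // so only perPhase indexes compete for RAM and I/O at any time.
                for (Future<?> f : running) {
                    f.get();
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder index names; the callback stands in for the real index build.
        List<String> indexes = List.of("SPO", "POS", "OSP");
        load(indexes, 2, name -> System.out.println("building " + name));
    }
}
```

With perPhase = 2, as in the experiment described in this ticket, at most two 
index builds are active at once; the order in which the two builds of a phase 
complete is not deterministic.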

At 200m and loading to rotational disk, an experimental phased loader working 
with 2 indexes at a time starts to become faster than the parallel loader on 
the same hardware as used for the [figures in 
JENA-1550|https://issues.apache.org/jira/browse/JENA-1550#comment-16484269] 
(57K parallel, 70K phased).

  was:
Following on from JENA-1550, this ticket is for phased loading which combined 
features of the sequential loader and the parallel loader.

When building all the persistent datastructures (parallel loader), the work on 
different indexes at the same time is competing for hardware resources, RAM and 
I/O bandwidth.  As the size to load grows, this becomes a noticeable slowdown.

The sequential loader is the other extreme of the design spectrum. It does work 
on one index at a time so as to maximize caching efficiency.

Phased loading has parallel operation per phase and splits work into subsets of 
indexes.

At 200m and loading to rotational disk, an experimental phased loader working 
with 2 indexes at a time, starts to become faster than parallel on the same 
hardware as used for the [figures in 
JENA-1550|https://issues.apache.org/jira/browse/JENA-1550#comment-16484269] 
(56K parallel, 76K phased).


> Bulk loader for TDB2 (phased loading)
> -------------------------------------
>
>                 Key: JENA-1552
>                 URL: https://issues.apache.org/jira/browse/JENA-1552
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: TDB2
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>            Priority: Major



