Hi,

We're trying to load our project-internal dataset, currently
959,170,877 quads (and still growing), on a machine with:

24-core AMD EPYC 7443P with 2.85-4.00GHz
256GB RAM
and Samsung SSD 870 QVO 8TB SATA SSDs in a RAIDZ1

tdb2.tdbloader --loader=parallel 
21,450.519 seconds

It stalls massively, most noticeably towards the end (Batch: 1,169).
Avg: 44,752

The resulting TDB2 files total 297G.


tdb2.xloader --threads 11
25,295 seconds
Overall Rate     37,919 tuples per second

xloader is a bit slower (about 1 hour more) but seems to put much less
strain on the system. The resulting TDB2 database is also much more
compact: 173G.
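
As a rough cross-check of the figures above, here is a minimal Python sketch that derives the average throughput of both runs from the quad count and the elapsed times quoted in this message. The helper name `tuples_per_second` is made up for illustration. The result for the parallel loader need not match the loader's own "Avg" figure exactly, since that may be measured over a slightly different interval (an assumption, not something the logs confirm).

```python
# Figures quoted in the message above.
QUADS = 959_170_877
PARALLEL_SECONDS = 21_450.519   # tdb2.tdbloader --loader=parallel
XLOADER_SECONDS = 25_295        # tdb2.xloader --threads 11


def tuples_per_second(tuples: int, seconds: float) -> float:
    """Average load rate over the whole run (hypothetical helper)."""
    return tuples / seconds


parallel_rate = tuples_per_second(QUADS, PARALLEL_SECONDS)
xloader_rate = tuples_per_second(QUADS, XLOADER_SECONDS)

# How much longer the xloader run took, in minutes.
extra_minutes = (XLOADER_SECONDS - PARALLEL_SECONDS) / 60

print(f"parallel loader: {parallel_rate:,.0f} tuples/s")
print(f"xloader:         {xloader_rate:,.0f} tuples/s")
print(f"xloader took {extra_minutes:.0f} minutes longer")
```

With these inputs the xloader figure agrees with the reported overall rate of 37,919 tuples per second, and the difference between the two runs works out to roughly an hour.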


Do you have any advice for improving performance?

Cheers,

On 2022/11/16 12:37:19 Andy Seaborne wrote:
> 
> 
> On 16/11/2022 07:54, LB wrote:
> > Andy got a new computer? Nice.
> > 
> > I'm wondering if higher bandwidth of DDR5 already has an impact.
> > 
> > Performance with xloader was ~4x lower than with tdbloader? Any ideas
> > why?
> 
> xloader does more work (sorting is a separate step) with fewer
> resources.
> 
> tdb2.tdbloader --loader=parallel is slower if the I/O bandwidth isn't
> there. It also performs parallel random I/O operations, hence it is bad
> on HDD (and to some extent on SATA SSDs).
> 
> xloader is disk friendly and uses (roughly speaking) only a single
> write channel.
> 
>      Andy
> 
> > Can you try a real world dataset like Wikidata truthy as well?
> > 
> > I could also give it another try if we agree on the timestamp of the
> > dump as well as the Jena version for better comparison. Collecting
> > those runs on the Jena site would be good material for interested
> > people.
> > 
> > On 13.11.22 19:26, Andy Seaborne wrote:
> >> Trying out a specific machine:
> >>
> >> 1 billion triples : BSBM-1000 (1,000,253,325 triples)
> >>
> >> tdb2.tdbloader --loc DB2 bsbm-1000m.nt.gz
> >> Time: 3,218.82 seconds (53mins 39secs)
> >> Rate: 310,751 triples/s
> >>
> >> The machine:
> >>
> >> Dell 8950, Intel® Core™ i7-12700K Processor
> >>   8 performance cores with hyper threading
> >>   4 Efficient-cores
> >>   Total : 16+4 threads
> >>
> >> 64G RAM DDR5, 2 memory channels
> >> m2 SSD (1TB)
> >>
> >> The database is 191GBytes
> >>
> >> 4 threads were running at 100% and they were spread across cores 
> >> (other threads were doing I/O and general housekeeping).
> >>
> >> The OS didn't apply any thermal controls - the active threads weren't
> >> being moved across cores, the CPU temperatures were only around 44C,
> >> and the processor fan speed wasn't elevated.
> >>
> >> The machine was usable during the load.
> >>
> >> ----
> >>
> >> On the same hardware tdb2.xloader achieved 87kTPS and a database of
> >> 132GBytes
> 
