On 16/11/2022 07:54, LB wrote:
Andy got a new computer? Nice.

I'm wondering whether the higher bandwidth of DDR5 already has an impact.

Performance with xloader was ~4x lower than with tdbloader? Any ideas why?

xloader does more work (sorting is a separate step) with fewer resources.

tdb2.tdbloader --loader=parallel is slower if the I/O bandwidth isn't there, and it also performs parallel random I/O operations, which makes it a poor fit for HDDs (and, to some extent, SATA SSDs).

xloader is disk friendly and uses (roughly speaking) only a single write channel.

    Andy

Can you try a real-world dataset like Wikidata truthy as well?

I could also give it another try if we agree on the timestamp of the dump and on the Jena version, for better comparability. Collecting those runs on the Jena site would be good material for interested people.

On 13.11.22 19:26, Andy Seaborne wrote:
Trying out a specific machine:

1 billion triples: BSBM-1000 (1,000,253,325 triples)

tdb2.tdbloader --loc DB2 bsbm-1000m.nt.gz
Time: 3,218.82 seconds (53mins 39secs)
Rate: 310,751 triples/s
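As a sanity check, the reported rate and elapsed time follow directly from the triple count and the time in seconds. The sketch below uses two hypothetical helpers, `load_rate` and `fmt_duration` (they are not part of Jena), assuming the rate is simply triples divided by seconds, truncated to an integer, and that the duration is rounded to the nearest second.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Jena) that recompute the figures
# tdb2.tdbloader printed at the end of this load.

# load_rate TRIPLES SECONDS -> triples per second, truncated to an integer
load_rate() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%d\n", n / s }'
}

# fmt_duration SECONDS -> "Nmins Msecs", rounded to the nearest second
fmt_duration() {
  awk -v s="$1" 'BEGIN {
    t = int(s + 0.5)                 # round to whole seconds
    printf "%dmins %dsecs\n", int(t / 60), t % 60
  }'
}

load_rate 1000253325 3218.82     # prints 310751
fmt_duration 3218.82             # prints 53mins 39secs
```

The triple count is passed without thousands separators because awk would otherwise stop parsing the number at the first comma.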

The machine:

Dell 8950, Intel® Core™ i7-12700K Processor
  8 performance cores with hyper threading
  4 Efficient-cores
  Total: 16+4 threads

64GB DDR5 RAM, 2 memory channels
M.2 SSD (1TB)

The database is 191GBytes

4 threads were running at 100% and they were spread across cores (other threads were doing I/O and general housekeeping).

The OS didn't apply any thermal controls: the active threads weren't being moved across cores, the CPU temperatures were only around 44°C, and the processor fan speed wasn't elevated.

The machine was usable during the load.

----

On the same hardware, tdb2.xloader achieved 87kTPS (about 87,000 triples/s) and produced a database of 132GBytes.
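A rough back-of-the-envelope comparison of the two loads is sketched below. It assumes both loaders processed the same 1,000,253,325 triples, that 87kTPS is an average over the whole xloader run (it is rounded, so derived figures are approximate), and that "GBytes" means 10^9 bytes. The function name `compare_loads` is hypothetical. Under those assumptions the speed ratio comes out at about 3.6x, and the xloader run would have taken a little over three hours.

```shell
#!/usr/bin/env bash
# Hypothetical helper comparing the two BSBM-1000 loads reported above.
# Assumes identical input, average rates, and 1 GByte = 10^9 bytes.

TRIPLES=1000253325
TDBLOADER_RATE=310751   # triples/s, tdb2.tdbloader (reported)
XLOADER_RATE=87000      # triples/s, tdb2.xloader (reported as 87kTPS)

compare_loads() {
  awk -v n="$TRIPLES" -v a="$TDBLOADER_RATE" -v b="$XLOADER_RATE" 'BEGIN {
    t = int(n / b + 0.5)                       # implied xloader time in seconds
    printf "speed ratio: %.1fx\n", a / b
    printf "xloader time: ~%dh %02dm\n", int(t / 3600), int((t % 3600) / 60)
    printf "bytes/triple: tdbloader %.0f, xloader %.0f\n", 191e9 / n, 132e9 / n
  }'
}

compare_loads
```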
