How does this compare with your group's previous loader performance
investigations? Did any use PCIe/M.2?
On 18/11/2022 19:52, Simon Bin wrote:
Hi,
We're trying to load our project-internal data set,
currently 959,170,877 quads (and still growing),
on a
24-core AMD EPYC 7443P with 2.85-4.00GHz
256GB RAM
and Samsung SSD 870 QVO 8TB SATA SSDs in a RAIDZ1
tdb2.tdbloader --loader=parallel
21,450.519 seconds
It is especially noticeable towards the end, where it stalls massively
(Batch: 1,169). Avg: 44,752
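As a quick cross-check, the average rate is just the quad count divided by the wall-clock time. A minimal Python sketch; the helper name `rate` is illustrative and not part of Jena:

```python
def rate(tuples: int, seconds: float) -> float:
    """Average load rate in tuples (triples or quads) per second."""
    return tuples / seconds

QUADS = 959_170_877          # size of the data set quoted above
PARALLEL_SECS = 21_450.519   # tdb2.tdbloader --loader=parallel wall-clock time

print(f"{rate(QUADS, PARALLEL_SECS):,.0f} quads/s")
```

This gives roughly 44,716 quads/s, close to the loader's own "Avg: 44,752"; the loader may measure over a slightly different interval, so a small difference is expected.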
1/ Does the process have limits on the amount of memory-mapped file
area? If it's limited, the resident address space is small and
memory-mapped files don't stay cached.
2/ I'm not familiar with RAIDZ1, but it seems to require 2 writes per
block (data plus parity) to maintain the parity.
3/ Try the other loaders 'phased' and 'sequential' to see whether
their less I/O-intensive requirements and less overlapping use of the
file system cache do better than "parallel".
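For point 1/, a minimal sketch of checking two of the limits that can matter, from Python on Linux. It assumes Linux (`/proc`) and that the loader's JVM is started from the same shell session, so it inherits the same resource limits; it is not a Jena tool:

```python
from __future__ import annotations

import resource
from pathlib import Path

def _fmt(limit: int) -> str:
    """Render an rlimit value, treating RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit:,} bytes"

def address_space_limit() -> tuple[str, str]:
    """Soft and hard RLIMIT_AS (total virtual address space, including mmap'ed files)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return _fmt(soft), _fmt(hard)

def max_map_count() -> int | None:
    """Linux cap on the number of memory-mapped regions per process, if readable."""
    path = Path("/proc/sys/vm/max_map_count")
    return int(path.read_text()) if path.exists() else None

if __name__ == "__main__":
    soft, hard = address_space_limit()
    # Same limit that `ulimit -v` shows in the shell (which reports it in KiB).
    print(f"RLIMIT_AS: soft={soft}, hard={hard}")
    print(f"vm.max_map_count: {max_map_count()}")
```

If the soft limit is not "unlimited", raising it in the launching shell (up to the hard limit) before the next load is a cheap experiment.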
The resulting TDB2 database files are 297G.
tdb2.xloader --threads 11
25,295 seconds
Overall Rate 37,919 tuples per second
xloader is a bit slower (about an hour longer) but seems to put much less
strain on the system. The resulting TDB2 database is also much more compact: 173G.
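The two runs can be put side by side with a little arithmetic. This sketch only re-derives numbers already quoted in the thread; "G" is kept as reported, without assuming GB vs GiB:

```python
QUADS = 959_170_877

# Figures quoted above: wall-clock seconds and database size ("G" as reported).
parallel_secs, parallel_size = 21_450.519, 297
xloader_secs, xloader_size = 25_295, 173

xloader_rate = QUADS / xloader_secs                   # matches "Overall Rate 37,919"
extra_minutes = (xloader_secs - parallel_secs) / 60   # the "~+1 hour"
size_ratio = xloader_size / parallel_size             # relative database size

print(f"xloader rate: {xloader_rate:,.0f} quads/s")              # 37,919 quads/s
print(f"xloader took {extra_minutes:.0f} more minutes")          # 64 more minutes
print(f"xloader database is {size_ratio:.0%} of the parallel one")  # 58%
```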
It does more sequential I/O which is SATA friendly.
Andy
Curious if you have any advice to improve performance?
Experiment!
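One way to follow that advice is to time each loader against the same input, each into its own empty database directory. A minimal Python sketch; the input file name and the `DB-<loader>` directories are hypothetical, and only the `--loader=` and `--loc` options already used in this thread are assumed:

```python
from __future__ import annotations

import shlex
import shutil
import subprocess
import time

LOADERS = ["parallel", "phased", "sequential"]   # the loaders discussed above

def loader_cmd(loader: str, loc: str, files: list[str]) -> list[str]:
    """Build a tdb2.tdbloader command line that loads `files` into database `loc`."""
    return ["tdb2.tdbloader", f"--loader={loader}", "--loc", loc, *files]

def timed_run(cmd: list[str]) -> float:
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start

if __name__ == "__main__":
    data = ["data.nq.gz"]                           # hypothetical input file
    installed = shutil.which("tdb2.tdbloader") is not None
    for name in LOADERS:
        cmd = loader_cmd(name, f"DB-{name}", data)  # separate, empty DB per loader
        if not installed:
            print("would run:", shlex.join(cmd))
            continue
        print(f"{name}: {timed_run(cmd):,.0f} s")
```

Running the loads one at a time, as here, keeps them from competing for the same disks and file system cache.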
Cheers,
On 2022/11/16 12:37:19 Andy Seaborne wrote:
On 16/11/2022 07:54, LB wrote:
Andy got a new computer? Nice.
I'm wondering whether the higher bandwidth of DDR5 already has an impact.
Performance with xloader was ~4x lower than with tdbloader? Any
ideas why?
xloader does more work (sorting is a separate step) with fewer
resources.
tdb2.tdbloader --loader=parallel is slower if the I/O bandwidth isn't
there, and it also performs parallel random I/O operations, hence it
does badly on HDDs (and to some extent on SATA SSDs).
xloader is disk-friendly and uses (roughly speaking) only a single
write channel.
Andy
Can you try a real-world dataset like Wikidata truthy as well?
I could also give it another try if we agree on the timestamp of the
dump as well as the Jena version, for better comparison. Collecting
those runs on the Jena site would be good material for interested people.
On 13.11.22 19:26, Andy Seaborne wrote:
Trying out a specific machine:
1 billion triples : BSBM-1000 (1,000,253,325 triples)
tdb2.tdbloader --loc DB2 bsbm-1000m.nt.gz
Time: 3,218.82 seconds (53mins 39secs)
Rate: 310,751 triples/s
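The reported rate and duration follow directly from the triple count and elapsed time; a quick Python check (the loader's rate matches the value truncated to a whole number):

```python
TRIPLES = 1_000_253_325   # BSBM-1000
SECS = 3_218.82           # elapsed time reported by tdb2.tdbloader

rate = TRIPLES / SECS
minutes, seconds = divmod(round(SECS), 60)

print(f"Rate: {int(rate):,} triples/s")        # Rate: 310,751 triples/s
print(f"Time: {minutes} mins {seconds} secs")  # Time: 53 mins 39 secs
```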
The machine:
Dell 8950, Intel® Core™ i7-12700K Processor
8 performance cores with hyper threading
4 Efficient-cores
Total: 16+4 threads
64G RAM DDR5, 2 memory channels
m2 SSD (1TB)
The database is 191GBytes
4 threads were running at 100% and they were spread across cores
(other threads were doing I/O and general housekeeping).
The OS didn't apply any thermal controls: the active threads weren't
being moved across cores, the CPU temperatures were only around 44C,
and the processor fan speed wasn't elevated.
The machine was usable during the load.
----
On the same hardware, tdb2.xloader achieved 87kTPS and produced a
database of 132GBytes.
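Comparing the two BSBM runs gives the "~4x" figure asked about earlier in the thread. Since "87kTPS" is rounded, the implied xloader time is only approximate:

```python
TRIPLES = 1_000_253_325
tdbloader_rate = 310_751   # triples/s, tdb2.tdbloader
xloader_rate = 87_000      # triples/s, "87kTPS" (rounded in the thread)

speedup = tdbloader_rate / xloader_rate          # how much faster tdbloader was
xloader_hours = TRIPLES / xloader_rate / 3600    # implied xloader load time
size_ratio = 132 / 191                           # xloader vs tdbloader database size

print(f"tdbloader is {speedup:.1f}x faster")                   # 3.6x
print(f"implied xloader time: {xloader_hours:.1f} h")          # 3.2 h
print(f"xloader database is {size_ratio:.0%} of the tdbloader one")  # 69%
```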