Hi,

we're trying to load our project internal data set with currently
959,170,877 quads (still growing) on a 24-core AMD EPYC 7443P with
2.85-4.00GHz, 256GB RAM and Samsung SSD 870 QVO 8TB SATA SSDs in a RAIDZ1.

tdb2.tdbloader --loader=parallel
21,450.519 seconds
especially noticeable towards the end, it stalls massively (Batch: 1,169).
Avg: 44,752
The produced tdb2 files are 297G

tdb2.xloader --threads 11
25,295 seconds
Overall Rate 37,919 tuples per second

the xloader is a bit slower (~+1 hour) but seems to put much less strain
on the system. Also the tdb2 is much more compact -- 173G

Curious if you have any advice to improve performance?

Cheers,

On 2022/11/16 12:37:19 Andy Seaborne wrote:
>
>
> On 16/11/2022 07:54, LB wrote:
> > Andy got a new computer? Nice.
> >
> > I'm wondering if higher bandwidth of DDR5 already has an impact.
> >
> > Performance with xloader was ~ 4x lower than tdbloader? Any ideas why?
>
> xloader does more work (sorting is a separate step) with less resources.
>
> tdb2.tdbloader --loader=parallel is slower if the I/O bandwidth isn't
> there and also performs parallel random I/O operations hence it is bad
> on HDD (and to some extend on SATA SSDs).
>
> xloader is disk friendly and uses (roughly speaking) only a single write
> channel.
>
> Andy
>
> > Can you try a real world dataset like Wikidata truthy as well?
> >
> > I could also give it another try if we agree on timestamp of the dump as
> > well as the Jena version for better comparison. Collecting those runs on
> > the Jena site would be good material for interested people.
> >
> > On 13.11.22 19:26, Andy Seaborne wrote:
> >> Trying out a specific machine:
> >>
> >> 1 billion triples : BSBM-1000 (1,000,253,325 triples)
> >>
> >> tdb2.tdbloader --loc DB2 bsbm-1000m.nt.gz
> >> Time: 3,218.82 seconds (53mins 39secs)
> >> Rate: 310,751 triples/s
> >>
> >> The machine:
> >>
> >> Dell 8950, Intel® Core™ i7-12700K Processor
> >> 8 performance cores with hyper threading
> >> 4 Efficient-cores
> >> Total : 16+4 threads
> >>
> >> 64G RAM DDR5, 2 memory channels
> >> m2 SSD (1TB)
> >>
> >> The database is 191GBytes
> >>
> >> 4 threads were running at 100% and they were spread across cores
> >> (other threads were doing I/O and general housekeeping).
> >>
> >> The OS didn't apply any thermal controls - the active threads weren't
> >> being moved across cores, the CPU temperatures were only around 44C,
> >> and the processor fans wasn't elevated.
> >>
> >> The machine was usable during the load.
> >>
> >> ----
> >>
> >> On the same hardware tdb2.xloader achieved 87kTPS and a database of
> >> 132Gbytes
>
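
For anyone who wants to compare further runs on the same footing, here is a rough sketch of the arithmetic behind the rates quoted in this thread: total tuples divided by wall-clock seconds. The helper name `rate` and the `runs` table are only illustrative and are not part of any Jena tooling. A rate recomputed from the totals this way need not match a loader's own reported average exactly.

```python
# Recompute load throughput from the totals quoted in this thread.
# `rate` and `runs` are illustrative names, not part of Jena.

def rate(tuples: int, seconds: float) -> float:
    """Average throughput in tuples per second."""
    return tuples / seconds

QUADS = 959_170_877          # project data set (quads)
BSBM = 1_000_253_325         # BSBM-1000 (triples)

# (label, tuple count, elapsed seconds) as reported above
runs = [
    ("EPYC 7443P, tdb2.tdbloader --loader=parallel", QUADS, 21_450.519),
    ("EPYC 7443P, tdb2.xloader --threads 11", QUADS, 25_295.0),
    ("i7-12700K, tdb2.tdbloader (BSBM-1000)", BSBM, 3_218.82),
]

for label, count, secs in runs:
    print(f"{label}: {int(rate(count, secs)):,} tuples/s, {secs / 3600:.2f} h")

# Extra wall-clock time of xloader over the parallel loader, in hours
extra_hours = (25_295.0 - 21_450.519) / 3600
print(f"xloader took {extra_hours:.2f} h longer")
```

The same two numbers (tuple count and elapsed seconds) are enough to add a new run to the list, which might make it easier to collect comparable results.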