Re: Upload large datasets to fuseki
On 06/12/2021 19:10, robert.ba...@tiscali.it wrote:
> Thanks for the reply. The data model is relatively simple, dealing with transactions of goods and user data. I will wait for the next release of Jena. When is it scheduled?

"Very soon" (the vote is in progress - as we are all volunteers, we don't/can't schedule releases across the year, but in this case you've asked in the middle of one).

Andy
Re: Upload large datasets to fuseki
Thanks for the reply. The data model is relatively simple, dealing with transactions of goods and user data.

I will wait for the next release of Jena. When is it scheduled?

On 06.12.2021 14:49, Andy Seaborne wrote:
> What's the data? The data pattern also affects load speeds.
Re: Upload large datasets to fuseki
I guess the end of loading with your setup was very slow.

Marco has mentioned xloader (which is an improved tdbloader2 that works on TDB2).

If you can use a machine with more RAM, "tdb2.tdbloader --loader=parallel" will be fastest for 750 million triples, but at some point the slow-and-steady xloader overtakes the parallel and phased loaders for speed. Tortoise and the Hare!

If you can't find a machine with more RAM - you can copy the database to another machine after it's built - xloader is probably the way to go to load beyond 750 million triples. It can only load into an empty database; it cannot add data to an existing one.

Sorry to not be definitive - there is a lot of "it depends" here, both hardware and data.

What's the data? The data pattern also affects load speeds.

Andy

On 06/12/2021 13:14, robert.ba...@tiscali.it wrote:
> Loading took about 8 hours to upload 750 million. The system has a Core i7, 16 GB RAM, SSD hard disk. Is it possible to optimize loading times?
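The two options compared in this reply could be invoked roughly as sketched below. This is only an illustration: it builds and prints the command lines and runs nothing. The Linux script names `tdb2.tdbloader` and `tdb2.xloader`, the assumption that xloader takes the same `--loc` argument and accepts the same input files, and the helper `loader_command` are not confirmed anywhere in this thread; the directory and file names are taken from the original question.

```python
import shlex


def loader_command(loader, location, files):
    """Build (but do not run) a TDB2 bulk-load command line.

    Hypothetical helper for illustration; check the Jena documentation
    for the options your release actually supports.
    """
    if loader == "parallel":
        # Reported above as fastest at this scale when enough RAM is available.
        return ["tdb2.tdbloader", "--loader=parallel", "--loc", location, *files]
    if loader == "xloader":
        # Slow and steady; per the reply above, the target database must be empty.
        # Assumption: xloader accepts --loc and the same input files.
        return ["tdb2.xloader", "--loc", location, *files]
    raise ValueError(f"unknown loader: {loader}")


files = ["0.ttl.gz", "1.ttl.gz"]
for name in ("parallel", "xloader"):
    # Dry run: print the command instead of executing it.
    print(shlex.join(loader_command(name, "datasetX", files)))
```

Printing rather than executing keeps the sketch safe to try on a machine without Jena installed; the printed line can be copied into a shell once the options have been checked against the installed release.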
Re: Upload large datasets to fuseki
I am currently experimenting with xloader. It's part of the 4.3 release. It's not as fast as tdb2.tdbloader with the parallel option, but it seems to work more gracefully with extra-large datasets.

On Mon, Dec 6, 2021 at 1:14 PM robert.ba...@tiscali.it wrote:
> I have to upload 3 billion triples to Jena Fuseki.

--
Marco Neumann
KONA
Upload large datasets to fuseki
Hello,

I have to upload 3 billion triples to Jena Fuseki. I tried the following command with a first dataset (0.ttl.gz and 1.ttl.gz => 750 million triples):

    tdb2_tdbloader.bat --loader=parallel --loc datasetX 0.ttl.gz 1.ttl.gz

Loading the 750 million triples took about 8 hours. The system has a Core i7, 16 GB RAM and an SSD.

Is it possible to optimize loading times?

I have seen that there are several types of loaders:
tdbloader
tdbloader2 (I can also use a Linux system)
tdb2_tdbloader (with different options)

Which of these is the best?

Thanks!
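For scale, the figures in this message work out to about 26,000 triples per second, and about 32 hours for the full 3 billion if that rate held. A back-of-the-envelope sketch of the arithmetic, assuming (probably optimistically) that the rate stays constant as the database grows; the function names are just illustrative:

```python
# Rough load-rate estimate from the figures quoted in the message above.
# Assumption: the average rate stays constant for larger loads, which is
# unlikely to hold in practice on a machine with limited RAM.

def triples_per_second(triples, hours):
    """Average load rate over a whole run."""
    return triples / (hours * 3600)


def hours_at_rate(triples, rate):
    """Hours needed to load `triples` at a constant `rate` in triples/second."""
    return triples / rate / 3600


observed = triples_per_second(750_000_000, 8)
projected = hours_at_rate(3_000_000_000, observed)

print(f"observed rate: {observed:,.0f} triples/s")
print(f"3 billion triples at that rate: {projected:.0f} hours")
```

Treat the projection as a lower bound rather than a forecast, since it ignores any slowdown as the indexes grow.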