Re: Upload large datasets to fuseki

2021-12-06 Thread Andy Seaborne




On 06/12/2021 19:10, robert.ba...@tiscali.it wrote:
   


> Thanks for the reply.
>
> The data model is relatively simple, dealing with transactions of
> goods and user data.
>
> I will wait for the next release of Jena.
>
> When is it scheduled?


"Very soon."

The release vote is in progress. As we are all volunteers, we don't
(and can't) schedule releases across the year, but in this case you've
asked in the middle of one.


Andy








Re: Upload large datasets to fuseki

2021-12-06 Thread robert . barry
  

Thanks for the reply.

The data model is relatively simple, dealing with transactions of goods
and user data.

I will wait for the next release of Jena.

When is it scheduled?




Re: Upload large datasets to fuseki

2021-12-06 Thread Andy Seaborne

I guess the end of loading with your setup was very slow.

Marco has mentioned xloader (which is an improved tdbloader2 that works 
on TDB2).


If you can use a machine with more RAM, "tdb2.tdbloader --loader=parallel"
will be fastest for 750m triples, but at some point the slow-and-steady
xloader overtakes the parallel and phased loaders for speed. Tortoise and
the Hare!


If you can't find a machine with more RAM (you can copy the database to
another machine after it's built), xloader is probably the way to go to
load beyond 750m triples. It can only load into an empty database, not
add data to an existing one.
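The two options above can be sketched as command lines. This is only a rough sketch, not a recommendation of one loader over the other. It assumes a Linux shell with the Jena command-line scripts tdb2.tdbloader and tdb2.xloader (the latter from Jena 4.3) on the PATH. The helper name loader_cmd is hypothetical, and datasetX and the file names are simply the ones from the original post. The helper only prints the command so it can be checked before a long load is started.

```shell
#!/bin/sh
# Print (do not run) the load command for a given loader choice.
# Usage: loader_cmd parallel|xloader DB_DIR FILE...
# loader_cmd is a hypothetical helper, not part of Jena.
loader_cmd() {
    choice=$1
    loc=$2
    shift 2
    case "$choice" in
        # Parallel loader: fastest when the machine has enough RAM.
        parallel) echo "tdb2.tdbloader --loader=parallel --loc $loc $*" ;;
        # xloader: slower but steadier; the target database must be empty.
        xloader)  echo "tdb2.xloader --loc $loc $*" ;;
        *)        echo "unknown loader: $choice" >&2; return 1 ;;
    esac
}

loader_cmd parallel datasetX 0.ttl.gz 1.ttl.gz
loader_cmd xloader  datasetX 0.ttl.gz 1.ttl.gz
```

Since xloader cannot add to an existing database, all input files for a database would need to be passed in a single run.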


Sorry to not be definitive - there is a lot of "it depends" here, on both
the hardware and the data.


What's the data? The data pattern also affects load speeds.

Andy





Re: Upload large datasets to fuseki

2021-12-06 Thread Marco Neumann

I am currently experimenting with xloader. It's part of the 4.3 release.

It's not as fast as tdb2.tdbloader with the parallel option, but it seems
to work more gracefully with extra-large datasets.



-- 


---
Marco Neumann
KONA


Upload large datasets to fuseki

2021-12-06 Thread robert . barry
  

Hello,

I have to upload 3 billion triples to Jena Fuseki.
I tried using the following command with a first dataset
(0.ttl.gz 1.ttl.gz => 750 million triples):

tdb2_tdbloader.bat --loader=parallel --loc datasetX 0.ttl.gz 1.ttl.gz

Loading the 750 million triples took about 8 hours. The system has a
Core i7, 16 GB of RAM and an SSD.
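For a rough sense of scale, the figures above give an average load rate and a naive estimate for the full 3 billion triples. This is only a back-of-the-envelope sketch. It assumes the load rate stays constant as the database grows, which may well not hold for larger loads, so the estimate is better read as a lower bound than as a prediction.

```shell
#!/bin/sh
# Figures from the post: 750 million triples loaded in 8 hours,
# 3 billion triples in total.
loaded=750000000
hours=8
total=3000000000

# Average load rate in triples per second (integer division, rounded down).
rate=$(( loaded / (hours * 3600) ))

# Hours for the full load, ASSUMING the same average rate throughout.
est_hours=$(( total * hours / loaded ))

echo "rate: $rate triples/s"
echo "estimate: $est_hours hours"
```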

Is it possible to optimize loading times?

I have seen that there are several types of loaders:
tdbloader
tdbloader2 (I can also use a Linux system)
tdb2_tdbloader (with different options)

Which of these is the best?

Thanks!
  

