Re: Estimating TDB2 size

2017-11-25 Thread Andrew U. Frank

i have specific questions in relation to what ajs6f said:

i have a TDB store where 1/3 of the triples have very small literals (3-5 char), 
where the same sequence is often repeated. would i get a smaller store and 
better performance if these were URIs for the character sequence (stored 
once for each repeated case)? any guess how much I could improve?
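
One way to reason about this question is dictionary encoding, where each distinct term is stored once and every triple refers to it by a fixed-size id. Whether and how this applies to a particular TDB store is an assumption here, not something stated in this thread. The Python sketch below is a generic toy model, not TDB's on-disk format, and the example terms are made up.

```python
def encode(triples):
    """Dictionary-encode triples: each distinct term gets one integer id."""
    ids = {}
    encoded = []
    for triple in triples:
        # setdefault assigns the next free id the first time a term is seen
        encoded.append(tuple(ids.setdefault(term, len(ids)) for term in triple))
    return ids, encoded

# Hypothetical data: the literal "the" is repeated, as in the question.
triples = [
    ("<ex:w1>", "<ex:text>", '"the"'),
    ("<ex:w2>", "<ex:text>", '"the"'),
    ("<ex:w3>", "<ex:text>", '"cat"'),
]
ids, encoded = encode(triples)
print(len(ids), "distinct terms for", len(triples), "triples")
print(encoded)
```

Under such a scheme a repeated literal and a repeated URI cost the same per occurrence, so swapping one for the other would mainly change the size of the term dictionary, not the triple tables.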


does the size of the URI play a role in the amount of storage used? i 
observe that i have for 33 M triples a TDB size (files) of 13 GB, which 
means about 400 bytes per triple. the literals are all short (very seldom 
more than 10 char, mostly 5 - words from english text). it is a named 
graph, if this makes a difference.
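
As a quick check of the arithmetic, dividing the reported store size by the triple count gives the average cost per triple. The Python sketch below assumes decimal gigabytes (1 GB = 10^9 bytes) and that "33 M" means 33 million; both are assumptions about the units used, and the helper name is made up for illustration.

```python
def bytes_per_triple(store_bytes: float, triples: int) -> float:
    """Average on-disk bytes per triple for a store of the given size."""
    return store_bytes / triples

# Figures from the message: 13 GB on disk for 33 million triples.
avg = bytes_per_triple(13e9, 33_000_000)
print(f"{avg:.0f} bytes per triple")
```

With these assumptions the result comes out nearer 400 bytes per triple; reading "GB" as binary gigabytes would push it a little higher still.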


thank you!

andrew


On 11/25/2017 06:42 AM, ajs6f wrote:

Andy may be able to be more precise, but I can tell you right away that it's not a 
straightforward function. How many literals are there "per triple"? How big are 
the literals, on average? How many unique bnodes and URIs? All of these things will 
change the eventual size of the database.

ajs6f


On Nov 25, 2017, at 6:40 AM, Laura Morales  wrote:

Is it possible to estimate the size of a TDB2 store from one of nt/turtle/xml 
input file, without actually creating the store? Is there maybe a tool for this?


--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
                         +43 1 58801 12710 direct
Geoinformation, TU Wien  +43 1 58801 12700 office
Gusshausstr. 27-29       +43 1 55801 12799 fax
1040 Wien Austria        +43 676 419 25 72 mobil



Is StreamRDFWriter.write() thread-safe?

2017-11-25 Thread Zak Mc Kracken

Hi all,

as per subject, is this operation (seen at 
https://jena.apache.org/documentation/io/streaming-io.html) thread-safe?


StreamRDFWriter.write(output, model.getGraph(), lang);

ie, can it be run by multiple threads in parallel, assuming each spawns 
its own Model instance and invokes the operation above at their end, 
when the model is complete?


Will the write() above manage the concurrent access to the output stream?

At which level? The triple, some sort of block, or the entire 
operation/graph? (The latter case implying there aren't benefits from 
this kind of parallelism).


From the first tests I've done, it seems it's thread-safe, but I don't 
get the details.
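
Whatever the library guarantees, one conservative pattern is to let each thread build its data independently and serialize only the final write on a lock, so that each graph lands in the output as one contiguous block. The Python sketch below illustrates that pattern in general terms. It does not use Jena: `write_graph` is a hypothetical stand-in for a call such as StreamRDFWriter.write(), and the thread count and data are made up.

```python
import io
import threading

output = io.StringIO()          # shared sink, standing in for the output stream
output_lock = threading.Lock()  # guards the whole write of one graph

def write_graph(sink, triples):
    """Hypothetical stand-in for a streaming writer: one line per triple."""
    for t in triples:
        sink.write(t + "\n")

def worker(worker_id, n_triples):
    # Build this thread's data privately; no locking is needed here.
    triples = [f"<s{worker_id}> <p> <o{i}> ." for i in range(n_triples)]
    # Serialize at the level of the entire graph, not the single triple.
    with output_lock:
        write_graph(output, triples)

threads = [threading.Thread(target=worker, args=(w, 100)) for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = output.getvalue().splitlines()
print(len(lines), "lines written")
```

The trade-off is the one raised in the question: with a lock around the whole write, the writes themselves do not run in parallel, only the model building does.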


Thank you in advance,

Marco.





Re: Fuseki UI missing "SPARQL endpoint"

2017-11-25 Thread Andy Seaborne



On 25/11/17 23:05, Laura Morales wrote:

With the new snapshot most problems seem to be fixed. However I still get "WARN 
ModTDBDataset :: Unexpected: Not a TDB2 dataset for type DatasetTDB2" which I don't 
understand if it's OK or an error or a bug.


Are you sure you are running the latest? (tdb2.tdbloader is in
"apache-jena" zip file)

It works for me.


Yes, 3.6.
Then there has to be something wrong with the config file here 
https://jena.apache.org/documentation/tdb2/tdb2_fuseki.html (first listing).
I get the WARN when using --tdb, but I do not get it if I use --loc


Because it's assembler related.  --loc does not use an assembler.

(Fix just pushed to the master branch)





Re: Fuseki UI missing "SPARQL endpoint"

2017-11-25 Thread Laura Morales
> > With the new snapshot most problems seem to be fixed. However I still get 
> > "WARN ModTDBDataset :: Unexpected: Not a TDB2 dataset for type DatasetTDB2" 
> > which I don't understand if it's OK or an error or a bug.
> 
> Are you sure you are running the latest? (tdb2.tdbloader is in
> "apache-jena" zip file)
> 
> It works for me.

Yes, 3.6.
Then there has to be something wrong with the config file here 
https://jena.apache.org/documentation/tdb2/tdb2_fuseki.html (first listing).
I get the WARN when using --tdb, but I do not get it if I use --loc


Re: Jena/Fuseki graph sync

2017-11-25 Thread Laura Morales
> That was pure *parsing* speed with no generation of triples. Data
> validation task.

A question related to this. Is tdb2.tdbloader mostly limited by hard disk speed, 
or does it require a lot of CPU/RAM computation as well?


Re: Fuseki UI missing "SPARQL endpoint"

2017-11-25 Thread Andy Seaborne

OK - I see what you are doing now.

The warning is harmless and I'll remove it.

(BTW --tdb and --desc are synonyms.  You only need one - the last one is 
the value actually used.)


Andy

On 25/11/17 22:17, Andy Seaborne wrote:



On 25/11/17 21:53, Laura Morales wrote:

Please could you try a development version? (which works for me).


With the new snapshot most problems seem to be fixed. However I still 
get "WARN  ModTDBDataset    :: Unexpected: Not a TDB2 dataset for 
type DatasetTDB2" which I don't understand if it's OK or an error or a 
bug.


Are you sure you are running the latest? (tdb2.tdbloader is in 
"apache-jena" zip file)


It works for me.

  tdb2.tdbloader --version
==>
Jena:   VERSION: 3.6.0-SNAPSHOT

     Andy


Re: Fuseki UI missing "SPARQL endpoint"

2017-11-25 Thread Andy Seaborne



On 25/11/17 21:53, Laura Morales wrote:

Please could you try a development version? (which works for me).


With the new snapshot most problems seem to be fixed. However I still get "WARN  
ModTDBDataset:: Unexpected: Not a TDB2 dataset for type DatasetTDB2" which I 
don't understand if it's OK or an error or a bug.


Are you sure you are running the latest? (tdb2.tdbloader is in 
"apache-jena" zip file)


It works for me.

 tdb2.tdbloader --version
==>
Jena:   VERSION: 3.6.0-SNAPSHOT

Andy


Re: Jena/Fuseki graph sync

2017-11-25 Thread Andy Seaborne



On 25/11/17 18:46, Laura Morales wrote:

Parsing the data: I got:

latest-truthy.nt.gz
4,736.87 sec : 2,199,382,887 Triples : 464,311.63 per second

latest-all.ttl.gz
8,864.36 sec : 4,787,194,669 Triples : 540,049.73 per second
and 3,284,314 warnings.


Did you get these numbers using tdb2.tdbloader?


That was pure *parsing* speed with no generation of triples.  Data 
validation task.



My AVG number of triples per second when using tdb2.tdbloader is ~60K and it also 
seems to slow down over time (<30K)
Which kind of computer did you get these numbers on? cpu, type of ram, disk 
(hdd, ssd)...



Re: Fuseki UI missing "SPARQL endpoint"

2017-11-25 Thread Laura Morales
> Please could you try a development version? (which works for me).

With the new snapshot most problems seem to be fixed. However I still get 
"WARN  ModTDBDataset:: Unexpected: Not a TDB2 dataset for type 
DatasetTDB2" which I don't understand if it's OK or an error or a bug.
I still get a very low number of triples loaded per second, around 30K. How can 
I control this value to increase the number of loaded triples?


Re: Fuseki UI missing "SPARQL endpoint"

2017-11-25 Thread Andy Seaborne



On 25/11/17 18:57, Laura Morales wrote:

https://repository.apache.org/content/repositories/snapshots/org/apache/jena/apache-jena-fuseki/3.6.0-SNAPSHOT/


What's the difference between the files in "apache-jena-fuseki" and "jena-fuseki", as well as 
"apache-jena" and "jena"?



We use "apache-jena*" for the main delivery binaries.

"apache-jena-libs"   - maven artifact for the main Jena jars
"apache-jena"        - the binary distribution of Jena libraries
"apache-jena-fuseki" - the binary distribution of Fuseki
"jena-fuseki"        - the intermediate parent POM for Fuseki1
"jena"               - the top POM and (3.6.0-dev) the project parent

Andy


Re: Fuseki UI missing "SPARQL endpoint"

2017-11-25 Thread Laura Morales
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/apache-jena-fuseki/3.6.0-SNAPSHOT/

What's the difference between the files in "apache-jena-fuseki" and 
"jena-fuseki", as well as "apache-jena" and "jena"?


Re: Jena/Fuseki graph sync

2017-11-25 Thread Laura Morales
> Parsing the data: I got:
> 
> latest-truthy.nt.gz
> 4,736.87 sec : 2,199,382,887 Triples : 464,311.63 per second
> 
> latest-all.ttl.gz
> 8,864.36 sec : 4,787,194,669 Triples : 540,049.73 per second
> and 3,284,314 warnings.

Did you get these numbers using tdb2.tdbloader? My AVG number of triples per 
second when using tdb2.tdbloader is ~60K and it also seems to slow down over 
time (<30K)
Which kind of computer did you get these numbers on? cpu, type of ram, disk 
(hdd, ssd)...
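
To put these rates side by side, the Python sketch below recomputes the parsing throughput from the quoted figures and estimates how long the larger file would take at the loader rates mentioned (about 60K and 30K triples per second). It assumes a constant rate, which the message itself says does not hold because loading slows down over time, so the hour figures are rough estimates only. The function names are made up for illustration.

```python
def rate(triples: int, seconds: float) -> float:
    """Throughput in triples per second."""
    return triples / seconds

def hours_at(triples: int, triples_per_second: float) -> float:
    """Hours needed to process `triples` at a constant rate."""
    return triples / triples_per_second / 3600

TRUTHY = 2_199_382_887   # latest-truthy.nt.gz
ALL = 4_787_194_669      # latest-all.ttl.gz

print(f"truthy parse rate: {rate(TRUTHY, 4736.87):,.0f} triples/s")
print(f"all parse rate:    {rate(ALL, 8864.36):,.0f} triples/s")
for tps in (60_000, 30_000):
    print(f"all at {tps:,}/s: {hours_at(ALL, tps):.1f} hours")
```

The gap is roughly an order of magnitude: parsing alone runs at several hundred thousand triples per second, while a load at tens of thousands per second works out to about a day or more for the full dump.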


Re: Fuseki UI missing "SPARQL endpoint"

2017-11-25 Thread Andy Seaborne



On 25/11/17 18:17, Laura Morales wrote:

Please could you try a development version? (which works for me).


Downloading now. I hope I can compile it :)
If the bug is still there, I'll report back.


You don't need to - the project builds development versions for testing 
on a daily basis. These are not releases and have not been approved by 
the PMC.


https://repository.apache.org/content/repositories/snapshots/org/apache/jena/apache-jena-fuseki/3.6.0-SNAPSHOT/

They are dated and numbered, so pick the latest (which is at the end)

Andy


Re: Jena/Fuseki graph sync

2017-11-25 Thread Andy Seaborne



On 25/11/17 12:08, Laura Morales wrote:

How long does it take to load, using what hardware?
This is "all", and not "truthy"?


Yes this file 
https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz


Parsing the data: I got:

latest-truthy.nt.gz
4,736.87 sec : 2,199,382,887 Triples : 464,311.63 per second

latest-all.ttl.gz
8,864.36 sec : 4,787,194,669 Triples : 540,049.73 per second
   and 3,284,314 warnings.


Intel Core 2 Duo 8GB 1TB
I have never timed it but it takes hours. Can Jena use more cores to create the 
TDB2 store faster?


It can, as in "it could be enhanced to do that", but it doesn't.

Experiment needed - on a normal commodity server or a portable, the 
limiting factor may not be the CPU, or just the CPU. The system bus 
(moving data around) and the persistent storage may be limitations.


Andy


Re: Fuseki UI missing "SPARQL endpoint"

2017-11-25 Thread Laura Morales
> Please could you try a development version? (which works for me).

Downloading now. I hope I can compile it :)
If the bug is still there, I'll report back.


Re: Fuseki UI missing "SPARQL endpoint"

2017-11-25 Thread Andy Seaborne

This is the same issue as the UI upload bug.

Please could you try a development version? (which works for me).

Andy

On 25/11/17 14:39, Laura Morales wrote:

With the older Fuseki release, when I opened "query" UI at localhost:3030, the "SPARQL 
endpoint" was automatically filled with the dataset query URL 
"http://localhost:3030/ds/query". With the new release, it looks like this is no longer the case 
(the field is empty). I don't know if it's a bug or a feature, but it's kinda handy to have it set by 
default. Even better would be a drop-down box that allows to select more URLs for the dataset (query, update, 
...).





Re: fuseki 3.5

2017-11-25 Thread Andy Seaborne

(I'm assuming this was supposed to go to users@)

Normally the cycle is every 3-4 months but that is a hope rather than a 
plan.  All Jena contributors are volunteers and time comes and goes for 
us all.


What people on this list can do is test the development builds and 
report good/bad.  That makes the release more reliable and more likely.


Andy

On 25/11/17 17:27, Andrew U. Frank wrote:
thank you for the quick answer. i guess i can wait for the next release 
with all bugs fixed.


when do you plan the next release?

andrew


On 11/25/2017 12:23 PM, Andy Seaborne wrote:

There was a bug in Fuseki2 for 3.5.0 that broke download.

It is fixed in the development version (if you want to try that out) 
and the next release.


    Andy

On 25/11/17 17:17, Andrew U. Frank wrote:
i tried 3.5 and got errors in loading data using the gui. loading 
with sparql update protocol gives the expected confirmation message, 
but the gui does not show anything in the info page, nor seem queries 
to work. is the fuseki gui ready for 3.5? (if not, please put this on 
the info page for downloading 3.5)


thank you - i am looking forward to using the improved TDB2 data structure!

andrew





Re: fuseki 3.5

2017-11-25 Thread Andy Seaborne

There was a bug in Fuseki2 for 3.5.0 that broke download.

It is fixed in the development version (if you want to try that out) and 
the next release.


Andy

On 25/11/17 17:17, Andrew U. Frank wrote:
i tried 3.5 and got errors in loading data using the gui. loading with 
sparql update protocol gives the expected confirmation message, but the 
gui does not show anything in the info page, nor seem queries to work. 
is the fuseki gui ready for 3.5? (if not, please put this on the info 
page for downloading 3.5)


thank you - i am looking forward to using the improved TDB2 data structure!

andrew



fuseki 3.5

2017-11-25 Thread Andrew U. Frank
i tried 3.5 and got errors in loading data using the gui. loading with 
sparql update protocol gives the expected confirmation message, but the 
gui does not show anything in the info page, nor seem queries to work. 
is the fuseki gui ready for 3.5? (if not, please put this on the info 
page for downloading 3.5)


thank you - i am looking forward to using the improved TDB2 data structure!

andrew




Re: Unexpected: Not a TDB2 dataset for type DatasetTDB2

2017-11-25 Thread ajs6f
Yes, tdbloader and tdb2.tdbloader are "incremental" in the sense that they add to 
a database, not replace it. They are essentially CLI tools for "add" 
transactions.

TDB[1|2] are single-writer, which means you cannot run those two commands in 
parallel, but you can loop over your graphs and load them one after another.
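
The "loop over your graphs" suggestion can be scripted. The Python sketch below builds one tdb2.tdbloader command per named graph and runs them strictly one after another, never in parallel. The `--graph` option is the one used in this thread and `--loc` is the option mentioned elsewhere in it; the database directory, graph URIs and file names are hypothetical, and the graph names are written as full URIs because prefixed names are not expanded.

```python
import subprocess

def build_load_commands(loc, graphs, loader="tdb2.tdbloader"):
    """Return one argv list per (graph URI, data file) pair, in order."""
    return [[loader, "--loc", loc, "--graph", uri, path]
            for uri, path in graphs]

def load_sequentially(commands):
    """Run the loads one at a time; stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)  # blocks until this load finishes

# Hypothetical database directory, graph names and input files.
commands = build_load_commands(
    "DB2",
    [("http://example.org/graph1", "one.nt"),
     ("http://example.org/graph2", "two.nt")],
)
for argv in commands:
    print(" ".join(argv))
# load_sequentially(commands)  # uncomment where tdb2.tdbloader is on the PATH
```

Because each `subprocess.run` call returns only when its loader process has exited, there is never more than one writer on the database at a time.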

ajs6f

> On Nov 25, 2017, at 11:04 AM, Laura Morales  wrote:
> 
>>> g:mygraph
>> 
>> Beware that graph name can't be a prefixed name, it is treated as URI.
> 
> Can I use tdb2.tdbloader to load two graphs not at the same time? That is
> 
> $ tdb2.tdbloader --graph graph1 one.nt
> $ tdb2.tdbloader --graph graph2 two.nt



Re: Unexpected: Not a TDB2 dataset for type DatasetTDB2

2017-11-25 Thread Laura Morales
> > g:mygraph
> 
> Beware that graph name can't be a prefixed name, it is treated as URI.

Can I use tdb2.tdbloader to load two graphs not at the same time? That is

$ tdb2.tdbloader --graph graph1 one.nt
$ tdb2.tdbloader --graph graph2 two.nt


Re: Unexpected: Not a TDB2 dataset for type DatasetTDB2

2017-11-25 Thread Andy Seaborne
Hi,

This is fixed in the development builds.

We also know that loading into a named graph, after the fix, creates a
larger dataset than TDB1 would.

It is the same size as TDB1 if TDB1 used the same copy-style loader - for
some reason, TDB1 bulk loader does things in an order so that the indexes
are better packed.

There isn't a full TDB2 bulkloader yet. The command exists so the syntax of
use will remain but the algorithm is a naive one ATM.

> g:mygraph

Beware that graph name can't be a prefixed name, it is treated as URI.

Andy


On 25 November 2017 at 14:30, Laura Morales  wrote:

> I'm experimenting with the new TDB2 but I get some errors. I've copied the
> config file described in https://jena.apache.org/documentation/tdb2/tdb2_fuseki.html
> into "run/config.ttl". This seems to work fine, because I can
> start the endpoint and see the empty dataset. The problems are when
> creating the TDB2 store.
> I've downloaded Fuseki, then downloaded Jena inside the Fuseki root
> directory.
>
> $ ./jena-3.5.0/bin/tdb2.tdbloader --tdb run/config.ttl --desc
> run/config.ttl --verbose file.nt
>
> 15:22:08 WARN  ModTDBDataset:: Unexpected: Not a TDB2 dataset for
> type DatasetTDB2
>
> ^--- I get this warning but otherwise the graph loads fine (as far as I
> can tell).
>
> $ ./jena-3.5.0/bin/tdb2.tdbloader --tdb run/config.ttl --desc
> run/config.ttl --graph g:mygraph --verbose file.nt
>
> 15:22:08 WARN  ModTDBDataset:: Unexpected: Not a TDB2 dataset for
> type DatasetTDB2
> java.lang.ClassCastException: org.apache.jena.tdb2.store.GraphViewSwitchable
> cannot be cast to org.apache.jena.tdb2.store.GraphTDB
> at tdb2.cmdline.CmdTDBGraph.getGraph(CmdTDBGraph.java:70)
> at tdb2.tdbloader.loadNamedGraph(tdbloader.java:123)
> at tdb2.tdbloader.exec(tdbloader.java:113)
> at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
> at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
> at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
> at tdb2.tdbloader.main(tdbloader.java:54)
>
> ^--- Here I can't load the graph at all.
>


Re: Documentation typo

2017-11-25 Thread Laura Morales
> What browser are you using?

FF dev


Re: Assembler for GenericRuleEngine Custom Builtin

2017-11-25 Thread ajs6f
I'm not too familiar with the rules system, so I may be wrong, but I think the 
answer is simply no, we don't have that feature right now. It does sound quite 
useful. If you write this code for yourself please do open a ticket (actually, 
that would be nice in any event) and send a PR!

For anyone who may be wondering about programmatic use of custom built-ins, 
there is a test case here:

org.apache.jena.reasoner.rulesys.test.TestGenericRuleReasonerConfig.testRuleLoadingWithOverridenBuiltins()


ajs6f

> On Nov 24, 2017, at 11:26 AM, Nouwt, B. (Barry)  wrote:
> 
> Hi all,
> 
> Does anyone know whether there is an Assembler to load a custom Builtin (for 
> usage in rules for the GenericRuleEngine) using a Fuseki configuration .ttl 
> file? I assume not, because I cannot find it in: 
> https://github.com/apache/jena/tree/master/jena-core/src/main/java/org/apache/jena/assembler/assemblers
> 
> I do see a RuleSetAssembler and a ReasonerFactory, but they do not seem to 
> have a Builtin load option. Also, via code I use the BuiltinRegistry class, 
> so it would probably be not too difficult to add this feature myself.
> 
> Any pointers?
> 
> Regards, Barry
> 
> 
> 
> 



Re: CMS diff: TDB2 - Use with Fuseki2

2017-11-25 Thread ajs6f
Committed, thanks!

ajs6f

> On Nov 25, 2017, at 2:40 PM, Laura  wrote:
> 
> Clone URL (Committers only):
> https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Ftdb2%2Ftdb2_fuseki.md
> 
> Laura
> 
> Index: trunk/content/documentation/tdb2/tdb2_fuseki.md
> ===
> --- trunk/content/documentation/tdb2/tdb2_fuseki.md   (revision 1816255)
> +++ trunk/content/documentation/tdb2/tdb2_fuseki.md   (working copy)
> @@ -17,7 +17,7 @@
> PREFIX fuseki:  <http://jena.apache.org/fuseki#>
> PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
> -PREFIX tdb2:    <http://jena.apache.org/2016/tdb#>;
> +PREFIX tdb2:    <http://jena.apache.org/2016/tdb#>
> PREFIX ja:      <http://jena.hpl.hp.com/2005/11/Assembler#>
> 
> [] rdf:type fuseki:Server ;
> 



Fuseki UI missing "SPARQL endpoint"

2017-11-25 Thread Laura Morales
With the older Fuseki release, when I opened "query" UI at localhost:3030, the 
"SPARQL endpoint" was automatically filled with the dataset query URL 
"http://localhost:3030/ds/query". With the new release, it looks like this is 
no longer the case (the field is empty). I don't know if it's a bug or a 
feature, but it's kinda handy to have it set by default. Even better would be a 
drop-down box that allows to select more URLs for the dataset (query, update, 
...).


Unexpected: Not a TDB2 dataset for type DatasetTDB2

2017-11-25 Thread Laura Morales
I'm experimenting with the new TDB2 but I get some errors. I've copied the 
config file described in 
https://jena.apache.org/documentation/tdb2/tdb2_fuseki.html into 
"run/config.ttl". This seems to work fine, because I can start the endpoint and 
see the empty dataset. The problems are when creating the TDB2 store.
I've downloaded Fuseki, then downloaded Jena inside the Fuseki root directory.

$ ./jena-3.5.0/bin/tdb2.tdbloader --tdb run/config.ttl --desc run/config.ttl 
--verbose file.nt

15:22:08 WARN  ModTDBDataset:: Unexpected: Not a TDB2 dataset for type 
DatasetTDB2

^--- I get this warning but otherwise the graph loads fine (as far as I can 
tell).

$ ./jena-3.5.0/bin/tdb2.tdbloader --tdb run/config.ttl --desc run/config.ttl 
--graph g:mygraph --verbose file.nt

15:22:08 WARN  ModTDBDataset:: Unexpected: Not a TDB2 dataset for type 
DatasetTDB2
java.lang.ClassCastException: org.apache.jena.tdb2.store.GraphViewSwitchable 
cannot be cast to org.apache.jena.tdb2.store.GraphTDB
at tdb2.cmdline.CmdTDBGraph.getGraph(CmdTDBGraph.java:70)
at tdb2.tdbloader.loadNamedGraph(tdbloader.java:123)
at tdb2.tdbloader.exec(tdbloader.java:113)
at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
at tdb2.tdbloader.main(tdbloader.java:54)

^--- Here I can't load the graph at all.


Re: Documentation typo

2017-11-25 Thread Laura Morales
> You can use the "Improve this Page" button at the top right of the page

Done. The editor is a bit weird though, because it doesn't show a scroll bar to 
me and the "Submit" button is hidden down below.


Re: Documentation typo

2017-11-25 Thread ajs6f
Ah, good catch!

You can use the "Improve this Page" button at the top right of the page to 
submit a fix, if you would like.

ajs6f

> On Nov 25, 2017, at 8:45 AM, Laura Morales  wrote:
> 
>> Thanks Laura, but that actually is the correct namespace for TDB2 assembler 
>> RDF.
> 
> Sorry I forgot to mention that the typo is the ';' at the end of the string.



Re: Documentation typo

2017-11-25 Thread Laura Morales
> Thanks Laura, but that actually is the correct namespace for TDB2 assembler 
> RDF.

Sorry I forgot to mention that the typo is the ';' at the end of the string.


Re: Documentation typo

2017-11-25 Thread ajs6f
Thanks Laura, but that actually is the correct namespace for TDB2 assembler RDF.

The namespace for TDB (aka TDB1) is http://jena.hpl.hp.com/2008/tdb#. TDB1 
dates from before Jena became an Apache project.


ajs6f

> On Nov 25, 2017, at 8:20 AM, Laura Morales  wrote:
> 
> https://jena.apache.org/documentation/tdb2/tdb2_fuseki.html
> 
> PREFIX tdb2: <http://jena.apache.org/2016/tdb#>;



Documentation typo

2017-11-25 Thread Laura Morales
https://jena.apache.org/documentation/tdb2/tdb2_fuseki.html

PREFIX tdb2: <http://jena.apache.org/2016/tdb#>;


Re: Jena/Fuseki graph sync

2017-11-25 Thread zPlus

> I downloaded them both yesterday - I was getting a miserable
2Mbyte/s download rate, capped per download, not that we can blame
them for rate controlling downloads, if it is the download site at
all.

I've downloaded the same file a while back, and I was able to get
around 5-6MB/s using multiple connections. However, if I remember
correctly the server would only accept 3 connections max.



Re: Jena/Fuseki graph sync

2017-11-25 Thread Laura Morales
> How long does it take to load, using what hardware?
> This is "all", and not "truthy"?

Yes this file 
https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz
Intel Core 2 Duo 8GB 1TB
I have never timed it but it takes hours. Can Jena use more cores to create the 
TDB2 store faster?


Re: Estimating TDB2 size

2017-11-25 Thread ajs6f
Andy may be able to be more precise, but I can tell you right away that it's 
not a straightforward function. How many literals are there "per triple"? How 
big are the literals, on average? How many unique bnodes and URIs? All of these 
things will change the eventual size of the database.

ajs6f

> On Nov 25, 2017, at 6:40 AM, Laura Morales  wrote:
> 
> Is it possible to estimate the size of a TDB2 store from one of nt/turtle/xml 
> input file, without actually creating the store? Is there maybe a tool for 
> this?



Estimating TDB2 size

2017-11-25 Thread Laura Morales
Is it possible to estimate the size of a TDB2 store from one of nt/turtle/xml 
input file, without actually creating the store? Is there maybe a tool for this?


Re: Jena/Fuseki graph sync

2017-11-25 Thread Andy Seaborne

Laura,

Interesting.
How long does it take to load, using what hardware?
This is "all", and not "truthy"?

(I downloaded them both yesterday - I was getting a miserable 2Mbyte/s 
download rate, capped per download, not that we can blame them for rate 
controlling downloads, if it is the download site at all).


Andy

On 24/11/17 17:24, Laura Morales wrote:

Laura, can you tell us a little more about why you are trying to avoid 
transmitting the whole graph? Is it because of an unreliable network between 
your client and Fuseki or because of something else?


Wikidata is about 4 billion triples, and it takes a lot of time to create the 
TDB store from the nt file. They release a new dump about once a week, and I 
would like to update my local copy when they release a new dump. Reloading the 
entire graph from scratch every time seems very inefficient (as well as an 
intensive process) considering that only a tiny % of the wikidata graph changes 
in a week.



Re: Jena/Fuseki graph sync

2017-11-25 Thread Lorenz Buehmann
Ah, interesting. That's what I meant with incremental changeset provided
by the source maintainer. I knew that there was something like this for
DBpedia, simply providing changesets of triples. Weird that there is no
RDF format available for Wikidata. Maybe Laura could open a feature request.

By the way, thanks for the RDF Patch pointer.


Cheers,

Lorenz


On 24.11.2017 18:43, ajs6f wrote:
> Wikimedia does offer a sort of general procedure for this: you can check to 
> see the updates since the last dump and do per-resource changes.
>
> https://www.wikidata.org/wiki/Wikidata:Data_access#Incremental_updates
>
> But perhaps more efficiently for yourself, you could use their incremental 
> dumps:
>
> https://dumps.wikimedia.org/other/incr/wikidatawiki/
>
> which are for some reason only provided in XML. 
>
> ajs6f
>
>> On Nov 24, 2017, at 12:24 PM, Laura Morales  wrote:
>>
>>> Laura, can you tell us a little more about why you are trying to avoid 
>>> transmitting the whole graph? Is it because of an unreliable network 
>>> between your client and Fuseki or because of something else?
>> Wikidata is about 4 billion triples, and it takes a lot of time to create 
>> the TDB store from the nt file. They release a new dump about once a week, 
>> and I would like to update my local copy when they release a new dump. 
>> Reloading the entire graph from scratch every time seems very inefficient 
>> (as well as an intensive process) considering that only a tiny % of the 
>> wikidata graph changes in a week.
>