Hi everyone,
over the summer I ran into some problems which I don't fully understand, so
here they are :).

I run OWLIM 5.2 (b5316) with Sesame 2.6.9 on both a Windows 7 64-bit machine
(development) and a Linux server (test). Some issues show up on both platforms,
some only on Windows. Both environments share the settings below; they differ
only in the amount of RAM given to the cache.
 
owlim:enablePredicateList "false" ;
owlim:enable-context-index "true" ; 
owlim:enable-optimization "true" ;
owlim:enable-literal-index "true" ;
owlim:predicate-memory "0" ;
owlim:in-memory-literal-properties "false";
owlim:fts-memory "0" ;
owlim:index-compression-ratio "30" ;
owlim:ftsIndexPolicy "never" ;
owlim:journaling "true" ;
owlim:transaction-mode "safe" ; 


1. On both platforms, deleting a repository does not remove the files belonging
to that repository. Do you know where the problem could be? I know this was
fixed (and it worked somewhere around Sesame 2.6.5-2.6.6), but it is not
working now. I also wrote to the Sesame forum with no response.
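
To illustrate, a deletion like the following (a minimal sketch via the Sesame
API; the server URL and repository ID are placeholders, and I am assuming
removeRepositoryConfig is still the right call in 2.6.x) leaves the
repository's data folder on disk on both platforms:

    import org.openrdf.repository.manager.RemoteRepositoryManager;

    public class RemoveRepo {
      public static void main(String[] args) throws Exception {
        // connect to the remote Sesame server (placeholder URL)
        RemoteRepositoryManager manager =
            new RemoteRepositoryManager("http://localhost:8080/openrdf-sesame");
        manager.initialize();
        try {
          // removes the repository configuration; I expected this to also
          // delete the repository's storage files on disk
          boolean removed = manager.removeRepositoryConfig("rep_id");
          System.out.println("removed: " + removed);
        } finally {
          manager.shutDown();
        }
      }
    }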

2. I would like to ask when and how OWLIM's built-in cache is used, because my
experience is the following and I don't know whether it is correct. I run a
fairly complex query multiple times and it still takes a few seconds (~10 s)
to return results. Simple COUNT(*) queries are not cached either, as the time
does not drop after repeated runs. The strange thing is that OWLIM's memory
usage grows, so something is happening. There was also the problem that whether
I gave the repository 5 or 30 GB of RAM (owlim:cache-memory), no performance
boost was visible. I should also point out that no INSERTs were made into the
database between the queries. Could you guide me to where the problem might be,
or how OWLIM's cache can be made to kick in?
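
To show how I measure this, a minimal sketch of the repeated-run test (the
server URL, repository ID and query string are placeholders); the times stay
roughly the same from run to run, only the process memory grows:

    import org.openrdf.query.QueryLanguage;
    import org.openrdf.query.TupleQuery;
    import org.openrdf.query.TupleQueryResult;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.http.HTTPRepository;

    public class QueryTiming {
      public static void main(String[] args) throws Exception {
        HTTPRepository repo =
            new HTTPRepository("http://localhost:8080/openrdf-sesame", "rep_id");
        repo.initialize();
        RepositoryConnection con = repo.getConnection();
        try {
          String sparql = "SELECT (COUNT(*) AS ?c) WHERE { ?s ?p ?o }"; // placeholder query
          TupleQuery query = con.prepareTupleQuery(QueryLanguage.SPARQL, sparql);
          for (int run = 1; run <= 5; run++) {
            long start = System.currentTimeMillis();
            TupleQueryResult result = query.evaluate();
            while (result.hasNext()) {
              result.next();   // drain the result so the whole query is executed
            }
            result.close();
            System.out.println("run " + run + ": "
                + (System.currentTimeMillis() - start) + " ms");
          }
        } finally {
          con.close();
          repo.shutDown();
        }
      }
    }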


3. The next problem occurs only in the Windows environment. After loading data
into the repository and doing some SPARQL inferencing, the data are available
as expected. After a normal Tomcat shutdown and startup (sometimes the problem
only becomes visible after restarting NB), the SPARQL insertions are gone. This
does not happen on the Linux machine. Since I use the same processing tool
(automated loading + SPARQL inferencing) on both platforms and they share the
same repository settings, I am not sure where the problem could be. It looks as
if transaction-mode on Windows were always "fast". Am I missing something?


4. Is there any way to quickly count the triples (explicit or implicit) in a
repository using a query? I know about Sesame's
http://localhost:8080/openrdf-sesame/repositories/rep_id/size
endpoint, but I need a query-based approach. The standard COUNT over ?s ?p ?o
takes tens of minutes on large datasets and needs quite a lot of RAM.
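
For completeness, this is what I use from Java today (a minimal sketch; as far
as I know size() goes through the same mechanism as the /size endpoint, so it
is not the query-based approach I am after):

    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.http.HTTPRepository;

    public class CountTriples {
      public static void main(String[] args) throws Exception {
        HTTPRepository repo =
            new HTTPRepository("http://localhost:8080/openrdf-sesame", "rep_id");
        repo.initialize();
        RepositoryConnection con = repo.getConnection();
        try {
          long size = con.size();   // statement count reported by the server
          System.out.println("statements: " + size);
        } finally {
          con.close();
          repo.shutDown();
        }
      }
    }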

6. I found what could be a mysterious bug. Here is an example:
DELETE {
  ?s predicate:isSomething "true"^^xsd:boolean
}
INSERT {
  ?s predicate:isSomething "false"^^xsd:boolean
}
WHERE { .... }
On a small dataset it works correctly. The problem appears when running it on a
40M+ triple dataset (the ?s matches 100k+ bindings): the INSERT part of the
query is executed correctly, but the DELETE is not. I then end up with both
"true" and "false" values for the same property, which causes me big trouble.
I would also like to ask whether some consistency check can be applied so that
a boolean property cannot be true and false at the same time (a consistency
check in the .pie file, or is there something better?).
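
Until there is a proper consistency check, I detect the broken state after the
update with an ASK query, roughly like this (a sketch; the predicate namespace
below is a placeholder for my real one):

    import org.openrdf.query.QueryLanguage;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.http.HTTPRepository;

    public class CheckBooleanClash {
      public static void main(String[] args) throws Exception {
        HTTPRepository repo =
            new HTTPRepository("http://localhost:8080/openrdf-sesame", "rep_id");
        repo.initialize();
        RepositoryConnection con = repo.getConnection();
        try {
          // true if any subject carries both "true" and "false" for the property
          String ask =
              "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> "
            + "PREFIX predicate: <http://example.org/predicate/> "   // placeholder namespace
            + "ASK { ?s predicate:isSomething \"true\"^^xsd:boolean , "
            + "\"false\"^^xsd:boolean }";
          boolean clash = con.prepareBooleanQuery(QueryLanguage.SPARQL, ask).evaluate();
          System.out.println("true+false on the same subject: " + clash);
        } finally {
          con.close();
          repo.shutDown();
        }
      }
    }

But I would prefer something that prevents this state in the first place.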

7. My next question concerns loading statements, again on large datasets. Above
a certain number of triples (around 40M), loading performs worse than before.
This could be normal behaviour as more statements accumulate in the repository,
but I noticed that the CPU is only at about 10-20% utilisation during loading.
Could this be caused by a slow disk? Do you think SSDs could solve the problem,
or could it have other roots? I still load files of roughly the same size, in
RDF/XML format (not the best one). As the properties are not entirely
type-correct, I can't use the batch loading proposed in some of Jeen's
optimization articles.
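
For completeness, loading is currently done file by file, roughly like this
(a minimal sketch; directory, base URI and repository details are placeholders),
and the per-file times grow while the CPU stays at 10-20%:

    import java.io.File;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.http.HTTPRepository;
    import org.openrdf.rio.RDFFormat;

    public class LoadFiles {
      public static void main(String[] args) throws Exception {
        HTTPRepository repo =
            new HTTPRepository("http://localhost:8080/openrdf-sesame", "rep_id");
        repo.initialize();
        RepositoryConnection con = repo.getConnection();
        try {
          File dir = new File("/data/rdfxml");   // placeholder directory
          for (File file : dir.listFiles()) {
            long start = System.currentTimeMillis();
            // one RDF/XML file per call, i.e. per transaction
            con.add(file, "http://example.org/", RDFFormat.RDFXML);
            System.out.println(file.getName() + ": "
                + (System.currentTimeMillis() - start) + " ms");
          }
        } finally {
          con.close();
          repo.shutDown();
        }
      }
    }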

8. We use a different input format (TriG) for backups. A backup of a
medium-sized dataset (loaded from 48 x 200 MB RDF/XML files), zipped into a
single .zip, loads significantly faster than loading the files one by one. Is
this because of TriG itself, or because of some other optimizations? If TriG
alone makes this much difference, I will write an RDF/XML-to-TriG converter.
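
The converter would probably be just a few lines with Rio, something like this
sketch (input/output paths and the base URI are placeholders); since RDF/XML
has no named graphs, everything would end up in the default graph of the TriG
file, which should be fine for backups:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import org.openrdf.rio.RDFFormat;
    import org.openrdf.rio.RDFParser;
    import org.openrdf.rio.RDFWriter;
    import org.openrdf.rio.Rio;

    public class XmlToTrig {
      public static void main(String[] args) throws Exception {
        InputStream in = new FileInputStream("input.rdf");      // placeholder input
        OutputStream out = new FileOutputStream("output.trig"); // placeholder output
        try {
          RDFParser parser = Rio.createParser(RDFFormat.RDFXML);
          RDFWriter writer = Rio.createWriter(RDFFormat.TRIG, out);
          parser.setRDFHandler(writer);   // stream parsed statements straight to the writer
          parser.parse(in, "http://example.org/");              // placeholder base URI
        } finally {
          in.close();
          out.close();
        }
      }
    }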

Thank you for your time and answers, and I look forward to our cooperation.

Best regards,
Marek