Store query results in new RDF

2013-11-01 Thread Adeeb Noor
Hi guys:

I would like to save my SPARQL results, coming from a ResultSet, into new RDF
(new RDF resources), because I want to do more work on this subgraph and it
has to be in the original RDF format.

I tried the outputAsRDF function and it worked; however, the result I got was
the following:


[RDF/XML output mangled by the mail archive: it showed result nodes
referencing
https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.owl#genotypePhenotype
and
https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.rdf#C0007589,
with literal values "omimt" and "w"]
How can I remove these node constructs and get something like:

[desired RDF/XML, also mangled in the archive: a description of
https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.rdf#C3229174
with the label "Cytra-K Oral Product" and the type
https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.owl#chemical]

please help me out
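A common way to get a plain subgraph like that is a CONSTRUCT query, which
yields a Model directly instead of a ResultSet. A sketch, not from the thread,
assuming Jena's (2.x-era) ARQ API; the file names and the query pattern here
are made up:

```java
import java.io.FileOutputStream;

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.util.FileManager;

public class SaveSubgraph {
    public static void main(String[] args) throws Exception {
        // Hypothetical input file name.
        Model model = FileManager.get().loadModel("DDID.rdf");

        // CONSTRUCT returns a Model, so the extracted subgraph stays as
        // ordinary RDF rather than the result-set vocabulary (with its
        // blank nodes) that ResultSetFormatter.outputAsRDF produces.
        String query = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"; // placeholder pattern
        QueryExecution qe = QueryExecutionFactory.create(query, model);
        try {
            Model subgraph = qe.execConstruct();
            subgraph.write(new FileOutputStream("subgraph.rdf"), "RDF/XML");
        } finally {
            qe.close();
        }
    }
}
```

Replace the placeholder pattern with the WHERE clause of the original SELECT
query to carve out the intended subgraph.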

-- 
Adeeb Noor
Ph.D. Candidate
Dept of Computer Science
University of Colorado at Boulder
Cell: 571-484-3303
Email: adeeb.n...@colorado.edu


Re: UNSAID keyword

2013-11-01 Thread Tim Harsch
Hi Rob,
I should have been more clear.  Andy confirmed that UNSAID should be removed
from the documentation, said that MINUS was missing from it, and I was
suggesting that NOT IN is also missing.
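For reference, the three negation forms under discussion look like this in
SPARQL 1.1 (a sketch based on the FOAF pattern in the quoted query; the NOT IN
values are made up):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name
WHERE {
  ?x foaf:givenName ?name .
  FILTER NOT EXISTS { ?x foaf:knows ?who }   # what UNSAID used to alias
  # MINUS form:  MINUS { ?x foaf:knows ?who }
  # NOT IN is an expression operator, e.g.
  # FILTER ( ?name NOT IN ("Alice", "Bob") )
}
```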


Tim


> On Friday, November 1, 2013 9:40 AM, Rob Vesse  wrote:
> > NOT IN is valid syntax, see
> http://www.w3.org/TR/sparql11-query/#func-not-in
> 
> Rob
> 
> 
> On 31/10/2013 22:44, "Tim Harsch"  wrote:
> 
>> My thoughts as well, and also "NOT IN"?
>> 
>> 
>> 
>> 
>> 
>>> On Thursday, October 31, 2013 2:54 PM, Andy Seaborne  wrote:
>>> > On 31/10/13 21:35, Tim Harsch wrote:
>>>> Hello,
>>>>
>>>> The docs at http://jena.apache.org/documentation/query/negation.html
>>>>
>>>> Say that "UNSAID is an alias for NOT EXISTS"
>>>>
>>>> I wasn't familiar with that keyword so gave it a try at sparql.org and
>>> got:
>>>>
>>>> Error 400: Parse error:
>>>> # Names of people who have not stated that they know anyone
>>>> PREFIX foaf: 
>>>> SELECT ?name
>>>> WHERE { ?x foaf:givenName ?name . UNSAID { ?x foaf:knows ?who } }
>>> Lexical error at line 5, column 11.  Encountered: " " (32), after :
>>> "UNSAID" Fuseki - version 1.0.0 (Build date: 2013-09-12T10:49:49+0100)
>>>>
>>>> Is this a documentation bug?
>>>
>>> Yes. Fixed (in staging).
>>>
>>> Andy
>>>
>>> It ought to mention MINUS as well.
>>>
>>>>
>>>> Thanks,
>>>> Tim
>>>>
>>>
>


Re: Reading RDF/JSON into model, there is error when json file has number value

2013-11-01 Thread Andy Seaborne

On 01/11/13 17:11, Qi He wrote:

Hello,

After calling
model.read("http://dbpedia.org/data/The_Adventures_of_Tom_Sawyer.json", "RDF/JSON");

it shows the error message: JSON Values given for properties for an Object
must be Strings


It actually gives the line and column number of the error, together with a
stack trace.



In the JSON file, some values are numbers, not strings, and I can't change
the JSON file. How could I parse number values?


The data does not conform to RDF/JSON:

http://jena.apache.org/documentation/io/rdf-json.html

is a copy of the original Talis description and

https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-json/index.html

is the text that will become the RDF Working Group Note.

Directly using numbers is not allowed. The dbpedia.org response is not 
conformant.


Your choices would seem to be:

1/ Read into a string, fix it up, then read from that string
2/ Ask for a different format like Turtle
3/ Take the Jena source code and change it - the stack trace gives the
right place to change.
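Option 1 could look something like this (a rough, regex-based sketch;
quoteNumericValues is a hypothetical helper, and a real fix-up might prefer a
proper JSON parser):

```java
import java.util.regex.Pattern;

public class RdfJsonFixup {
    // Hypothetical helper: RDF/JSON requires the "value" field of a literal
    // to be a string, so wrap bare numbers in quotes before parsing.
    public static String quoteNumericValues(String json) {
        return Pattern.compile("(\"value\"\\s*:\\s*)(-?\\d+(?:\\.\\d+)?)")
                      .matcher(json)
                      .replaceAll("$1\"$2\"");
    }

    public static void main(String[] args) {
        String in = "{ \"value\" : 1876, \"type\" : \"literal\" }";
        // Prints: { "value" : "1876", "type" : "literal" }
        System.out.println(quoteNumericValues(in));
    }
}
```

This is deliberately naive - it only quotes numbers that follow a "value"
key - but the fixed-up string can then be handed to model.read via a
StringReader.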


Andy



-Qi





Reading RDF/JSON into model, there is error when json file has number value

2013-11-01 Thread Qi He
Hello,

After calling
model.read("http://dbpedia.org/data/The_Adventures_of_Tom_Sawyer.json", "RDF/JSON");

it shows the error message: JSON Values given for properties for an Object
must be Strings

In the JSON file, some values are numbers, not strings, and I can't change
the JSON file. How could I parse number values?

-Qi


Re: UNSAID keyword

2013-11-01 Thread Rob Vesse
NOT IN is valid syntax, see
http://www.w3.org/TR/sparql11-query/#func-not-in

Rob

On 31/10/2013 22:44, "Tim Harsch"  wrote:

>My thoughts as well, and also "NOT IN"?
>
>
>
>
>
>> On Thursday, October 31, 2013 2:54 PM, Andy Seaborne 
>>wrote:
>> > On 31/10/13 21:35, Tim Harsch wrote:
>>>  Hello,
>>> 
>>>  The docs at http://jena.apache.org/documentation/query/negation.html
>>> 
>>>  Say that "UNSAID is an alias for NOT EXISTS"
>>> 
>>> 
>>>  I wasn't familiar with that keyword so gave it a try at sparql.org
>>>and 
>> got:
>>> 
>>>  Error 400: Parse error:
>>>  # Names of people who have not stated that they know anyone
>>>  PREFIX foaf: 
>>>  SELECT ?name
>>>  WHERE { ?x foaf:givenName ?name . UNSAID { ?x foaf:knows ?who } }
>> Lexical error at line 5, column 11.  Encountered: " " (32), after :
>> "UNSAID" Fuseki - version 1.0.0 (Build date: 2013-09-12T10:49:49+0100)
>>> 
>>>  Is this a documentation bug?
>> 
>> Yes. Fixed (in staging).
>> 
>> Andy
>> 
>> It ought to mention MINUS as well.
>> 
>> 
>>> 
>>>  Thanks,
>>>  Tim
>>> 
>>






AW: Declining TDB load performance with larger files

2013-11-01 Thread Neubert Joachim
I did a comparison of tdbloader vs. tdbloader2. The results are not reliable
(machine-dependent, and perhaps even influenced by different background load on
the vm cluster), but even so they may be interesting to others:

tdbloader w/ 2G heap
4:15 Data phase
4:30 Index phase

tdbloader2 w/ 2G heap
1:30 Data phase
6:30 Index phase

So in sum tdbloader2 shows a slight advantage in my current configuration. 

The reduction of heap space had indeed brought an improvement:

tdbloader w/ 10G heap
4:30 Data phase
5:45 Index phase

Could I expect a larger improvement by adding more memory (for example 
upgrading from 11 to 32 GB)? Are there any experiences for estimating an 
optimal memory size for tdb loading?

Cheers, Joachim

-Ursprüngliche Nachricht-
Von: Andy Seaborne [mailto:a...@apache.org] 
Gesendet: Montag, 28. Oktober 2013 16:58
An: users@jena.apache.org
Betreff: Re: Declining TDB load performance with larger files

Hi Joachim,

What is happening is that the system is running out of working space and the
disk is being used for real.

 > JAVA_OPTS: -d64 -Xms6g -Xmx10g

Don't set -Xmx10g.  Try a 2G heap.  Don't bother with -Xms.

More heap does not help - in fact, it can make it worse.  TDB uses memory 
mapped files - these are not in Java heap space.  The operating system manages 
how much real RAM is devoted to the virtual address space for the file.  As 
your JVM grows, it is reducing the space for file caching.
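Concretely, the advice above amounts to something like this when invoking the
loader (a sketch; it assumes the standard tdbloader wrapper script picks up
JAVA_OPTS, and the dataset location and file name are placeholders):

```
# Keep the Java heap small; TDB's memory-mapped files live outside the heap,
# so free RAM is better spent on the OS file cache.
export JAVA_OPTS="-Xmx2g"
tdbloader --loc=/data/tdb GND.ttl.gz
```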


There is another effect.  The OS is managing memory but sometimes it gets 
its policy wrong.  Oddly, the faster the initial part of the load, the lower 
the speed it drops to once workspace RAM runs out.  My guess is that the OS 
guesses some access pattern and the code then breaks that assumption.  It 
can even differ from run to run on the same machine.

There is also tdbloader2 - it may be faster, it may not.  It is vulnerable to 
the OS in different ways.

As it is so system-specific, try each and see what happens, after fixing 
the heap issue.

Andy


On 28/10/13 12:01, Neubert Joachim wrote:
> I'm loading a 111 million triples file (GND German Authority files).
> For the first roughly 70 million triples, it's really fast (more than
> 60,000 avg), but then throughput declines continuously to a thousand 
> or just some hundred triples (which brings down the avg to less than 
> 7000). During the last part of triples data phase, java is down to 
> 1-2% CPU usage, while disk usage goes up to 100%.
>
> As TDB writes to disk, I'd expect rather linear loading times. The 
> Centos 6 64bit machine (11.5 GB memory) runs on a VMware vSphere 
> cluster, with SAN hardware under-laying. As I observed the same 
> behavior at different times a day, with for sure different load 
> situations, there is no indication that it depended on parallel 
> actions on the cluster.
>
> Perhaps there is something wrong in my config, but I could not figure 
> out what it may be. I add an extract of the log below - it would be 
> great if somebody could help me with hints.
>
> Cheers, Joachim
>

 > ---
 >
 > 2013-10-25 13:33:33 start run
 >
 > Configuration:
 > java version "1.6.0_24"
 > Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
 > Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
 > JAVA_OPTS: -d64 -Xms6g -Xmx10g
 > Jena:   VERSION: 2.11.0
 > Jena:   BUILD_DATE: 2013-09-12T10:49:49+0100
 > ARQ:VERSION: 2.11.0
 > ARQ:BUILD_DATE: 2013-09-12T10:49:49+0100
 > RIOT:   VERSION: 2.11.0
 > RIOT:   BUILD_DATE: 2013-09-12T10:49:49+0100
 > TDB:VERSION: 1.0.0
 > TDB:BUILD_DATE: 2013-09-12T10:49:49+0100
 >
 > Use fuseki tdb.tdbloader on file /opt/thes/var/gnd/latest/src/GND.ttl.gz
 > INFO  -- Start triples data phase
 > INFO  ** Load empty triples table
 > INFO  Load: /opt/thes/var/gnd/latest/src/GND.ttl.gz -- 2013/10/25
13:33:35 MESZ
 > INFO  Add: 10.000.000 triples (Batch: 64.766 / Avg: 59.984)
 > INFOElapsed: 166,71 seconds [2013/10/25 13:36:21 MESZ]
 > INFO  Add: 20.000.000 triples (Batch: 71.839 / Avg: 58.653)
 > INFOElapsed: 340,99 seconds [2013/10/25 13:39:16 MESZ]
 > INFO  Add: 30.000.000 triples (Batch: 67.750 / Avg: 60.271)
 > INFOElapsed: 497,75 seconds [2013/10/25 13:41:52 MESZ]
 > INFO  Add: 40.000.000 triples (Batch: 68.212 / Avg: 60.422)
 > INFOElapsed: 662,01 seconds [2013/10/25 13:44:37 MESZ]
 > INFO  Add: 50.000.000 triples (Batch: 54.171 / Avg: 60.645)
 > INFOElapsed: 824,47 seconds [2013/10/25 13:47:19 MESZ]
 > INFO  Add: 60.000.000 triples (Batch: 58.823 / Avg: 60.569)
 > INFOElapsed: 990,60 seconds [2013/10/25 13:50:05 MESZ]
 > INFO  Add: 70.000.000 triples (Batch: 45.495 / Avg: 60.468)
 > INFOElapsed: 1.157,63 seconds [2013/10/25 13:52:52 MESZ]
 > INFO  Add: 80.000.000 triples (Batch: 50.050 / Avg: 57.998)
 > INFOElapsed: 1.379,36 seconds [2013/10/25 13:56:34 MESZ]
 > INFO  Add: 90.000.000 triples (Batch: 13.954 / Avg: 52.447)
 > INFOElapsed: 1.716,02

Re: Should model.read emit addedStatements ?

2013-11-01 Thread Claude Warren
in new_tests the model Contract test (
https://svn.apache.org/repos/asf/jena/Experimental/new-test/src/test/java/com/hp/hpl/jena/rdf/model/ModelContractTests.java)
is:

@Test
public void testRead_InputStream_String_String() throws Exception {
    InputStream is = getInputStream("TestReaders.nt");
    String lang = "N-TRIPLE";

    model.register(SL);
    txnBegin(model);
    assertSame("read() must return model", model,
            model.read(is, "foo:/bar/", lang));
    txnCommit(model);

    assertTrue("Start graph missing",
            SL.hasStart(Arrays.asList(new Object[] { "someEvent", model,
                    GraphEvents.startRead })));
    assertTrue("end graph missing",
            SL.hasEnd(Arrays.asList(new Object[] { "someEvent", model,
                    GraphEvents.finishRead })));

    // FIXME add tests for converting relative to base.
    // assertTrue( "Can not find resolved relative statement",
    //         model.contains( resource( "foo:/bar/e"), property( "foo:/bar/p5")));

    is = getInputStream("TestReaders.nt");

    txnBegin(model);
    model.removeAll();
    SL.clear();
    assertSame("read() must return model", model,
            model.read(is, null, lang));
    txnCommit(model);

    assertTrue("Start graph missing",
            SL.hasStart(Arrays.asList(new Object[] { "someEvent", model,
                    GraphEvents.startRead })));
    assertTrue("end graph missing",
            SL.hasEnd(Arrays.asList(new Object[] { "someEvent", model,
                    GraphEvents.finishRead })));

    // FIXME add tests for relative .
    // Resource s = ResourceFactory.createProperty( null, "e" ).asResource();
    // Property p = ResourceFactory.createProperty( null, "p5");
    // assertTrue( "Can not find relative statement", model.contains( s, p ));

}


So the contract says the model must emit a GraphEvents.startRead and a
GraphEvents.finishRead.  Other than that, the contents of the events
between the start and finish are not defined, so either a single
addedStatements call or multiple addedStatement calls are acceptable.

Claude



On Thu, Oct 31, 2013 at 3:51 PM, Andy Seaborne  wrote:

> On 30/10/13 16:21, Altmann, Michael wrote:
>
>> I am trying to upgrade our application from Jena 2.7.3 to 2.11.0.  It
>> appears that in Jena 2.7.3 model.read(stream, null, "RDF/XML") fired
>> a single ModelChangedListener.addedStatements
>>
>
> IIRC not necessarily a single call - won't it be once per 1000 statements
> or so?  ARP batches updates and sends them in clumps to the model.  Each
> clump causes an addedStatements(Statement[]) call.
>
> The other parsers have never done this.
>
>
>  whereas
>> model.read(stream, null, "TTL") fired separate addedStatement
>> events.
>>
>> Now in 2.11.0, both types of load emit only
>> ModelChangedListener.addedStatement events.
>>
>> Was this change intentional?  I don't see anything in the issues.
>>
>
> Not so much intentional but it is a consequence of unifying the handling
> of parsing.
>
> Rather than leave the contract to the particular reader (hence the different
> behaviour in 2.7.3 for RDF/XML and TTL, compared to 2.11.0).  And it's not
> part of any contract which of the addedStatements calls gets called, or how.
>
> There is a nesting GraphEvents.startRead , GraphEvents.finishRead pair
> around each parser run.  That puts the event boundaries on something more
> logical and less implementation sensitive.  (get via the graph level
> handler)
>
> This is something worth sorting out - it's the bulk-update issue in
> disguise.  Putting in logical boundaries for changes looks like the right
> thing to do, and reflects the fact that deletion is not like addition in
> reverse.
>
> (does anything need bulk delete signalling not by pattern?  mixes of
> adds/deletes over and above add/delete notification of each change?).
>
> Andy
>
>
>> Thanks, Michael
>>
>>
>>
>


-- 
I like: Like Like - The likeliest place on the web
LinkedIn: http://www.linkedin.com/in/claudewarren