Re: Delete all nested triples

2019-02-20 Thread ganesh chandra
Thanks for the solution. I was hoping there was some way to write
something iterative in the query.

Thanks,
Ganesh

On Wed, Feb 20, 2019 at 9:28 PM Paul Tyson  wrote:

> On Wed, 2019-02-20 at 17:27 -0700, ganesh chandra wrote:
> > Hello All,
> > My data looks something like this:
> >  a something:Entity ;
> > something:privateData [ a something:PrivateData ;
> > something:jsonContent "{\"fileType\": \"jp\"}"^^xsd:string ;
> >   something:modeData [a something:data1
> >   system:code 1234]
> > something:system  ] ;
> >
> > There are many entries like the one above, and I am trying to write a query to
> delete all the data if the id matches. How should I go about doing this?
> >
>
> If the data is always in this shape, something like this should work:
>
> prefix something: 
> prefix system: 
> DELETE WHERE {
>  a something:Entity ;
> something:privateData ?_a.
> ?_a  a something:PrivateData ;
> something:jsonContent ?json ;
> something:modeData ?_b;
> something:system ?filetype .
> ?_b a something:data1;
> system:code ?code.
> }
>
> This just replaces the blank nodes with SPARQL variables.
>
> It's a good idea to test DELETE updates thoroughly, because they can
> often cause surprises. One way to see what will be deleted is to change
> the DELETE to SELECT and run it as a query. That will show you exactly
> what triples will be deleted.
>
> Regards,
> --Paul
>
>
> --
Ganesh Chandra S


Re: Delete all nested triples

2019-02-20 Thread Paul Tyson
On Wed, 2019-02-20 at 17:27 -0700, ganesh chandra wrote:
> Hello All,
> My data looks something like this:
>  a something:Entity ;
> something:privateData [ a something:PrivateData ;
> something:jsonContent "{\"fileType\": \"jp\"}"^^xsd:string ;
>   something:modeData [a something:data1
>   system:code 1234]
> something:system  ] ;
> 
> There are many entries like the one above, and I am trying to write a query to
> delete all the data if the id matches. How should I go about doing this?
> 

If the data is always in this shape, something like this should work:

prefix something: 
prefix system: 
DELETE WHERE {
 a something:Entity ;
something:privateData ?_a.
?_a  a something:PrivateData ;
something:jsonContent ?json ;
something:modeData ?_b; 
something:system ?filetype .
?_b a something:data1;
system:code ?code.
}

This just replaces the blank nodes with SPARQL variables.

It's a good idea to test DELETE updates thoroughly, because they can
often cause surprises. One way to see what will be deleted is to change
the DELETE to SELECT and run it as a query. That will show you exactly
what triples will be deleted.

Regards,
--Paul




Delete all nested triples

2019-02-20 Thread ganesh chandra
Hello All,
My data looks something like this:
 a something:Entity ;
something:privateData [ a something:PrivateData ;
something:jsonContent "{\"fileType\": \"jp\"}"^^xsd:string ;
something:modeData [ a something:data1 ;
system:code 1234 ] ;
something:system  ] ;

There are many entries like the one above, and I am trying to write a query to
delete all the data if the id matches. How should I go about doing this?

Regards,
Ganesh

Re: TDB2 with negative float values

2019-02-20 Thread Andy Seaborne

Hi Mike,

> Is this a known issue / limitation of TDB2?  If so, are there any
> suggested workarounds other than always using doubles,
> at least for negatives?

Thanks for the report. Yes, it's a bug in handling negative xsd:float 
RDF terms in TDB2.  (Sign extending an int into a long when it shouldn't.)
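
For intuition, here is an illustration of the failure mode (not the actual
TDB2 code): the 32-bit pattern of -1.0f has its top bit set, so sign-extending
it while packing it into a 64-bit NodeId clobbers the high half, where the
type tag lives.

```python
import struct

# 32-bit IEEE-754 pattern of -1.0f; the sign bit makes the top bit 1.
bits = struct.unpack(">I", struct.pack(">f", -1.0))[0]
assert bits == 0xBF800000  # the leading bytes BF80 appear in the loader WARN

# Correct packing: zero-extend the 32 float bits into the low half of a long.
zero_extended = bits

# Buggy packing: sign-extend, smearing 1-bits into the top 32 bits --
# the region where an inlined NodeId keeps its type tag.
sign_extended = (bits | ~0xFFFFFFFF) & 0xFFFFFFFFFFFFFFFF

print(hex(zero_extended))  # 0xbf800000
print(hex(sign_extended))  # 0xffffffffbf800000
```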


xsd:double values work: they are handled by unrelated code, and they take
the same amount of space.
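
Until the fix lands, one conceivable stopgap (my own sketch, not an official
workaround) is to rewrite xsd:float literals as xsd:double before loading.
Note that this changes the stored datatype, which datatype() in queries will
observe; the example line below uses hypothetical subject/predicate IRIs.

```python
# Hypothetical stopgap: rewrite xsd:float datatype IRIs as xsd:double in an
# N-Triples file before loading into TDB2. Assumes the ^^<...> form only
# appears in the datatype position, as in typical N-Triples output.
FLOAT_TAG = '^^<http://www.w3.org/2001/XMLSchema#float>'
DOUBLE_TAG = '^^<http://www.w3.org/2001/XMLSchema#double>'

def floats_to_doubles(ntriples: str) -> str:
    return ntriples.replace(FLOAT_TAG, DOUBLE_TAG)

line = '<http://ex/s> <http://ex/p> "-1.0"^^<http://www.w3.org/2001/XMLSchema#float> .'
print(floats_to_doubles(line))
# <http://ex/s> <http://ex/p> "-1.0"^^<http://www.w3.org/2001/XMLSchema#double> .
```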


There's now a ticket:
https://issues.apache.org/jira/browse/JENA-1674

Thanks,
Andy

On 20/02/2019 19:15, Mike Welch wrote:




TDB2 with negative float values

2019-02-20 Thread Mike Welch
Hello all,

We recently noticed in a TDB2 dataset that negative float values (including
"negative zero") are corrupted.  The batch loader shows a WARN message --
see below.  The following simple steps reproduce the problem.  This is with
Jena 3.10, though the issue seems to exist prior to that.

$ echo "  \"-1.0\"^^ ." > test.nt

$ tdb2.tdbloader --loc test_tdb2 test.nt
11:07:36 WARN  NodeId   :: Type set in long: type=Float value=BF80

$ tdb2.tdbquery --loc test_tdb2 "select ?o (datatype(?o) as ?dt) where { ?s ?p ?o }"
---------------------------------------------------------------------
| o                      | dt                                        |
=====================================================================
| 3.1861834394748298E-58 | <http://www.w3.org/2001/XMLSchema#double> |
---------------------------------------------------------------------


TDB1 is OK:

$ tdbloader --loc test test.nt
11:09:12 INFO  loader   :: -- Start triples data phase
11:09:12 INFO  loader   :: ** Load empty triples table
11:09:12 INFO  loader   :: -- Start quads data phase
11:09:12 INFO  loader   :: ** Load empty quads table
11:09:13 INFO  loader   :: Load: test.nt -- 2019/02/19 11:09:13 PST
11:09:13 INFO  loader   :: -- Finish triples data phase
11:09:13 INFO  loader   :: ** Data: 1 triples loaded in 0.11 seconds [Rate: 9.09 per second]
11:09:13 INFO  loader   :: -- Finish quads data phase
11:09:13 INFO  loader   :: -- Start triples index phase
11:09:13 INFO  loader   :: ** Index SPO->POS: 1 slots indexed
11:09:13 INFO  loader   :: ** Index SPO->OSP: 1 slots indexed
11:09:13 INFO  loader   :: -- Finish triples index phase
11:09:13 INFO  loader   :: ** 1 triples indexed in 0.01 seconds [Rate: 166.67 per second]
11:09:13 INFO  loader   :: -- Finish triples load
11:09:13 INFO  loader   :: ** Completed: 1 triples loaded in 0.12 seconds [Rate: 8.26 per second]
11:09:13 INFO  loader   :: -- Finish quads load

$ tdbquery --loc test "select ?o (datatype(?o) as ?dt) where { ?s ?p ?o }"
----------------------------------------------------------------------------------------------
| o                                                | dt                                       |
==============================================================================================
| "-1.0"^^<http://www.w3.org/2001/XMLSchema#float> | <http://www.w3.org/2001/XMLSchema#float> |
----------------------------------------------------------------------------------------------

Creating the dataset via Fuseki shows a similar pattern: "in-memory" and
"persistent" correctly retrieve the float value, but "persistent (TDB2)"
returns a corrupt value interpreted as a double.

Is this a known issue / limitation of TDB2?  If so, are there any suggested
workarounds other than always using doubles, at least for negatives?

Thanks,
- Mike

For completeness, this is the result of storing -1.0, -0.0, 0.0, and 1.0 as
both float and double (8 values total).  The combination of float +
negative results in the 2 near-zero "double" values.

-----------------------------------------------------------------------------------------------
| o                                                 | dt                                        |
===============================================================================================
| "0.0"^^<http://www.w3.org/2001/XMLSchema#float>   | <http://www.w3.org/2001/XMLSchema#float>  |
| "1.0"^^<http://www.w3.org/2001/XMLSchema#float>   | <http://www.w3.org/2001/XMLSchema#float>  |
| 0.0e0                                             | <http://www.w3.org/2001/XMLSchema#double> |
| 3.186183062619485E-58                             | <http://www.w3.org/2001/XMLSchema#double> |
| 3.1861834394748298E-58                            | <http://www.w3.org/2001/XMLSchema#double> |
| 1.0e0                                             | <http://www.w3.org/2001/XMLSchema#double> |
| -0.0e0                                            | <http://www.w3.org/2001/XMLSchema#double> |
| -1.0e0                                            | <http://www.w3.org/2001/XMLSchema#double> |
-----------------------------------------------------------------------------------------------


Re: Using content with meta on text index

2019-02-20 Thread Mikael Pesonen



Not sure. Reading the Jena text documentation, it states that external
document content can be added to the Jena text index.


I'm just not sure how this should be done in practice: how to handle
concurrency, and how exactly to add documents so that we can make SPARQL
queries that target content and metadata at the same time, preferably with
some weights.


But it's fine for us to use single Lucene index for all data.

Br


On 19.2.2019 17.44, ajs6f wrote:

Are you asking how to use an extant Lucene index with your text documents in it 
for Jena's text index as well?

ajs6f


On Feb 14, 2019, at 6:23 AM, Mikael Pesonen  wrote:


Hi,

Our system stores documents via a separate REST API, and the document IDs are
stored, along with document metadata, in the Jena DB. We would like to make text
queries that target both the document contents and the metadata.

Is there a recommended/supported way to make this happen on Jena and Lucene?

--
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's 
Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.peso...@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND