[jira] [Created] (JENA-1433) Query Builder does not have a short hand method to add group clause with single triple

2017-11-22 Thread Claude Warren (JIRA)
Claude Warren created JENA-1433:
---

 Summary: Query Builder does not have a short hand method to add 
group clause with single triple
 Key: JENA-1433
 URL: https://issues.apache.org/jira/browse/JENA-1433
 Project: Apache Jena
  Issue Type: Improvement
  Components: QueryBuilder
Affects Versions: Jena 3.5.0
Reporter: Claude Warren
Assignee: Claude Warren
Priority: Minor


Currently, the Query Builder requires a subquery when adding a graph element. This
change creates a shorthand call to insert a graph with a single triple
path.
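The shorthand idea can be illustrated with a minimal sketch that only renders SPARQL text. It does not use the Jena QueryBuilder API: `GraphShorthand` and both `graph` methods are hypothetical, and a plain string stands in for the sub-query builder. The single-triple form simply delegates to the general form, so the caller no longer assembles an inner group.

```java
public class GraphShorthand {

    // General (long-hand) form: the caller supplies the whole inner group pattern,
    // standing in for the sub-query that the Query Builder currently requires.
    static String graph(String graphName, String innerGroup) {
        return "GRAPH " + graphName + " { " + innerGroup + " }";
    }

    // Hypothetical shorthand: a GRAPH element holding exactly one triple pattern.
    static String graph(String graphName, String s, String p, String o) {
        return graph(graphName, s + " " + p + " " + o + " .");
    }

    public static void main(String[] args) {
        // Prints: GRAPH ?g { ?s ?p ?o . }
        System.out.println(graph("?g", "?s", "?p", "?o"));
    }
}
```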



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (JENA-1432) Query Builder does not have a mechanism to put value var block in the where clause

2017-11-22 Thread Claude Warren (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claude Warren updated JENA-1432:

Issue Type: Improvement  (was: Bug)

> Query Builder does not have a mechanism to put value var block in the where 
> clause
> --
>
> Key: JENA-1432
> URL: https://issues.apache.org/jira/browse/JENA-1432
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: QueryBuilder
>Reporter: Claude Warren
>Assignee: Claude Warren
>Priority: Minor
>






[jira] [Updated] (JENA-1432) Query Builder does not have a mechanism to put value var block in the where clause

2017-11-22 Thread Claude Warren (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claude Warren updated JENA-1432:

Summary: Query Builder does not have a mechanism to put value var block in 
the where clause  (was: Query Builder does not have a mechanism to put values 
in)

> Query Builder does not have a mechanism to put value var block in the where 
> clause
> --
>
> Key: JENA-1432
> URL: https://issues.apache.org/jira/browse/JENA-1432
> Project: Apache Jena
>  Issue Type: Bug
>  Components: QueryBuilder
>Reporter: Claude Warren
>Assignee: Claude Warren
>Priority: Minor
>






[jira] [Assigned] (JENA-1432) Query Builder does not have a mechanism to put values in

2017-11-22 Thread Claude Warren (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claude Warren reassigned JENA-1432:
---

   Assignee: Claude Warren
   Priority: Minor  (was: Major)
Component/s: QueryBuilder

Add the ability to add VALUES to the WHERE clause.

ValueVar only adds the block at the end of the query (outside the WHERE clause).

This change adds value var blocks inside the WHERE clause.
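The difference between the current and the requested behaviour is where the VALUES data lands in the generated SPARQL. The sketch below only renders query strings; `ValuesPlacement` and its methods are hypothetical and are not the QueryBuilder API. It assumes literals contain no double quotes, since no escaping is done.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ValuesPlacement {

    // Renders a single-variable VALUES block, e.g. VALUES ?x { "a" "b" }.
    // Assumption: literals contain no double quotes (no escaping is done).
    static String valuesBlock(String var, List<String> literals) {
        String data = literals.stream()
                .map(v -> "\"" + v + "\"")
                .collect(Collectors.joining(" "));
        return "VALUES " + var + " { " + data + " }";
    }

    // Current behaviour: the block trails the WHERE clause.
    static String trailing(String pattern, String values) {
        return "SELECT * WHERE { " + pattern + " } " + values;
    }

    // Requested behaviour: the block is an element inside the WHERE clause.
    static String inline(String pattern, String values) {
        return "SELECT * WHERE { " + pattern + " " + values + " }";
    }

    public static void main(String[] args) {
        String values = valuesBlock("?x", List.of("a", "b"));
        System.out.println(trailing("?s ?p ?x .", values));
        System.out.println(inline("?s ?p ?x .", values));
    }
}
```

Both placements are legal SPARQL 1.1. The trailing block is joined with the result of the whole WHERE pattern, while the inline block is joined with the group it sits in, which starts to matter once that group is nested inside OPTIONAL, UNION, or a subquery.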


> Query Builder does not have a mechanism to put values in
> 
>
> Key: JENA-1432
> URL: https://issues.apache.org/jira/browse/JENA-1432
> Project: Apache Jena
>  Issue Type: Bug
>  Components: QueryBuilder
>Reporter: Claude Warren
>Assignee: Claude Warren
>Priority: Minor
>






[jira] [Created] (JENA-1432) Query Builder does not have a mechanism to put values in

2017-11-22 Thread Claude Warren (JIRA)
Claude Warren created JENA-1432:
---

 Summary: Query Builder does not have a mechanism to put values in
 Key: JENA-1432
 URL: https://issues.apache.org/jira/browse/JENA-1432
 Project: Apache Jena
  Issue Type: Bug
Reporter: Claude Warren








ElementData.equalTo problem?

2017-11-22 Thread Claude Warren
I think equalTo for ElementData is not correct. Given two ElementData
instances:

ElementData 1:
{noformat}

VALUES ( ?x ?v ) {
  ( "three"  )
  ( "four"  )
}

{noformat}

ElementData 2:

{noformat}

VALUES ( ?v ?x ) {
  (  "three" )
  (  "four" )
}

{noformat}

Shouldn't the equalTo() method return true?

Currently it is sensitive to the order of the vars.

I can put in a fix, but I want to be sure that there is an error first.

Claude
-- 
I like: Like Like - The likeliest place on the web

LinkedIn: http://www.linkedin.com/in/claudewarren
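If the answer to the equalTo() question is yes, the comparison has to match columns by variable name rather than by position. A minimal sketch, assuming a hypothetical `Block` class rather than Jena's `ElementData`; the values of `?v` were lost from the archived message, so "v3" and "v4" in the test are placeholders. The sketch deliberately stays sensitive to row order, which is a separate question from the one raised.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class ValuesEquality {

    // A VALUES block as written: the declared variable order plus its rows.
    // A null entry in a row stands for UNDEF.
    static final class Block {
        final List<String> vars;
        final List<List<String>> rows;

        Block(List<String> vars, List<List<String>> rows) {
            this.vars = vars;
            this.rows = rows;
        }
    }

    // Re-key every row by variable name, so the column order no longer matters.
    static List<Map<String, String>> rowsByName(Block b) {
        List<Map<String, String>> out = new ArrayList<>();
        for (List<String> row : b.rows) {
            Map<String, String> binding = new HashMap<>();
            for (int i = 0; i < b.vars.size(); i++) {
                binding.put(b.vars.get(i), row.get(i));
            }
            out.add(binding);
        }
        return out;
    }

    // Equal if both blocks declare the same variables (in any order)
    // and have the same rows in the same row order.
    static boolean equalIgnoringVarOrder(Block a, Block b) {
        return a.vars.size() == b.vars.size()
                && new HashSet<>(a.vars).equals(new HashSet<>(b.vars))
                && rowsByName(a).equals(rowsByName(b));
    }
}
```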


[jira] [Commented] (JENA-1427) Add nextOrElse() method in ExtendedIterator

2017-11-22 Thread Adam Jacobs (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263362#comment-16263362
 ] 

Adam Jacobs commented on JENA-1427:
---

Sounds great to me. Whether integrating with Java's {{Optional}} or 
implementing equivalent functionality, either way will work.

> Add nextOrElse() method in ExtendedIterator
> ---
>
> Key: JENA-1427
> URL: https://issues.apache.org/jira/browse/JENA-1427
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: Jena 3.5.0
>Reporter: Adam Jacobs
>Priority: Trivial
>  Labels: easytask
>
> Allow a functional approach for returning a default value or throwing a 
> custom exception from a Jena iterator.
> The following method may be added to the ExtendedIterator interface.
> {noformat}
> /**
>  Answer the next object, if it exists, otherwise invoke the 
> _supplier_.
>  */
> public default T nextOrElse( Supplier<T> supplier ) {
> return hasNext() ? next() : supplier.get();
> }
> {noformat}





[jira] [Commented] (JENA-1427) Add nextOrElse() method in ExtendedIterator

2017-11-22 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263306#comment-16263306
 ] 

Andy Seaborne commented on JENA-1427:
-

If we add to {{ExtendedIterator}}, these are the possible operations:

* {{nextOptional}}
* {{nextOrElse}}
* {{nextOrElseGet}}
* {{nextOrElseThrow}}

where the {{nextOrElse*}} methods are functionally like the {{nextOptional().orElse*}}
methods (all as default methods, with the {{nextOrElse*}} not implemented as
"optional.orElse"), with an open question about whether the {{nextOrElse*}}
methods are really necessary (albeit shorter).

This seems like a relatively low-risk extension to the API. While I am generally
nervous about tinkering with the main API, because changes/additions are usually
opinionated, hence implicitly saying "and don't extend ",
these operations don't fall into this category.

Would that work?
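The four proposed operations can be sketched as default methods on a hypothetical `OptIterator` interface, a stand-in for `ExtendedIterator` rather than Jena code. The method names follow the list above; the parameter types mirror `java.util.Optional` (`orElse(T)`, `orElseGet(Supplier)`, `orElseThrow(Supplier)`), which is an assumption, since the original `nextOrElse` proposal takes a `Supplier`. As suggested, the `nextOrElse*` methods work directly on `hasNext()`/`next()` instead of going through `nextOptional()`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical stand-in for ExtendedIterator<T>; only the proposed default methods are shown.
interface OptIterator<T> extends Iterator<T> {

    // Empty if exhausted. Note: a null element also maps to Optional.empty().
    default Optional<T> nextOptional() {
        return hasNext() ? Optional.ofNullable(next()) : Optional.empty();
    }

    default T nextOrElse(T other) {
        return hasNext() ? next() : other;
    }

    default T nextOrElseGet(Supplier<? extends T> supplier) {
        return hasNext() ? next() : supplier.get();
    }

    default <X extends Throwable> T nextOrElseThrow(Supplier<? extends X> exceptionSupplier) throws X {
        if (hasNext()) {
            return next();
        }
        throw exceptionSupplier.get();
    }

    // Adapts any java.util.Iterator so the defaults can be tried out.
    static <T> OptIterator<T> wrap(Iterator<T> it) {
        return new OptIterator<T>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public T next() { return it.next(); }
        };
    }
}

class OptIteratorDemo {
    public static void main(String[] args) {
        OptIterator<String> it = OptIterator.wrap(List.of("a").iterator());
        System.out.println(it.nextOrElse("none"));          // a
        System.out.println(it.nextOrElseGet(() -> "none")); // none
        System.out.println(it.nextOptional().isPresent());  // false
    }
}
```

One design point the sketch exposes: for a null element, `nextOrElse(other)` returns null while `nextOptional().orElse(other)` returns `other`, so "functionally like" needs a decision on nulls.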







Re: mapping URIs

2017-11-22 Thread Claude Warren
Another use case is an endpoint that is known to have moved, so that users
who attempt to resolve the data are correctly redirected to the new
location.






Re: mapping URIs

2017-11-22 Thread Claude Warren
Actually, I want URNs in Jena that start with urn:foo:bar: to be converted to
http://server:8080/ on the way out, so that all of them are resolvable by the
browser.

I would then add a script at http://server:8080/ to take the path and make a
call back to Fuseki to resolve the item.

However, if URNs are mapped on the way out, they would need to be mapped on
the way back in, unless they are simply mapped by using owl:sameAs.

Jena view  (on server:3030)  browser view
 ->  

Claude
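The two-way mapping described here amounts to a prefix swap applied in both directions. A minimal sketch, assuming plain string URIs and the example prefixes from this thread; `UriPrefixMapper` is a hypothetical name, and this is not an existing Fuseki feature or configuration option.

```java
public class UriPrefixMapper {

    private final String internalPrefix; // e.g. "urn:foo:bar:" as minted by the backend
    private final String externalPrefix; // e.g. "http://server:8080/" as seen by browsers

    public UriPrefixMapper(String internalPrefix, String externalPrefix) {
        this.internalPrefix = internalPrefix;
        this.externalPrefix = externalPrefix;
    }

    // On the way out: rewrite internal URNs to resolvable URLs; anything else passes through.
    public String toExternal(String uri) {
        return swapPrefix(uri, internalPrefix, externalPrefix);
    }

    // On the way back in: undo the rewrite so queries match the stored URNs.
    public String toInternal(String uri) {
        return swapPrefix(uri, externalPrefix, internalPrefix);
    }

    private static String swapPrefix(String uri, String from, String to) {
        return uri.startsWith(from) ? to + uri.substring(from.length()) : uri;
    }

    public static void main(String[] args) {
        UriPrefixMapper m = new UriPrefixMapper("urn:foo:bar:", "http://server:8080/");
        System.out.println(m.toExternal("urn:foo:bar:yeehaw"));        // http://server:8080/yeehaw
        System.out.println(m.toInternal("http://server:8080/yeehaw")); // urn:foo:bar:yeehaw
    }
}
```

The round trip is only lossless if no stored URI already starts with the external prefix; otherwise `toInternal` would rewrite a URI that was never mapped on the way out.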





Re: mapping URIs

2017-11-22 Thread ajs6f
Claude, are you saying you want people to be able to query Fuseki using 
urn:foo:bar:yeehaw and get back answers using http://server:8080/yeehaw?

Otherwise, I'm guessing I'm missing something, but why wouldn't you do the 
substitutions on the way from the backend to Fuseki?

ajs6f




mapping URIs

2017-11-22 Thread Claude Warren
I have a case where data are generated in a backend system that is not
publicly accessible and has no idea where the data are going to be served
from.

The backend system generates URNs like ""

What I think I want is to be able to configure, on the Fuseki server,
"urn:foo:bar" as a placeholder for "http://server:8080/yeehaw".

Now, I know I can express this with owl:sameAs, but I would like to
see Fuseki do that.

In this way when the data are hosted on another system the resolution can
be adjusted appropriately.

Perhaps this does not make sense.  Perhaps there is a way to do this
already.  Perhaps this is a really bad idea.  So I am throwing it out there
to see if there are any comments.

Thx,
Claude



[jira] [Commented] (JENA-1430) Quad loading for in-memory assemblers

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262881#comment-16262881
 ] 

ASF GitHub Bot commented on JENA-1430:
--

Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/314
  
@afs What do you think of that? It's clearer, I think, along the lines [you 
suggested](https://github.com/apache/jena/pull/314#discussion_r152289270).


> Quad loading for in-memory assemblers
> -
>
> Key: JENA-1430
> URL: https://issues.apache.org/jira/browse/JENA-1430
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Reporter: A. Soroka
>Assignee: A. Soroka
>
> In-memory dataset Assemblers should support loading quad files.





[GitHub] jena issue #314: JENA-1430

2017-11-22 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/314
  
@afs What do you think of that? It's clearer, I think, along the lines [you 
suggested](https://github.com/apache/jena/pull/314#discussion_r152289270).


---


Re: CMS diff: Jena Full Text Search

2017-11-22 Thread Chris Tomlinson
Hi Andy and Osma,

I posted JENA-1426 because the “improve this page” facility didn’t seem to offer
any way to add a commit message or a more extensive explanation of the reasons
for the proposed edits, which were somewhat extensive. Raising an issue seemed a
reasonable way to proceed; however, after several days with no comments, I thought
I should follow the published protocol, so I made the update as a guest on the CMS.

I had several motivations for updating the documentation: 1) to present how the
current implementation functions in a way that might be more useful to users, for
example by clarifying what can and cannot be expected to work when using the native
Lucene query language (e.g., JENA-1388); 2) to identify areas that might indicate
unintended aspects of the current implementation; and 3) to understand the code in
preparation for developing a proposal for adding jena-text highlighting support.

Based on Osma’s feedback I will be opening a few issues on JIRA and making 
corrections to the original submission. I assume that updates should just be 
made as further commits.

Thanks,
Chris




Re: CMS diff: Jena Full Text Search

2017-11-22 Thread Andy Seaborne

How is this related to JENA-1426?

Andy

On 21/11/17 14:48, Osma Suominen wrote:

ajs6f wrote on 20.11.2017 at 18:36:
Osma (or anyone else who knows text indexing better than do I, which 
wouldn't take much)-- could you review this? It's got some great 
useful detail about how the indexing works and can be used.


Sure, will do.

Comments about specific sections below. Generally this is a very good 
contribution to the jena-text documentation, which has stagnated a bit.




+The following illustrates a Lucene document that Jena will create and
+request Lucene to index:
+
+    Document<
+    stored, indexed, indexOptions=DOCS 

+    indexed, omitNorms, indexOptions=DOCS 


+    stored, indexed, tokenized 
+    stored, indexed, omitNorms, indexOptions=DOCS 
+    stored, indexed, tokenized 
+    stored, indexed, omitNorms, indexOptions=DOCS 


+    stored, indexed, tokenized 
+    stored, indexed, omitNorms, indexOptions=DOCS 
+    stored, indexed, tokenized 

+    stored, indexed, omitNorms, indexOptions=DOCS 


+    >
+
+It may be instructive to refer back to this example when considering the various
+points below.


Not sure if this is a perfect illustration. The level of detail is 
rather excessive. I know Lucene quite well and I still struggle to 
understand what's going on here. Is there another way of presenting this 
information, for example just a key-value list that shows the field 
values that get stored in the document? I think the field options 
stored, indexed, tokenized, omitNorms etc. are unnecessary here or at 
least should not be so prominent.




+The `lang:xx` specification is an optional string, where _xx_ is
+a BCP-47 language tag. This restricts searches to field values that were originally
+indexed with the tag _xx_. Searches may be restricted to field values with no
+language tag via `"lang:none"`. The use of the `lang:xx` is only effective if
+[multilingual support](#linguistic-support-with-lucene-index) has been configured.


The last sentence is not true. You can restrict by language even without 
enabling multilingual support, as long as langField has been set.


+Further, if the `lang:xx` is used then the `property` URI must be supplied
+in order for searches to work.


Not true. The default property should be used if no property was specified.



+When working with `rdf:langString`s It may be tempting to write:
+
+    ?s text:query "protégé"@fr
+
+However, the above will silently fail to return results since the
+`query string` must be a simple `xsd:string` not an `rdf:langString`.


This could be considered a bug - at least it shouldn't fail silently.


+Even if the default _property_ is `skos:prefLabel` it is necessary
+to use the above form rather than omitting the `property` argument
+when restricting the Lucene search to a specific `lang:xx`; otherwise,
+again there will be no results.


Again, not true. I just tested this query against YSO:
  ?s text:query ("cat" "lang:en")
and it gave a single result, as expected.


+For a non-default `Field` with no language restriction, the patterns:
+
+    ?s text:query (rdfs:label "protégé")
+
+or
+
+    ?s text:query "rdfsLabel:protégé"
+
+may be used (see [below](#entity-map-definition) for how RDF _property_ names
+are mapped to Lucene `Field` names).


I wouldn't recommend using a query form like "rdfsLabel:protégé" in the 
documentation at all. It violates the layered architecture of jena-text 
- the query should not be targeting named fields. If you want to target 
rdfs:label, use the first form.
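The layering described in these review comments can be illustrated with a small sketch: the caller names an RDF property (or none, falling back to the default property), and only the text layer knows which Lucene field that property maps to. This is a hypothetical illustration, not jena-text's code; the entity map, the field name `label`, and the parenthesised query composition are assumptions, and language restriction is left out.

```java
import java.util.Map;

public class TextQuerySketch {

    static final String SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel";
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

    // Hypothetical entity map: RDF property URI -> Lucene field name.
    static final Map<String, String> FIELD_FOR_PROPERTY = Map.of(
            SKOS_PREF_LABEL, "label",
            RDFS_LABEL, "rdfsLabel");

    // Hypothetical default property, used when the query names none.
    static final String DEFAULT_PROPERTY = SKOS_PREF_LABEL;

    // Builds a fielded Lucene query from (property, query string); property may be null.
    // The query string is grouped in parentheses so the field applies to every term.
    static String luceneQuery(String property, String queryString) {
        String p = (property != null) ? property : DEFAULT_PROPERTY;
        String field = FIELD_FOR_PROPERTY.get(p);
        if (field == null) {
            throw new IllegalArgumentException("No Lucene field mapped for " + p);
        }
        return field + ":(" + queryString + ")";
    }

    public static void main(String[] args) {
        System.out.println(luceneQuery(RDFS_LABEL, "cat")); // rdfsLabel:(cat)
        System.out.println(luceneQuery(null, "cat"));       // label:(cat)
    }
}
```

Because the field prefix is added by this layer, a user-supplied string that already names a field would end up double-fielded, which is one way to see why hand-written forms like "rdfsLabel:protégé" sit badly with the layered design.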



However, as mentioned earlier,
+
+    ?s text:query ("rdfsLabel:protégé" "lang:fr")
+
+will result in an error owing to the way in which the jena-text composes the
+query string to Lucene in the presence of the `"lang:fr"` argument.


Don't do that then. Remove this section. (see previous comment)


+However, it is important to note that the apparently equivalent form:
+
+    (?s ?sc ?lit) text:query "rdfsLabel:protégé"
+
+will fail to produce a binding for `?lit` even though `?s` and `?sc` are
+bound as expected.


Again, don't do that. Use (rdfs:label "protégé") instead and let 
jena-text handle the translation from property to Lucene field.


+So if the _literal_ matches are needed you **must use** the query arguments that
+list the _property_ explicitly, except in the simple case of a query against
+the default `Field`/_property_.


Exactly. And those are the only supported query forms anyway.


+ Queries across multiple `Field`s
+
+It has been mentioned earlier that the text index uses the
+[native Lucene query language](http://lucene.apache.org/core/6_4_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description);
+however, there are important constraints on how the Lucene query language is used within jena-text.
+This is owing to the fact that jena-text composes the query string that is

[jira] [Commented] (JENA-1384) Make canonical literals lowercase language tags.

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262293#comment-16262293
 ] 

ASF GitHub Bot commented on JENA-1384:
--

Github user afs commented on a diff in the pull request:

https://github.com/apache/jena/pull/308#discussion_r152531097
  
--- Diff: 
jena-arq/src/main/java/org/apache/jena/riot/process/normalize/CanonicalizeLiteral.java
 ---
@@ -73,6 +76,36 @@ public Node apply(Node node) {
 return n2 ;
 }
 
+/** Convert the lexical form to a canonical form if one of the known 
datatypes,
+ * otherwise return the node argument. (same object :: {@code ==})  
+ */
+public static Node canonicalValue(Node node) {
+if ( ! node.isLiteral() )
+return node ;
+// Fast-track
+if ( NodeUtils.isLangString(node) )
+return node;
+if ( NodeUtils.isSimpleString(node) )
+return node;
+
+if ( ! 
node.getLiteralDatatype().isValid(node.getLiteralLexicalForm()) )
+// Invalid lexical form for the datatype - do nothing.
+return node;
+
+RDFDatatype dt = node.getLiteralDatatype() ;
+// Datatype, not rdf:langString (RDF 1.1). 
+DatatypeHandler handler = dispatch.get(dt) ;
+if ( handler == null )
+return node ;
+Node n2 = handler.handle(node, node.getLiteralLexicalForm(), dt) ;
+if ( n2 == null )
+return node ;
+return n2 ;
+}
+
+/** Convert the language tag of a lexical form to a canonical form if 
one of the known datatypes,
+ * otherwise return the node argument. (same object; compare by {@code 
==})  
+ */
 private static Node canonicalLangtag(String lexicalForm, String 
langTag) {
 String langTag2 = LangTag.canonical(langTag);
 if ( langTag2.equals(langTag) )
--- End diff --

Here, node isn't passed in, so it can't be returned. Style thing. The node is
already known to have a language tag, so I don't like passing in a Node, which
can be wrong, e.g. through a mis-call from somewhere else. Passing lex+lang forces
it to be the information for a language-tagged literal.

It's tested at line 74
```
if ( n2 == null )
return node ;
```
and elsewhere conversion also sometimes returns `null` for "no conversion",
which means no new node is needed, which is (measurably) more efficient.
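The convention described in this comment (a converter returns `null` for "no conversion" and the caller falls back to the object it already holds) can be sketched with plain strings. `canonicalLangTag` and `apply` are hypothetical names standing in for the Jena methods in the diff, and lowercasing stands in for `LangTag.canonical`.

```java
import java.util.Locale;

public class NullMeansUnchanged {

    // Converter: returns the canonical tag, or null for "no conversion needed".
    // Lowercasing is used as the canonical form here purely for illustration.
    static String canonicalLangTag(String langTag) {
        String lowered = langTag.toLowerCase(Locale.ROOT);
        return lowered.equals(langTag) ? null : lowered;
    }

    // Caller: on null, hand back the original reference, so callers can compare with ==.
    static String apply(String langTag) {
        String converted = canonicalLangTag(langTag);
        return (converted == null) ? langTag : converted;
    }

    public static void main(String[] args) {
        String tag = "en";
        System.out.println(apply(tag) == tag); // true: unchanged input, same object returned
        System.out.println(apply("EN-GB"));    // en-gb
    }
}
```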



> Make canonical literals lowercase language tags.
> 
>
> Key: JENA-1384
> URL: https://issues.apache.org/jira/browse/JENA-1384
> Project: Apache Jena
>  Issue Type: Improvement
>Affects Versions: Jena 3.4.0
>Reporter: Elie Roux
>Assignee: Andy Seaborne
>Priority: Minor
> Fix For: Jena 3.6.0
>
>
> Please make an option so that canonicalLiterals follows the RDF 1.1 
> definition of a canonical literal instead of the BCP-47 one. Right now for my 
> dataset I have:
> - lower-cased value for JSON-LD output (as mandated by the JSON-LD spec 
> following an RDF 1.1 option)
> - BCP-47 canonical value for TTL output if I make Jena canonicalize literals 
> (which I want to, I want them to be uniform)
> - lower-cased value for TTL output if I choose not to canonicalize them
> So please allow for users just to use lower-case uniformly, so that there can 
> be a homogeneous canonicalization among different outputs.
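The two conventions in the issue differ only in letter case. A sketch using `java.util.Locale` (standard Java, not Jena's `LangTag`) shows the difference; the `LangTagCase` class and its method names are hypothetical.

```java
import java.util.Locale;

public class LangTagCase {

    // BCP-47 recommended casing via java.util.Locale:
    // language lowercase, script titlecase, region uppercase.
    // Note: Locale may rewrite tags it cannot parse (for example to "und").
    static String bcp47Case(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    // Uniform lowercase, as requested in the issue.
    static String lowerCase(String tag) {
        return tag.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"en-us", "zh-hant-tw"}) {
            System.out.println(bcp47Case(tag) + " vs " + lowerCase(tag));
        }
    }
}
```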





[GitHub] jena pull request #308: JENA-1384: Canonical literals: lexical form and lang...

2017-11-22 Thread afs
Github user afs commented on a diff in the pull request:

https://github.com/apache/jena/pull/308#discussion_r152531097
  
--- Diff: 
jena-arq/src/main/java/org/apache/jena/riot/process/normalize/CanonicalizeLiteral.java
 ---
@@ -73,6 +76,36 @@ public Node apply(Node node) {
 return n2 ;
 }
 
+/** Convert the lexical form to a canonical form if one of the known 
datatypes,
+ * otherwise return the node argument. (same object :: {@code ==})  
+ */
+public static Node canonicalValue(Node node) {
+if ( ! node.isLiteral() )
+return node ;
+// Fast-track
+if ( NodeUtils.isLangString(node) )
+return node;
+if ( NodeUtils.isSimpleString(node) )
+return node;
+
+if ( ! 
node.getLiteralDatatype().isValid(node.getLiteralLexicalForm()) )
+// Invalid lexical form for the datatype - do nothing.
+return node;
+
+RDFDatatype dt = node.getLiteralDatatype() ;
+// Datatype, not rdf:langString (RDF 1.1). 
+DatatypeHandler handler = dispatch.get(dt) ;
+if ( handler == null )
+return node ;
+Node n2 = handler.handle(node, node.getLiteralLexicalForm(), dt) ;
+if ( n2 == null )
+return node ;
+return n2 ;
+}
+
+/** Convert the language tag of a lexical form to a canonical form if 
one of the known datatypes,
+ * otherwise return the node argument. (same object; compare by {@code 
==})  
+ */
 private static Node canonicalLangtag(String lexicalForm, String 
langTag) {
 String langTag2 = LangTag.canonical(langTag);
 if ( langTag2.equals(langTag) )
--- End diff --

Here, node isn't passed in, so it can't be returned. Style thing. The node is
already known to have a language tag, so I don't like passing in a Node, which
can be wrong, e.g. through a mis-call from somewhere else. Passing lex+lang forces
it to be the information for a language-tagged literal.

It's tested at line 74
```
if ( n2 == null )
return node ;
```
and elsewhere conversion also sometimes returns `null` for "no conversion",
which means no new node is needed, which is (measurably) more efficient.



---


[jira] [Commented] (JENA-1384) Make canonical literals lowercase language tags.

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262248#comment-16262248
 ] 

ASF GitHub Bot commented on JENA-1384:
--

Github user rvesse commented on a diff in the pull request:

https://github.com/apache/jena/pull/308#discussion_r152523288
  
--- Diff: 
jena-arq/src/main/java/org/apache/jena/riot/process/normalize/CanonicalizeLiteral.java
 ---
@@ -73,6 +76,36 @@ public Node apply(Node node) {
 return n2 ;
 }
 
+/** Convert the lexical form to a canonical form if one of the known 
datatypes,
+ * otherwise return the node argument. (same object :: {@code ==})  
+ */
+public static Node canonicalValue(Node node) {
+if ( ! node.isLiteral() )
+return node ;
+// Fast-track
+if ( NodeUtils.isLangString(node) )
+return node;
+if ( NodeUtils.isSimpleString(node) )
+return node;
+
+if ( ! 
node.getLiteralDatatype().isValid(node.getLiteralLexicalForm()) )
+// Invalid lexical form for the datatype - do nothing.
+return node;
+
+RDFDatatype dt = node.getLiteralDatatype() ;
+// Datatype, not rdf:langString (RDF 1.1). 
+DatatypeHandler handler = dispatch.get(dt) ;
+if ( handler == null )
+return node ;
+Node n2 = handler.handle(node, node.getLiteralLexicalForm(), dt) ;
+if ( n2 == null )
+return node ;
+return n2 ;
+}
+
+/** Convert the language tag of a lexical form to a canonical form if 
one of the known datatypes,
+ * otherwise return the node argument. (same object; compare by {@code 
==})  
+ */
 private static Node canonicalLangtag(String lexicalForm, String 
langTag) {
 String langTag2 = LangTag.canonical(langTag);
 if ( langTag2.equals(langTag) )
--- End diff --

Shouldn't we be returning `node` not `null` in the subsequent line?







[GitHub] jena pull request #308: JENA-1384: Canonical literals: lexical form and lang...

2017-11-22 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/jena/pull/308#discussion_r152523288
  
--- Diff: 
jena-arq/src/main/java/org/apache/jena/riot/process/normalize/CanonicalizeLiteral.java
 ---
@@ -73,6 +76,36 @@ public Node apply(Node node) {
 return n2 ;
 }
 
+/** Convert the lexical form to a canonical form if one of the known 
datatypes,
+ * otherwise return the node argument. (same object :: {@code ==})  
+ */
+public static Node canonicalValue(Node node) {
+if ( ! node.isLiteral() )
+return node ;
+// Fast-track
+if ( NodeUtils.isLangString(node) )
+return node;
+if ( NodeUtils.isSimpleString(node) )
+return node;
+
+if ( ! 
node.getLiteralDatatype().isValid(node.getLiteralLexicalForm()) )
+// Invalid lexical form for the datatype - do nothing.
+return node;
+
+RDFDatatype dt = node.getLiteralDatatype() ;
+// Datatype, not rdf:langString (RDF 1.1). 
+DatatypeHandler handler = dispatch.get(dt) ;
+if ( handler == null )
+return node ;
+Node n2 = handler.handle(node, node.getLiteralLexicalForm(), dt) ;
+if ( n2 == null )
+return node ;
+return n2 ;
+}
+
+/** Convert the language tag of a lexical form to a canonical form if 
one of the known datatypes,
+ * otherwise return the node argument. (same object; compare by {@code 
==})  
+ */
 private static Node canonicalLangtag(String lexicalForm, String 
langTag) {
 String langTag2 = LangTag.canonical(langTag);
 if ( langTag2.equals(langTag) )
--- End diff --

Shouldn't we be returning `node` not `null` in the subsequent line?


---


[GitHub] jena issue #299: Turtle Star

2017-11-22 Thread hartig
Github user hartig commented on the issue:

https://github.com/apache/jena/pull/299
  
Thanks Andy! I agree with what you write.

Do you think there is a chance for such a separate Maven module to become 
part of the official family of Apache Jena Maven modules?


---