Re: [CMS PATCH] for documentation/tools/schemagen-maven

2015-02-26 Thread Rob Vesse
We typically avoid putting specific version numbers in the documentation
because it quickly becomes out of date.

Usually we just put x.y.z as the version and refer people to Maven Central
to find the latest version.

Rob
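To browse Maven Central for the latest version, you open the artifact's directory. Its path follows the standard Maven repository layout: the groupId with dots turned into slashes, then the artifactId. Below is a minimal Java sketch of that mapping. It assumes the `http://central.maven.org/maven2/` base used in the links in this thread. `CentralUrl` and its methods are hypothetical names. The `maven-metadata.xml` file it points at is the repository's per-artifact listing of published versions.

```java
/** Hypothetical helper: where to browse for published versions of an artifact. */
final class CentralUrl {
    // Base URL as used in the documentation links in this thread (an assumption;
    // the host serving Maven Central may change over time).
    static final String BASE = "http://central.maven.org/maven2/";

    private CentralUrl() {}

    /** Directory listing all published versions, per the Maven 2 repository layout. */
    static String artifactDir(String groupId, String artifactId) {
        if (groupId.isEmpty() || artifactId.isEmpty()) {
            throw new IllegalArgumentException("groupId and artifactId are required");
        }
        // "org.apache.jena" becomes "org/apache/jena"
        return BASE + groupId.replace('.', '/') + "/" + artifactId + "/";
    }

    /** Repository metadata file that lists the known versions of the artifact. */
    static String metadataUrl(String groupId, String artifactId) {
        return artifactDir(groupId, artifactId) + "maven-metadata.xml";
    }
}
```

For example, `CentralUrl.artifactDir("org.apache.jena", "jena-core")` yields the same URL as the jena-core link in the quoted patch.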

On 26/02/2015 00:28, "Stian Soiland-Reyes"  wrote:

>This should fix https://issues.apache.org/jira/browse/JENA-849
>
>The examples still need information about adding
>target/generated-sources to the compile path, as the plugin doesn't do
>that yet (JENA-731 and JENA-732 have been dormant since June :-/).
>
>On 26 February 2015 at 00:20, Stian Soiland-Reyes wrote:
>> Clone URL (Committers only):
>> https://cms.apache.org/redirect?new=stain;action=diff;uri=http://jena.apache.org/documentation%2Ftools%2Fschemagen-maven.mdtext
>>
>> .. now also with  update links.
>>
>> Index: trunk/content/documentation/tools/schemagen-maven.mdtext
>> ===
>> --- trunk/content/documentation/tools/schemagen-maven.mdtext (revision 1655891)
>> +++ trunk/content/documentation/tools/schemagen-maven.mdtext (working copy)
>> @@ -10,15 +10,15 @@
>>  constants from the ontology.
>>
>>  For some projects, invoking `schemagen` from the command line, perhaps via `ant`,
>> -is sufficient. For projects organised around Apache maven, it would be convenient to integrate
>> -the schemagen translation step into maven's normal build process. This plugin
>> +is sufficient. For projects organised around Apache Maven, it would be convenient to integrate
>> +the schemagen translation step into Maven's normal build process. This plugin
>>  provides a means to do just that.
>>
>>  ## Pre-requisites
>>
>> -This plugin adds a step to the maven build process to automatically translate RDFS
>> +This plugin adds a step to the Maven build process to automatically translate RDFS
>>  and OWL files, encoded as RDF/XML, Turtle or N-triples into Java source files.
>> -This plugin is designed to be used with a Java project that is already using Apache maven to
>> +This plugin is designed to be used with a Java project that is already using Apache Maven to
>>  control the build. Non-Java projects do not need this tool. Projects that are
>>  not using Maven should see the [schemagen documentation](schemagen.html)
>>  for ways to run `schemagen` from the command line.
>> @@ -29,13 +29,34 @@
>>  Schemagen is available from the maven central repository. To use it, add
>>  the following dependency to your `pom.xml`:
>>
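>> -    <dependency>
>> -      <groupId>org.apache.jena.tools</groupId>
>> -      <artifactId>schemagen</artifactId>
>> -      <version>0.2-SNAPSHOT</version>
>> -      <type>maven-plugin</type>
>> -    </dependency>
>> +    <build>
>> +      <plugins>
>> +        <plugin>
>> +          <groupId>org.apache.jena</groupId>
>> +          <artifactId>jena-maven-tools</artifactId>
>> +          <version>0.7</version>
>> +          <executions>
>> +            <execution>
>> +              <id>schemagen</id>
>> +              <goals>
>> +                <goal>translate</goal>
>> +              </goals>
>> +            </execution>
>> +          </executions>
>> +        </plugin>
>> +      </plugins>
>> +    </build>
>> +    <dependencies>
>> +      <dependency>
>> +        <groupId>org.apache.jena</groupId>
>> +        <artifactId>jena-core</artifactId>
>> +        <version>2.12.1</version>
>> +      </dependency>
>> +    </dependencies>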
>>
>> +Replace the `<version>` tags above with the latest versions as found by
>> +browsing [jena-maven-tools](http://central.maven.org/maven2/org/apache/jena/jena-maven-tools/)
>> +and [jena-core](http://central.maven.org/maven2/org/apache/jena/jena-core/) in Maven Central.
>>
>>
>>  ## Configuration: basic principles
>> @@ -52,15 +73,15 @@
>>* a mechanism to specify common options for all input files
>>* a mechanism to specify per-file unique options
>>
>> -In maven, all such configuration information is provided via the `pom.xml` file. We tell
>> -maven to use the plugin via the `/` section:
>> +In Maven, all such configuration information is provided via the `pom.xml` file. We tell
>> +Maven to use the plugin via the ` ` section:
>>
>>  
>>
>>  
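>> -  <groupId>org.openjena.tools</groupId>
>> -  <artifactId>schemagen</artifactId>
>> -  <version>0.2-SNAPSHOT</version>
>> +  <groupId>org.apache.jena</groupId>
>> +  <artifactId>jena-maven-tools</artifactId>
>> +  <version>0.7</version>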
>>
>>
>>
>> @@ -75,6 +96,11 @@
>>
>>  
>>
>> +*Replace the `<version>` tags above with the latest versions as found by
>> +browsing [jena-maven-tools](http://central.maven.org/maven2/org/apache/jena/jena-maven-tools/)
>> + in Maven Central.*
>> +
>> +
>>  The configuration options all nest inside the `<configuration>` section.
>>
>>  ### Specifying files to process
>> @@ -93,7 +119,7 @@
>>
>>  Options are, in general, given in the `` section. A given
>>  `` refers to one input source - one file - as named by the
>> -` name. The actual option names are taken from the RDF [config
>> +`` name. The actual option names are taken from the RDF [config
>>  file property names](http://jena.apache.org/documentation/tools/schemagen.html),
>>  omitting the namespace:
>>
>> @@ -115,38 +141,50 @@
>>
>>  ## Example configuration
>>
>> +*Note: Replace the `<version>` tags below with the latest versions as found by
>> +browsing [jena-maven-tools](http://central.maven.org

Re: Fuseki 2 packages

2015-02-26 Thread Andy Seaborne

On 25/02/15 21:45, Reto Gmür wrote:

On Sat, Feb 14, 2015 at 2:08 PM, Andy Seaborne  wrote:


On 14/02/15 11:44, Claude Warren wrote:


I was looking to download some components (specifically Security and
Fuseki 2.x). I found them in the SNAPSHOT repository (as expected) but the
https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-fuseki/2.0.0-SNAPSHOT/
directory only contains POMs and not packages as the
https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-fuseki/1.1.2-SNAPSHOT/
directory does. Is this intentional?



Yes.

jena-fuseki is the root of the Fuseki hierarchy.  Fuseki2 is multi-module
and multi-delivery for binaries.

Things are in jena-fuseki-*, e.g. jena-fuseki-war, and the download is
apache-jena-fuseki.

(Fuseki2 has not been released yet.)

On a slightly tangential topic:


Many sites provide download links to the current release as well as the
current snapshot.  We don't seem to do this.  Is there a reason why not?



At Apache, we do not mix releases (VOTEd on, formally the responsibility
of the Foundation, with source-release and more archived forever via
/dist/, PGP-signed) and development snapshots (not voted on, technically
the responsibility of the person who caused it to be built, not archived by
Apache, not signed).

The downloads page must not lead to development builds.



Actually http://jena.apache.org/documentation/serving_data/ does link to
the snapshot repository in its download section.


Thanks for pointing that out.

(That's not the project's download page, but I agree it would be better if
it linked to the full description of snapshots and development, with the
general caveats and warnings.)


Andy



Cheers,
Reto





Re: Fedora Linux Package

2015-02-26 Thread Andy Seaborne

On 24/02/15 18:26, Don Pellegrino wrote:

I am attempting to add an RPM package for Apache Jena to Fedora Linux.
The RPM .spec file and details on the packaging process can be found
in Red Hat Bugzilla Bug 1193730
(https://bugzilla.redhat.com/show_bug.cgi?id=1193730).


Hi there - it's very interesting to see this happening!

I don't have a lot of depth (or indeed any!) in how packaging is done,
so please bear with me for some simple questions:


Are you building from source, or taking the binaries and repackaging them?



During packaging, I ended up changing the line endings for a few files:

# Address "wrong-file-end-of-line-encoding" warnings from rpmlint

dos2unix javadoc-arq/META-INF/MANIFEST.MF
dos2unix javadoc-core/META-INF/MANIFEST.MF
dos2unix javadoc-sdb/META-INF/MANIFEST.MF
dos2unix javadoc-tdb/META-INF/MANIFEST.MF


Those files are generated automatically.  I don't know why they have the
line endings they have - the last release wasn't built on Windows, so
maybe the javadoc tool does it (that's a guess - no evidence).



dos2unix src-examples/data/eswc-2006-09-21.rdf
dos2unix src-examples/data/test1.owl
dos2unix src-examples/tdb/examples/ExTDB1.java
dos2unix src-examples/tdb/examples/ExTDB2.java
dos2unix src-examples/tdb/examples/ExTDB3.java
dos2unix src-examples/tdb/examples/ExTDB4.java
dos2unix src-examples/tdb/examples/ExTDB5.java


src-examples - not critical to running.

I don't know of any particular reason to have MS Windows line endings
except, maybe, that they are more likely to be viewed on Windows.  That said,
nowadays it rarely matters.  IDEs "do the right thing".  Only old text
editors have problems, and they have other issues anyway.
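For reference, each of the dos2unix calls above turns CRLF pairs into LF. Below is a minimal Java sketch of that conversion. It assumes the files are UTF-8 (or plain ASCII) text. `LineEndings` is a hypothetical name, and this version leaves a lone CR untouched.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** dos2unix-style line-ending handling (a sketch; assumes UTF-8 or ASCII text). */
final class LineEndings {
    private LineEndings() {}

    /** True if the text contains at least one CRLF pair. */
    static boolean hasCrlf(String text) {
        return text.contains("\r\n");
    }

    /** Replaces each CRLF pair with LF; a lone CR is left untouched. */
    static String toUnix(String text) {
        return text.replace("\r\n", "\n");
    }

    /** Rewrites a file in place, but only if it actually has CRLF endings. */
    static boolean convertInPlace(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        if (!hasCrlf(text)) {
            return false;
        }
        Files.write(file, toUnix(text).getBytes(StandardCharsets.UTF_8));
        return true;
    }
}
```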



I also used sed to update the scripts in /bin so that they used
/etc/jena-log4j.properties and found the .jar files in the
system-level jar directory:


The scripts will use their own copy of jena-log4j.properties - it's 
built into jena-core.


All the ARQ-related scripts have a built-in logging configuration anyway - they
do look around for a local copy of log4j.properties.



# Modify the scripts to use the sysconfdir jena-log4j.properties file
# instead of relative paths.

find bin -type f -exec sed -i \
  's|LOGGING="${LOGGING:--Dlog4j.configuration=file:$JENA_HOME/jena-log4j.properties}"|LOGGING="${LOGGING:--Dlog4j.configuration=file:%{_sysconfdir}/jena-log4j.properties}"|g' \
  {} \;

# Modify the scripts to use the javadir for JAR dependencies instead
# of relative paths.

find bin -type f -exec sed -i 's|-cp "$JENA_CP"|-cp "%{_javadir}/%{name}/*"|g' {} \;

Let me know if you have any interest in the Fedora package or feedback
on the packaging.
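The two sed edits above each rewrite one line literally. Below is a Java sketch of their effect. It assumes `%{_sysconfdir}` and `%{_javadir}` expand to `/etc` and `/usr/share/java` (the usual Fedora values), and it uses `jena` as a hypothetical `%{name}`. `ScriptPatch` is also hypothetical. Note one difference: `String.replace` matches literally, whereas in the sed patterns each `.` matches any character.

```java
/** Sketch of the effect of the two sed substitutions on a launcher script line. */
final class ScriptPatch {
    private ScriptPatch() {}

    static String patch(String line, String sysconfdir, String javadir, String name) {
        return line
            // 1st sed: read jena-log4j.properties from the system config directory
            .replace("file:$JENA_HOME/jena-log4j.properties",
                     "file:" + sysconfdir + "/jena-log4j.properties")
            // 2nd sed: quoted classpath wildcard over the system jar directory;
            // the JVM, not the shell, expands a trailing "/*" to the JARs there
            .replace("-cp \"$JENA_CP\"", "-cp \"" + javadir + "/" + name + "/*\"");
    }
}
```

The new classpath entry is quoted, so the shell does not glob it. Instead, the JVM itself expands a trailing `/*` classpath entry to the JAR files in that directory.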


It's really helpful to have this done, and done outside the core
committers. There are simply too many setups and variations to handle.


That is one of the reasons why Apache releases source, not just binaries 
(other reasons being provenance and verifiability).



It's quite likely the scripts should be changed.  They are written for a 
downloaded "apache-jena" setup.


They are all generated from "template.bin" by running "cmd-maker" in
jena/apache-jena, so you can make changes in one place and regenerate the
scripts if you wish to do it that way.


Andy



[GitHub] jena pull request: JENA-865: Include prefixes in example query

2015-02-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/34


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (JENA-865) Fuseki 2: Example query does not declare owl/rdfs PREFIXes

2015-02-26 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/JENA-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338381#comment-14338381 ]

ASF subversion and git services commented on JENA-865:
--

Commit b4e303ab286ffde9dbee65e9c9a504b5852c0776 in jena's branch 
refs/heads/master from [~soilandreyes]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=b4e303a ]

JENA-865: Include prefixes in example query


> Fuseki 2: Example query does not declare owl/rdfs PREFIXes
> --
>
> Key: JENA-865
> URL: https://issues.apache.org/jira/browse/JENA-865
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Fuseki
>Affects Versions: Fuseki 2.0.0
> Environment: https://registry.hub.docker.com/u/stain/jena-fuseki/
>Reporter: Stian Soiland-Reyes
>Priority: Trivial
> Fix For: Fuseki 2.0.0
>
>
> If I create a new dataset through the web interface, have not uploaded any
> data yet, and under Dataset/Query click the Example Query
> "Selection of Classes", I get the error:
> {code}
> Error 400: Parse error: 
> SELECT distinct ?class ?label ?description
> where {
>   ?class a owl:Class.
>   OPTIONAL { ?class rdfs:label ?label}
>   OPTIONAL { ?class rdfs:comment ?description}
> }
> LIMIT 25
> Line 5, column 12: Unresolved prefixed name: owl:Class
> Fuseki - version 2.0.0-SNAPSHOT (Build date: 2015-01-25T09:54:26+)
> {code}
>  
> The fix is to either add to the example query:
> {code}
> PREFIX owl: <http://www.w3.org/2002/07/owl#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> {code}
> or - much better (or in addition) - always interpret prefixes for
> owl, rdfs, rdf, xsd
> (any others?)
> Ideally 'distinct' and 'where' should also be upper-case in the example query :)
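The issue's second suggestion is to always interpret common prefixes. That could be approximated by prepending any missing declarations before a query is parsed. Below is a minimal Java sketch. `DefaultPrefixes` is hypothetical, and it is not what the commit above did (the commit added the PREFIX lines to the example query). The check is purely textual, so a prefix-like string inside a literal or comment would also trigger a declaration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: declare well-known prefixes a SPARQL query uses but omits. */
final class DefaultPrefixes {
    private DefaultPrefixes() {}

    static final Map<String, String> WELL_KNOWN = new LinkedHashMap<>();
    static {
        WELL_KNOWN.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        WELL_KNOWN.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        WELL_KNOWN.put("owl", "http://www.w3.org/2002/07/owl#");
        WELL_KNOWN.put("xsd", "http://www.w3.org/2001/XMLSchema#");
    }

    static String addMissingPrefixes(String query) {
        StringBuilder header = new StringBuilder();
        for (Map.Entry<String, String> e : WELL_KNOWN.entrySet()) {
            String p = Pattern.quote(e.getKey());
            // "owl:" used as a prefixed name, not as the tail of e.g. "myowl:"
            boolean used = Pattern.compile("(?<![\\w.-])" + p + ":").matcher(query).find();
            boolean declared = Pattern.compile("(?i)\\bPREFIX\\s+" + p + "\\s*:").matcher(query).find();
            if (used && !declared) {
                header.append("PREFIX ").append(e.getKey())
                      .append(": <").append(e.getValue()).append(">\n");
            }
        }
        return header.toString() + query;
    }
}
```

Running it on its own output changes nothing, because every prefix it adds is then declared.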



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (JENA-865) Fuseki 2: Example query does not declare owl/rdfs PREFIXes

2015-02-26 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/JENA-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338382#comment-14338382 ]

ASF GitHub Bot commented on JENA-865:
-

Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/34




Re: [CMS PATCH] for documentation/tools/schemagen-maven

2015-02-26 Thread Stian Soiland-Reyes
x.y.z sounds good to me. Each mention of <version> in that patch already
has a corresponding paragraph about where to find the latest version.
On 26 Feb 2015 10:03, "Rob Vesse"  wrote:

> We typically avoid putting specific version numbers in the documentation
> because it quickly becomes out of date.
>
> Usually we just put x.y.z as the version and refer people to Maven central
> to find the latest version
>
> Rob

[CMS PATCH] documentation/tools/schemagen-maven.mdtext

2015-02-26 Thread Stian Soiland-Reyes
Clone URL (Committers only):
https://cms.apache.org/redirect?new=stain;action=diff;uri=http://jena.apache.org/documentation%2Ftools%2Fschemagen-maven.mdtext

now with x.y.z

-- 
st...@apache.org

Index: trunk/content/documentation/tools/schemagen-maven.mdtext
===
--- trunk/content/documentation/tools/schemagen-maven.mdtext (revision 1655891)
+++ trunk/content/documentation/tools/schemagen-maven.mdtext (working copy)
@@ -10,15 +10,15 @@
 constants from the ontology.
 
 For some projects, invoking `schemagen` from the command line, perhaps via `ant`,
-is sufficient. For projects organised around Apache maven, it would be convenient to integrate
-the schemagen translation step into maven's normal build process. This plugin
+is sufficient. For projects organised around Apache Maven, it would be convenient to integrate
+the schemagen translation step into Maven's normal build process. This plugin
 provides a means to do just that.
 
 ## Pre-requisites
 
-This plugin adds a step to the maven build process to automatically translate RDFS
+This plugin adds a step to the Maven build process to automatically translate RDFS
 and OWL files, encoded as RDF/XML, Turtle or N-triples into Java source files.
-This plugin is designed to be used with a Java project that is already using Apache maven to
+This plugin is designed to be used with a Java project that is already using Apache Maven to
 control the build. Non-Java projects do not need this tool. Projects that are
 not using Maven should see the [schemagen documentation](schemagen.html)
 for ways to run `schemagen` from the command line.
@@ -29,13 +29,34 @@
 Schemagen is available from the maven central repository. To use it, add
 the following dependency to your `pom.xml`:
 
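-    <dependency>
-      <groupId>org.apache.jena.tools</groupId>
-      <artifactId>schemagen</artifactId>
-      <version>0.2-SNAPSHOT</version>
-      <type>maven-plugin</type>
-    </dependency>
+    <build>
+      <plugins>
+        <plugin>
+          <groupId>org.apache.jena</groupId>
+          <artifactId>jena-maven-tools</artifactId>
+          <version>x.y.z</version>
+          <executions>
+            <execution>
+              <id>schemagen</id>
+              <goals>
+                <goal>translate</goal>
+              </goals>
+            </execution>
+          </executions>
+        </plugin>
+      </plugins>
+    </build>
+    <dependencies>
+      <dependency>
+        <groupId>org.apache.jena</groupId>
+        <artifactId>jena-core</artifactId>
+        <version>x.y.z</version>
+      </dependency>
+    </dependencies>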
 
+Replace the `x.y.z` tags above with the latest versions as found by
+browsing [jena-maven-tools](http://central.maven.org/maven2/org/apache/jena/jena-maven-tools/)
+and [jena-core](http://central.maven.org/maven2/org/apache/jena/jena-core/) in Maven Central.
 
 
 ## Configuration: basic principles
@@ -52,15 +73,15 @@
   * a mechanism to specify common options for all input files
   * a mechanism to specify per-file unique options
 
-In maven, all such configuration information is provided via the `pom.xml` file. We tell
-maven to use the plugin via the `/` section:
+In Maven, all such configuration information is provided via the `pom.xml` file. We tell
+Maven to use the plugin via the ` ` section:
 
 
   
 
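-  <groupId>org.openjena.tools</groupId>
-  <artifactId>schemagen</artifactId>
-  <version>0.2-SNAPSHOT</version>
+  <groupId>org.apache.jena</groupId>
+  <artifactId>jena-maven-tools</artifactId>
+  <version>x.y.z</version>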
   
   
   
@@ -75,6 +96,11 @@
   
 
 
+*Replace the `x.y.z` tags above with the latest versions as found by
+browsing [jena-maven-tools](http://central.maven.org/maven2/org/apache/jena/jena-maven-tools/)
+ in Maven Central.*
+
+
 The configuration options all nest inside the `<configuration>` section.
 
 ### Specifying files to process
@@ -93,7 +119,7 @@
 
 Options are, in general, given in the `` section. A given
 `` refers to one input source - one file - as named by the
-` name. The actual option names are taken from the RDF [config
+`` name. The actual option names are taken from the RDF [config
 file property names](http://jena.apache.org/documentation/tools/schemagen.html),
 omitting the namespace:
 
@@ -115,38 +141,50 @@
 
 ## Example configuration
 
+*Note: Replace the `x.y.z` tags below with the latest versions as found by
+browsing [jena-maven-tools](http://central.maven.org/maven2/org/apache/jena/jena-maven-tools/)
+and [jena-core](http://central.maven.org/maven2/org/apache/jena/jena-core/) in Maven Central.*
+
+
 
-  
-
-  <groupId>org.openjena.tools</groupId>
-  <artifactId>schemagen</artifactId>
-  <version>0.2-SNAPSHOT</version>
-  
-
-  src/main/vocabs/*.ttl
-  src/main/vocabs/foaf.rdf
-
-
-  
-default
-org.example.test
-  
-  
-
-src/main/vocabs/demo2.ttl
-true
-  
-
-  
-  
-
-  schemagen
-  
-translate
-  
-
-  
-
-  
-
+ 
+  
+<groupId>org.apache.jena</groupId>
+<artifactId>jena-maven-tools</artifactId>
+<version>x.y.z</version>
+
+  
+src/main/vocabs/*.ttl
+src

Re: Release planning : 2.13.0

2015-02-26 Thread Stephen Allen
On Wed, Feb 25, 2015 at 6:56 AM, Chris Dollin wrote:

> On 02/25/2015 11:30 AM, Andy Seaborne wrote:
>
>> Final call for Jena 2.13.0.

>>>
> Stephen wrote:
>
>>> I finished up and committed some outstanding changes I had for jena-text. I
>>> added the ability to specify an analyzer for the query text itself that was
>>> different from the one used for the document.  I also added some
>>> documentation explaining it on the site.
>>>
>>
>> Is there a JIRA for these changes?  I have only a superficial understanding here,
>> but is any of this related to JENA-686?
>>
>> Stephen+Chris : maybe some discussion of plans and intentions on the dev@
>> list?
>>
>
> Sure. I have some notes about what the 686 changes are about I can
> transcribe. I have been making the (originally small) changes for
> 686 compatible with master and have (rightly or wrongly) been delaying
> discussion until I had something that seemed to be sound.
>
> Right now I'm merging in the latest master changes and am expecting to
> make a pull request this PM.
>
> I'm guessing that it's unlikely the changes will be reviewed in time
> to make it into 2.13.0?
>
>
The query analyzer change is pretty separate from JENA-686; it just exposes
a capability that Lucene already has.  This is useful, for example, if you
are using the StandardAnalyzer to tokenize the stored document but want to
use one that tokenizes the query string differently.  You could already do
this with jena-text's Solr implementation, since the configuration for that
is controlled via the Solr config file.
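Here is a toy Java illustration of why the query analyzer may usefully differ from the document analyzer. It is plain Java, not the Lucene or jena-text API, and all names are hypothetical. Documents are indexed as edge n-grams (word prefixes) to support prefix matching, and matching uses OR semantics. Running the same n-gram analyzer over the query would also match unrelated words that share a two-letter prefix.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

/** Toy model of separate index-time and query-time analyzers (not Lucene's API). */
final class TwoAnalyzers {
    private TwoAnalyzers() {}

    /** Simple analyzer: lower-case, split on anything that is not a letter or digit. */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    /** Index-time analyzer: every prefix of length >= 2 of every word ("edge n-grams"). */
    static List<String> edgeNGrams(String text) {
        List<String> out = new ArrayList<>();
        for (String w : words(text)) {
            for (int end = 2; end <= w.length(); end++) out.add(w.substring(0, end));
        }
        return out;
    }

    /** OR semantics: the document matches if any query token was indexed for it. */
    static boolean matches(String doc, String query,
                           Function<String, List<String>> docAnalyzer,
                           Function<String, List<String>> queryAnalyzer) {
        Set<String> indexed = new HashSet<>(docAnalyzer.apply(doc));
        for (String token : queryAnalyzer.apply(query)) {
            if (indexed.contains(token)) return true;
        }
        return false;
    }
}
```

Use `TwoAnalyzers::edgeNGrams` for documents and `TwoAnalyzers::words` for queries: the query "jen" then matches "Jena" but not "Jelly". Using `edgeNGrams` for both makes "Jelly" match as well, via the shared token "je".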

The conjunctive query idea of Chris' is also something I would look forward
to.  It actually looks like I may have implemented a feature that Chris
needed, the ability to specify a custom TextDocProducer.  Chris: I would be
interested to see your approach for this.  Are you planning on waiting
until all statements have been inserted then querying the RDF store to
regenerate the documents for subjects that have been changed?  How do you
handle triple deletion?

I implemented the custom TextDocProducer for a slightly different reason,
which was to handle triple deletions and remove the document from the
Lucene index.  However, my triple deletion code is kind of a hack (I am
only currently indexing rdfs:label, and my application enforces a
cardinality of 1 for that property, so I can just delete all documents with
a given subject and predicate).  The index does not actually keep the value
of the document; it only indexes it, so this solution would not work in the
general case.  I would propose in the future that we actually store and not
just index the document so that it can be appropriately identified and
deleted.  This would require a change to existing Lucene databases (we
should provide a tool to reindex existing data).  An alternative to
actually storing the value would be to generate a hash of the
subject+predicate+object and store that as an identifier.
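The hash-based alternative can be sketched in Java as follows. The sketch assumes each term is passed in an unambiguous serialized form such as N-Triples. `TripleIds` and its toy in-memory index are hypothetical, not jena-text classes. On deletion, the triple being removed supplies everything needed to recompute the identifier, so the index does not have to store the value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Sketch: identify text-index documents by a hash of the triple they came from. */
final class TripleIds {
    private TripleIds() {}

    /** Hex SHA-256 of subject, predicate and object, each followed by a NUL separator.
     *  Terms are assumed to be in an unambiguous serialized form (e.g. N-Triples). */
    static String id(String s, String p, String o) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String term : new String[] {s, p, o}) {
                md.update(term.getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // keeps ("ab","c") distinct from ("a","bc")
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on any Java platform", e);
        }
    }

    /** Toy stand-in for the text index: document id -> indexed text. */
    static final class ToyIndex {
        final Map<String, String> docs = new HashMap<>();

        void add(String s, String p, String o) { docs.put(id(s, p, o), o); }

        /** Removes exactly the document for this triple, without a stored value. */
        boolean delete(String s, String p, String o) { return docs.remove(id(s, p, o)) != null; }
    }
}
```

This removes only the document for the deleted triple, rather than every document with a given subject and predicate, so it does not depend on a cardinality of 1.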

Chris, I see in the JIRA that you talk about committing work to a branch,
but I can't seem to locate it.  Is this in github somewhere?

-Stephen


Storing values in Lucene index Re: Release planning : 2.13.0

2015-02-26 Thread Osma Suominen

On 26/02/15 18:37, Stephen Allen wrote:

I would propose in the future that we actually store and not
just index the document so that it can be appropriately identified and
deleted.  This would require a change to existing Lucene databases (we
should provide a tool to reindex existing data).  An alternative to
actually storing the value would be to generate a hash of the
subject+predicate+object and store that as an identifier.


I second storing the original value in the Lucene index at least as an 
option - it would obviously increase the index size, though I suspect 
the increase would be rather minor if you compare it to the overall (TDB 
+ text index) database size. This would be similar to how LARQ used to 
work, though LARQ only provides access to the values, not the subject 
resources.


It would allow, with some additional code, having access to the actual 
value from the SPARQL query. Something like this:


(?s ?value) text:query 'word' .

Then you could also easily check that the triple actually exists in 
current RDF data (and in the current graph), with a pattern such as this:


?s rdfs:label ?value .


For me, it would probably allow some optimization of queries that 
currently have to do a bit of detective work to find out which value 
actually matched the query. I'm currently doing queries somewhat like this:


?s text:query (skos:altLabel 'word*') .
?s skos:altLabel ?value .
FILTER (STRSTARTS(?value, 'word'))

This is inefficient if there happen to be lots of skos:altLabel values, 
as there are in e.g. AGROVOC thesaurus data.
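A toy Java sketch of the difference a stored value makes: each hit carries the literal that matched. That is what the proposed `(?s ?value) text:query` form would expose. With it, the follow-up `skos:altLabel` pattern and `FILTER` are no longer needed just to discover which value matched. The class names are hypothetical, and a linear scan stands in for the Lucene index. The sketch matches a case-insensitive prefix of the whole literal, whereas Lucene's `word*` matches a prefix of any token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy text "index" that keeps the original literal with each entry. */
final class StoredValueIndex {
    /** A hit reports both the subject and the literal that matched. */
    static final class Hit {
        final String subject;
        final String value;
        Hit(String subject, String value) { this.subject = subject; this.value = value; }
    }

    private final List<String[]> entries = new ArrayList<>(); // {subject, predicate, value}

    void add(String subject, String predicate, String value) {
        entries.add(new String[] {subject, predicate, value});
    }

    /** Case-insensitive prefix search restricted to one predicate. */
    List<Hit> prefixQuery(String predicate, String prefix) {
        String wanted = prefix.toLowerCase(Locale.ROOT);
        List<Hit> hits = new ArrayList<>();
        for (String[] e : entries) {
            if (e[1].equals(predicate) && e[2].toLowerCase(Locale.ROOT).startsWith(wanted)) {
                hits.add(new Hit(e[0], e[2]));
            }
        }
        return hits;
    }
}
```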


-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi