Difference Between ElementWalker and OpWalker | ARQ

2018-05-14 Thread anuj kumar
Hey guys,
 Can someone please explain the difference between the ElementWalker and
OpWalker classes of ARQ, and when to use which one?

I am trying to modify an incoming SPARQL query so that the FILTER portion is
stripped out of it. Here is an example incoming query:

"SELECT ?s\n" +
"WHERE {\n" +
"?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.com/Ind#Indicator> .\n" +
"?s <http://www.example.com/common#modified> ?modified .\n" +
"FILTER (?modified >= \"2018-04-01T00:00:00.00\"^^<http://www.w3.org/2001/XMLSchema#dateTime>)\n" +
"} limit 100"

And I want to convert it to:

"SELECT ?s\n" +
"WHERE {\n" +
"?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.com/Ind#Indicator> .\n" +
"} limit 100"

Basically the idea is to do filtering at the Data Store level rather than
at the application level.

If I understand it correctly, I can use either of the classes to achieve
what I want to do, but I don't know if there is some benefit to using one
over the other.

Thanks,
-- 
*Anuj Kumar*
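
For context on the question above: in ARQ, ElementWalker traverses the query
syntax tree (Element objects) and OpWalker traverses the algebra (Op objects).
Both are visitor-style walkers that observe a tree rather than rewrite it;
rewriting is more usually done with a transform that rebuilds the tree
bottom-up. The sketch below only illustrates that distinction on a toy algebra
tree. The Op class and the collectFilters and stripFilters methods are
hypothetical stand-ins written for this example; they are not ARQ classes, and
the real API should be checked against the ARQ documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy algebra tree; illustrative stand-ins only, not Jena's Op classes. */
final class FilterStripSketch {

    /** A node in the toy algebra: a kind, a detail string, and sub-operations. */
    static final class Op {
        final String kind;       // e.g. "slice", "project", "filter", "bgp"
        final String detail;
        final List<Op> children;

        Op(String kind, String detail, Op... children) {
            this.kind = kind;
            this.detail = detail;
            this.children = Arrays.asList(children);
        }

        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(" + kind);
            if (!detail.isEmpty()) sb.append(' ').append(detail);
            for (Op c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    /** Walker style: visit every node and only observe; the tree is left unchanged. */
    static List<String> collectFilters(Op op, List<String> found) {
        if (op.kind.equals("filter")) found.add(op.detail);
        for (Op c : op.children) collectFilters(c, found);
        return found;
    }

    /** Transform style: rebuild bottom-up, replacing each filter by its single sub-operation. */
    static Op stripFilters(Op op) {
        List<Op> rebuilt = new ArrayList<>();
        for (Op c : op.children) rebuilt.add(stripFilters(c));
        if (op.kind.equals("filter")) return rebuilt.get(0);   // assumes a filter has exactly one child
        return new Op(op.kind, op.detail, rebuilt.toArray(new Op[0]));
    }

    public static void main(String[] args) {
        Op plan = new Op("slice", "_ 100",
            new Op("project", "(?s)",
                new Op("filter", "(>= ?modified ?cutoff)",
                    new Op("bgp", "(?s rdf:type ex:Indicator) (?s ex:modified ?modified)"))));
        System.out.println(collectFilters(plan, new ArrayList<String>()));
        System.out.println(stripFilters(plan));
    }
}
```

Note that this sketch removes only the filter node. The target query in the
message also drops the ?modified triple pattern, which would need a further
rewrite of the basic graph pattern.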


ARQ Sparql Algebra Extension

2017-11-29 Thread anuj kumar
Hi,
 I am working on a performance issue with our Triple Store (which is
based on HBase).
To give some background, the query I am executing looks like:

SELECT ?s
WHERE {
?s a file:File .
?s ex:modified ?modified .
FILTER(?modified >= "2017-11-05T00:00:00.0"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
}


Looking at the ARQ Execution plan, it is like this:

(slice 0 1000
  (project (?s)
    (filter (>= ?modified "2017-11-05T00:00:00.0"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
      (bgp
        (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.com/File#File>)
        (triple ?s <http://www.example.com/common#modified> ?modified)
      ))))


I have around 45000 File objects in my Triple Store.

As you can see from the above execution plan, I first get the Subject ID
for each of these 45000 File objects and then fire one query per File ID to
get its modified date. This clearly is not performant.

My Questions:

1. Is there a better way to write the SELECT query so that it gets a good
execution plan?
2. If not, can I somehow change how the execution plan is generated?
3. Is it advisable to rewrite the ARQ execution plan to suit our needs, and
how complicated might this be?

Thanks and please let me know if you need more information.

Thanks,
Anuj Kumar
-- 
*Anuj Kumar*
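
The cost difference described above can be made concrete with a small counting
sketch. It compares one lookup per subject against a single range scan over an
index ordered by the modified value. Everything here is an assumption made for
illustration: the LookupCountSketch class, its in-memory maps, and the idea
that the store can keep an index ordered by timestamp. It is not HBase or ARQ
code, and it treats every stored subject as a File.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy store that counts backend lookups; a stand-in, not an HBase or ARQ API. */
final class LookupCountSketch {
    final Map<String, String> modifiedBySubject = new HashMap<>();            // subject -> timestamp
    final TreeMap<String, List<String>> subjectsByModified = new TreeMap<>(); // timestamp -> subjects
    int lookups = 0;

    void addFile(String subject, String modified) {
        modifiedBySubject.put(subject, modified);
        subjectsByModified.computeIfAbsent(modified, k -> new ArrayList<>()).add(subject);
    }

    /** Plan as described: list all files, then one lookup per file, then filter in the application. */
    List<String> perSubject(String cutoff) {
        lookups++;                                   // one scan for "?s a file:File"
        List<String> out = new ArrayList<>();
        for (String s : modifiedBySubject.keySet()) {
            lookups++;                               // one lookup per subject for its modified date
            if (modifiedBySubject.get(s).compareTo(cutoff) >= 0) out.add(s);
        }
        return out;
    }

    /** Alternative: one range scan over the ordered index, so the filter runs inside the store. */
    List<String> rangeScan(String cutoff) {
        lookups++;                                   // a single range scan
        List<String> out = new ArrayList<>();
        for (List<String> subjects : subjectsByModified.tailMap(cutoff, true).values()) {
            out.addAll(subjects);
        }
        return out;
    }
}
```

With 45000 files, perSubject performs 45001 lookups while rangeScan performs
one. The comparison relies on timestamps in a single fixed format, so that
string order matches time order.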


Re: How to derive Change Statements

2017-10-30 Thread anuj kumar
Thanks for the clarification, Claude and ajs6f. I get what you are suggesting.
As ajs6f said, I believe the approach using DatasetChanges and GraphListener
is a bit closer to what I want to achieve.

Thanks,
Anuj

On Mon, Oct 30, 2017 at 11:15 AM, <aj...@apache.org> wrote:

> Claude's approach isn't Fuseki-specific. You can use the jena-permissions
> module directly, and that might even be easier. But certainly,
> GraphListener and DatasetChanges are probably closer already to what you
> want.
>
>
> ajs6f
>
> anuj kumar wrote on 10/30/17 5:52 AM:
>
> Hey Claude,
>>  I am not using Fuseki and thus the solution you propose will not be a
>> feasible one for me.
>>
>> Andy,
>>  Thanks for the information on GraphListener, DatasetChanges as well as
>> rdf-patch. I think using these tools I will be able to handle my use cases.
>> Let me give them a try and see if I stumble upon some rabbit hole.
>>
>> Thanks,
>> Anuj Kumar
>>
>> On Fri, Oct 27, 2017 at 2:39 PM, Claude Warren <cla...@xenei.com> wrote:
>>
>> Since you need to detect who changed what the only way I can see to do
>>> this
>>> is turn on authentication on Fuseki and track changes made through it.
>>>
>>> You could bastardise the permissions layer[1] to do what you want.  The
>>> permissions layer will let you filter down to the actions on the triples;
>>> rather than implementing a SecurityEvaluator to perform the restriction, you
>>> could implement it to record all changes (including who made them) in any
>>> storage and format you wish.
>>>
>>> 1. https://jena.apache.org/documentation/permissions/index.html
>>>
>>>
>>> On Fri, Oct 27, 2017 at 11:42 AM, anuj kumar <anuj.gandh...@gmail.com>
>>> wrote:
>>>
>>> Hi Jena Users,
>>>>  I have a query regarding the most effective way to capture changes in
>>>>
>>> the
>>>
>>>> underlying Triple Store.
>>>> I have a requirement where:
>>>> 1. Every time a property of a Node (represented as a Triple Statement)
>>>> changes, I also need to generate certain change statements to capture
>>>>
>>> what
>>>
>>>> has changed, who changed it, when it was changed etc.
>>>> 2. If I delete a Node (represented as a Set of Triples in the RDF
>>>>
>>> Store), I
>>>
>>>> need to capture the action DELETE on this node, who deleted the node,
>>>>
>>> when
>>>
>>>> it was deleted etc.
>>>>
>>>> Basically, I need to have an audit trail developed so that I can create
>>>>
>>> the
>>>
>>>> graph as it was at a given moment in time.
>>>>
>>>> The question is:
>>>> 1. What is the best way to implement such functionality? Does Jena
>>>>
>>> support
>>>
>>>> such a thing either natively or through some standard mechanism?
>>>>
>>>> Thanks,
>>>> --
>>>> *Anuj Kumar*
>>>>
>>>>
>>>
>>>
>>> --
>>> I like: Like Like - The likeliest place on the web
>>> <http://like-like.xenei.com>
>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>>
>>>
>>
>>
>>


-- 
*Anuj Kumar*


Re: How to derive Change Statements

2017-10-30 Thread anuj kumar
Hey Claude,
 I am not using Fuseki and thus the solution you propose will not be a
feasible one for me.

Andy,
 Thanks for the information on GraphListener, DatasetChanges as well as
rdf-patch. I think using these tools I will be able to handle my use cases.
Let me give them a try and see if I stumble upon some rabbit hole.

Thanks,
Anuj Kumar

On Fri, Oct 27, 2017 at 2:39 PM, Claude Warren <cla...@xenei.com> wrote:

> Since you need to detect who changed what the only way I can see to do this
> is turn on authentication on Fuseki and track changes made through it.
>
> You could bastardise the permissions layer[1] to do what you want.  The
> permissions layer will let you filter down to the actions on the triples;
> rather than implementing a SecurityEvaluator to perform the restriction, you
> could implement it to record all changes (including who made them) in any
> storage and format you wish.
>
> 1. https://jena.apache.org/documentation/permissions/index.html
>
>
> On Fri, Oct 27, 2017 at 11:42 AM, anuj kumar <anuj.gandh...@gmail.com>
> wrote:
>
> > Hi Jena Users,
> >  I have a query regarding the most effective way to capture changes in
> the
> > underlying Triple Store.
> > I have a requirement where:
> > 1. Every time a property of a Node (represented as a Triple Statement)
> > changes, I also need to generate certain change statements to capture
> what
> > has changed, who changed it, when it was changed etc.
> > 2. If I delete a Node (represented as a Set of Triples in the RDF
> Store), I
> > need to capture the action DELETE on this node, who deleted the node,
> when
> > it was deleted etc.
> >
> > Basically, I need to have an audit trail developed so that I can create
> the
> > graph as it was at a given moment in time.
> >
> > The question is:
> > 1. What is the best way to implement such functionality? Does Jena
> support
> > such a thing either natively or through some standard mechanism?
> >
> > Thanks,
> > --
> > *Anuj Kumar*
> >
>
>
>
> --
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
>



-- 
*Anuj Kumar*


How to derive Change Statements

2017-10-27 Thread anuj kumar
Hi Jena Users,
 I have a query regarding the most effective way to capture changes in the
underlying Triple Store.
I have a requirement where:
1. Every time a property of a Node (represented as a Triple Statement)
changes, I also need to generate certain change statements to capture what
has changed, who changed it, when it was changed etc.
2. If I delete a Node (represented as a Set of Triples in the RDF Store), I
need to capture the action DELETE on this node, who deleted the node, when
it was deleted etc.

Basically, I need to have an audit trail developed so that I can create the
graph as it was at a given moment in time.

The question is:
1. What is the best way to implement such functionality? Does Jena support
such a thing either natively or through some standard mechanism?

Thanks,
-- 
*Anuj Kumar*
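
The requirement above (record what changed, who changed it and when, then
rebuild the graph as of a given moment) can be sketched as an append-only
change log that is replayed up to a timestamp. This is a minimal illustration
under stated assumptions: ChangeLogSketch, Change and graphAt are hypothetical
names, triples are plain strings, and changes are assumed to be logged in time
order. It does not use Jena's GraphListener or DatasetChanges interfaces, which
the replies in this thread suggest as the place to hook in such recording.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy audit trail; class and method names are hypothetical, not Jena's listener API. */
final class ChangeLogSketch {

    /** One change statement: what changed, who changed it, and when. */
    static final class Change {
        final long time;
        final String user;
        final boolean added;    // true for an addition, false for a deletion
        final String triple;

        Change(long time, String user, boolean added, String triple) {
            this.time = time;
            this.user = user;
            this.added = added;
            this.triple = triple;
        }
    }

    private final Set<String> graph = new LinkedHashSet<>();
    private final List<Change> log = new ArrayList<>();

    /** Apply an addition and record it only if it actually changed the graph. */
    void add(long time, String user, String triple) {
        if (graph.add(triple)) log.add(new Change(time, user, true, triple));
    }

    /** Apply a deletion and record it only if the triple was present. */
    void delete(long time, String user, String triple) {
        if (graph.remove(triple)) log.add(new Change(time, user, false, triple));
    }

    /** Rebuild the graph as it was at the given time by replaying the log in order. */
    Set<String> graphAt(long time) {
        Set<String> snapshot = new LinkedHashSet<>();
        for (Change c : log) {
            if (c.time > time) break;   // relies on the log being in time order
            if (c.added) snapshot.add(c.triple);
            else snapshot.remove(c.triple);
        }
        return snapshot;
    }

    List<Change> changes() {
        return log;
    }
}
```

Deleting a node, in this model, is a series of delete calls, one per triple of
that node, each carrying the same user and time.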


Re: [VOTE] Release vote : Apache Jena 3.3.0

2017-05-03 Thread anuj kumar
+1

Waiting eagerly for the release :)

Thanks,
Anuj Kumar

On Wed, May 3, 2017 at 12:14 PM, Bruno P. Kinoshita <
brunodepau...@yahoo.com.br.invalid> wrote:

>
>
> [ X ] +1 Approve the release
>
> Build passing on Linux (Ubuntu LTS), and
>
> Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5;
> 2015-11-11T05:41:47+13:00)
> Maven home: /opt/maven
> Java version: 1.8.0_131, vendor: Oracle Corporation
> Java home: /usr/lib/jvm/java-8-oracle/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "4.4.0-75-generic", arch: "amd64", family:
> "unix"
>
> Thanks!!!
> Bruno
> 
> From: "aj...@apache.org" <aj...@apache.org>
> To: d...@jena.apache.org; users@jena.apache.org
> Sent: Wednesday, 3 May 2017 7:17 AM
> Subject: [VOTE] Release vote : Apache Jena 3.3.0
>
>
>
> Hi Jena-folks,
>
>
> Here is a vote on a release of Jena 3.3.0 (with Fuseki 2.6.0).
>
>
> The most obvious changes in this release are some large shifts in support
> for full-text indexing. See below for details.
>
>
> This is the first proposed candidate for this release.
>
>
> == Dependency changes:
>
>
> jena-text: Lucene v6.4.1
>
>
> New module:
>
>  jena-text-es
>
>  Elasticsearch: v5.2.2 License : AL2
>
>   https://github.com/elastic/elasticsearch/blob/master/LICENSE.txt
>
>  [*] Full list of recursive dependencies at the end
>
>
> == Updates:
>
>  Guava (shaded) to version 21.
>
>
> == Key features of the release:
>
>
> * Drop text indexing support for solr4j, add support for Elastic Search
>
>  JENA-1301 - Drop Solr
>
>  JENA-1305 - Elastic Search support
>
>  Accomplished thanks to contributions from Anuj Kumar!
>
>
> * Add VCARD4 (JENA-1205)
>
> Contribution from Bart Hanssens
>
>
> * RDFParser (JENA-1306)
>
>  New low level, detailed setup of parsers for special cases.
>
>  RDFDataMgr is still the main way to read data (it now uses RDFParser).
>
>
> * RDFWriter (JENA-1323)
>
>
> * Bad URIs in RDF/XML are now warnings (model.read) in line with RDF 1.1.
>
>  (JENA-1324)
>
>
>
>  ~39 other JIRA tickets
>
>  See https://s.apache.org/jena-3.3.0-jira
>
>
> == Release
>
>
> Everyone, not just committers, is invited to test and vote.
>
> Please download and test the proposed release.
>
>
> Staging repository:
>
>   https://repository.apache.org/content/repositories/orgapachejena-1017
>
>
> Proposed dist/ area:
>
>   https://dist.apache.org/repos/dist/dev/jena/
>
>
> Keys:
>
>   https://svn.apache.org/repos/asf/jena/dist/KEYS
>
>
> Git commit (browser URL):
>
>   http://git-wip-us.apache.org/repos/asf/jena/commit/a35bad97
>
>
> Git Commit Hash:
>
>  a35bad974eb24e4ed03af195c7adef72039cd030
>
>
> Git Commit Tag:
>
>  jena-3.3.0-rc1
>
>
> Please vote to approve this release:
>
>
>  [ ] +1 Approve the release
>
>  [ ]  0 Don't care
>
>  [ ] -1 Don't release, because ...
>
>
> This vote will be open to at least
>
>
>  17:00 UTC-04:00 (US Eastern 5PM) on Friday 5 May 2017
>
>
> If you expect to check the release but the 72 hour limit does not work
>
> for you, please email within the schedule above with an expected time
>
> and we can extend the vote period.
>
>
> Thanks,
>
>
>  ajs6f
>
>
> Checking needed:
>
>
> + does everything work on Linux?
>
> + does everything work on MS Windows?
>
> + does everything work on OS X?
>
> + are the GPG signatures fine?
>
> + are the checksums correct?
>
> + is there a source archive?
>
> + can the source archive really be built?
>
> + is there a correct LICENSE and NOTICE file in each artifact
>
>  (both source and binary artifacts)?
>
> + does the NOTICE file contain all necessary attributions?
>
> + have any licenses of dependencies changed due to upgrades?
>
>   if so have LICENSE and NOTICE been upgraded appropriately?
>
> + does the tag/commit in the SCM contain reproducible sources?
>
>
>
> --
>
>
> [*]
>
> org.elasticsearch dependencies (recursive) other than org.apache artifacts:
>
>
> org.elasticsearch:elasticsearch:jar:5.2.2
>
> net.sf.jopt-simple:jopt-simple:jar:5.0.2
>
> com.carrotsearch:hppc:jar:0.7.1
>
> joda-time:joda-time:jar:2.9.5
>
> org.yaml:snakeyaml:jar:1.15
>
> com.tdunning:t-digest:jar:3.0
>
> org.hdrhistogram:HdrHistogram:jar
>
> net.java.dev.jna:jna:jar:4.2.2
>
> io.netty:netty:jar:3.10.6.Final
>
> com.github.spullara.mustache.java:compiler:jar:0.9.3
>



-- 
*Anuj Kumar*


Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-03-03 Thread anuj kumar
Hey,
 I just saw https://issues.apache.org/jira/browse/JENA-1301
Should we not first officially deprecate it and give any users of Solr a
chance to move to a different indexing technology?

BTW, I don't know yet how to log in to Apache JIRA.

Thanks,
Anuj Kumar

On Fri, Mar 3, 2017 at 1:23 PM, anuj kumar <anuj.gandh...@gmail.com> wrote:

> Hi Osma,
>  I briefly looked at the pull request. I believe we need to upgrade Lucene
> and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on Lucene
> 4.9.1.
>
> Also, how do I log into issues.apache.org, and where do I file this bug?
>
> Thanks,
> Anuj Kumar
>
> On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <osma.suomi...@helsinki.fi>
> wrote:
>
>> Hi Anuj,
>>
>> It's great that we found agreement over this!
>>
>> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
>> made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
>> as an intermediate step). I'll wait for comments on the PR and if people
>> think it's OK I will merge it soon to Jena master. Meanwhile, you can
>> already base your ES implementation on that branch [2] if you like.
>>
>> Could you please open a JIRA issue on issues.apache.org explaining the
>> Elasticsearch support feature, so that we have a place for tracking this
>> work, request comments etc.
>>
>> Also I suggest we move the discussion around this to the developers' list
>> (d...@jena.apache.org) where it's more appropriate.
>>
>> -Osma
>>
>> [1] https://github.com/apache/jena/pull/219
>>
>> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>>
>>
>> 03.03.2017, 02:45, anuj kumar kirjoitti:
>>
>>> I second that. I am now finalising the integration of ES and should have
>>> a
>>> good production quality implementation ready in a week's time.  At that
>>> time I would want you guys to have a look at the implementation and
>>> provide
>>> feedback. Once you guys have upgraded Lucene to 6.4.1 , I can merge the
>>> code in jena-text module and do a round of testing.
>>>
>>> Thanks,
>>> Anuj Kumar
>>>
>>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>>
>>> I do agree that trying to juggle different versions of Lucene libraries
>>>> is
>>>> probably not a realistic option right now. Luckily (if I understand the
>>>> conversation thus far correctly) we have a solid alternative; getting
>>>> our
>>>> current Lucene dependency upgraded should allow us to (eventually) merge
>>>> Anuj's work into the mainstream of development. Someone please tell me
>>>> if I
>>>> have that wrong! :grin:
>>>>
>>>> Let me reiterate that this seems like very good work and speaking for
>>>> myself, I certainly want to get it included into Jena. It's just a
>>>> question
>>>> of fitting it in correctly, which might take a bit of time.
>>>>
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>>>
>>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <osma.suomi...@helsinki.fi>
>>>>>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi Anuj!
>>>>>
>>>>> I have nothing against modularity in general. However, I cannot see how
>>>>>
>>>> your proposal could work in practice for the Fuseki build, due to the
>>>> reasons I mentioned in my previous message (and Adam seemed to concur).
>>>>
>>>>>
>>>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>>>>
>>>> again. If all current Jena modules (ie jena-text and jena-spatial) were
>>>> upgraded to Lucene 6.4.1, then you could just add your ES classes to
>>>> jena-text, right? I think that would be better for everyone than having
>>>> to
>>>> maintain your own separate module.
>>>>
>>>>>
>>>>> -Osma
>>>>>
>>>>> 01.03.2017, 16:59, anuj kumar kirjoitti:
>>>>>
>>>>>> I personally have no preference as to how the code in Jena should be
>>>>>> structured, as long as I am able to use it :).
>>>>>> I have personal preference of doing it in a specific way because IMO,
>>>>>>
>>>>> it is
>>>>
>>>>> modular which makes it much easier to maintain in th

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-03-03 Thread anuj kumar
Hi Osma,
 I briefly looked at the pull request. I believe we need to upgrade Lucene
and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on Lucene
4.9.1.

Also, how do I log into issues.apache.org, and where do I file this bug?

Thanks,
Anuj Kumar

On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <osma.suomi...@helsinki.fi>
wrote:

> Hi Anuj,
>
> It's great that we found agreement over this!
>
> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
> made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
> as an intermediate step). I'll wait for comments on the PR and if people
> think it's OK I will merge it soon to Jena master. Meanwhile, you can
> already base your ES implementation on that branch [2] if you like.
>
> Could you please open a JIRA issue on issues.apache.org explaining the
> Elasticsearch support feature, so that we have a place for tracking this
> work, request comments etc.
>
> Also I suggest we move the discussion around this to the developers' list (
> d...@jena.apache.org) where it's more appropriate.
>
> -Osma
>
> [1] https://github.com/apache/jena/pull/219
>
> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>
>
> 03.03.2017, 02:45, anuj kumar kirjoitti:
>
>> I second that. I am now finalising the integration of ES and should have a
>> good production quality implementation ready in a week's time.  At that
>> time I would want you guys to have a look at the implementation and
>> provide
>> feedback. Once you guys have upgraded Lucene to 6.4.1 , I can merge the
>> code in jena-text module and do a round of testing.
>>
>> Thanks,
>> Anuj Kumar
>>
>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>
>> I do agree that trying to juggle different versions of Lucene libraries is
>>> probably not a realistic option right now. Luckily (if I understand the
>>> conversation thus far correctly) we have a solid alternative; getting our
>>> current Lucene dependency upgraded should allow us to (eventually) merge
>>> Anuj's work into the mainstream of development. Someone please tell me
>>> if I
>>> have that wrong! :grin:
>>>
>>> Let me reiterate that this seems like very good work and speaking for
>>> myself, I certainly want to get it included into Jena. It's just a
>>> question
>>> of fitting it in correctly, which might take a bit of time.
>>>
>>> ---
>>> A. Soroka
>>> The University of Virginia Library
>>>
>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <osma.suomi...@helsinki.fi>
>>>>
>>> wrote:
>>>
>>>>
>>>> Hi Anuj!
>>>>
>>>> I have nothing against modularity in general. However, I cannot see how
>>>>
>>> your proposal could work in practice for the Fuseki build, due to the
>>> reasons I mentioned in my previous message (and Adam seemed to concur).
>>>
>>>>
>>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>>>
>>> again. If all current Jena modules (ie jena-text and jena-spatial) were
>>> upgraded to Lucene 6.4.1, then you could just add your ES classes to
>>> jena-text, right? I think that would be better for everyone than having
>>> to
>>> maintain your own separate module.
>>>
>>>>
>>>> -Osma
>>>>
>>>> 01.03.2017, 16:59, anuj kumar kirjoitti:
>>>>
>>>>> I personally have no preference as to how the code in Jena should be
>>>>> structured, as long as I am able to use it :).
>>>>> I have personal preference of doing it in a specific way because IMO,
>>>>>
>>>> it is
>>>
>>>> modular which makes it much easier to maintain in the long run. But
>>>>>
>>>> again
>>>
>>>> it may not be the quickest one.
>>>>>
>>>>> I already have been given a deadline, by the company to have ES
>>>>>
>>>> extension
>>>
>>>> implemented in the next 15 days :). What this means is that I will be
>>>>> maintaining the ES code extension to Jena Text at-least locally for a
>>>>> coming period of time. I would be more than happy to contribute to Jena
>>>>> community whatever is required to have a proper ElasticSearch
>>>>> Implementation in place, whether within jena-text module or as a
>>>>>
>>>> separate
>>>
>>>&

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-03-02 Thread anuj kumar
I second that. I am now finalising the integration of ES and should have a
good production quality implementation ready in a week's time.  At that
time I would want you guys to have a look at the implementation and provide
feedback. Once you guys have upgraded Lucene to 6.4.1 , I can merge the
code in jena-text module and do a round of testing.

Thanks,
Anuj Kumar

On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:

> I do agree that trying to juggle different versions of Lucene libraries is
> probably not a realistic option right now. Luckily (if I understand the
> conversation thus far correctly) we have a solid alternative; getting our
> current Lucene dependency upgraded should allow us to (eventually) merge
> Anuj's work into the mainstream of development. Someone please tell me if I
> have that wrong! :grin:
>
> Let me reiterate that this seems like very good work and speaking for
> myself, I certainly want to get it included into Jena. It's just a question
> of fitting it in correctly, which might take a bit of time.
>
> ---
> A. Soroka
> The University of Virginia Library
>
> > On Mar 1, 2017, at 1:27 PM, Osma Suominen <osma.suomi...@helsinki.fi>
> wrote:
> >
> > Hi Anuj!
> >
> > I have nothing against modularity in general. However, I cannot see how
> your proposal could work in practice for the Fuseki build, due to the
> reasons I mentioned in my previous message (and Adam seemed to concur).
> >
> > In any case, I'll see what I can do to get the Lucene upgrade moving
> again. If all current Jena modules (ie jena-text and jena-spatial) were
> upgraded to Lucene 6.4.1, then you could just add your ES classes to
> jena-text, right? I think that would be better for everyone than having to
> maintain your own separate module.
> >
> > -Osma
> >
> > 01.03.2017, 16:59, anuj kumar kirjoitti:
> >> I personally have no preference as to how the code in Jena should be
> >> structured, as long as I am able to use it :).
> >> I have personal preference of doing it in a specific way because IMO,
> it is
> >> modular which makes it much easier to maintain in the long run. But
> again
> >> it may not be the quickest one.
> >>
> >> I already have been given a deadline, by the company to have ES
> extension
> >> implemented in the next 15 days :). What this means is that I will be
> >> maintaining the ES code extension to Jena Text at-least locally for a
> >> coming period of time. I would be more than happy to contribute to Jena
> >> community whatever is required to have a proper ElasticSearch
> >> Implementation in place, whether within jena-text module or as a
> separate
> >> module. Till the time Lucene and Solr is not upgraded to the latest
> >> version, I will have to maintain a separate module for jena-text-es.
> >>
> >> Cheers!
> >> Anuj Kumar
> >>
> >>
> >> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
> >>
> >>> Osma--
> >>>
> >>> The short answer is that yes, given the right tools you _can_ have
> >>> different versions of code accessible in different ways. The longer
> answer
> >>> is that it's probably not a viable alternative for Jena for this
> problem,
> >>> at least not without a lot of other change.
> >>>
> >>> You are right to point to the classloader mechanism as being at the
> heart
> >>> of this question, but I must alter your remark just slightly. From "the
> >>> Java classloader only sees a single, flat package/class namespace and
> a set
> >>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
> >>> flat package/class namespace and a set of compiled classes".
> >>>
> >>> This is the fact that OSGi uses to make it possible to maintain strict
> >>> module boundaries (and even dynamic module relationships at run-time).
> Each
> >>> OSGi bundle sees its own classloader, and the framework is responsible
> for
> >>> connecting bundles up to ensure that every bundle has what it needs in
> the
> >>> way of types to function, based on metadata that the bundles provide
> to the
> >>> framework. It's an incredibly powerful system (I use it every day and
> enjoy
> >>> it enormously) but it's also very "heavy" and requires a good deal of
> >>> investment to use. In particular, it's probably too large to put
> _inside_
> >>> Jena. (I frequently put Jena inside an OSGi instanc

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-03-02 Thread anuj kumar
Just FYI, I was able to index multiple fields in ElasticSearch using the Jena
Text capability.
The issue was in my ElasticSearch code, where I was doing an insert every time
instead of an update :/

Cheers!
Anuj Kumar
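
The insert-versus-update problem mentioned above can be illustrated with a toy
document index. When each indexed property reaches the index as a separate
write for the same document id (an assumption about how the fields were being
written), an insert replaces the stored document, so only the last field
survives, while an update merges the new field into it. The UpsertSketch class
and its methods are hypothetical stand-ins and not the Elasticsearch client
API.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy document index; a stand-in for illustration, not the Elasticsearch client API. */
final class UpsertSketch {
    final Map<String, Map<String, String>> docs = new HashMap<>();

    /** Insert semantics: the new document replaces whatever was stored under the id. */
    void insert(String id, String field, String value) {
        Map<String, String> doc = new HashMap<>();
        doc.put(field, value);
        docs.put(id, doc);
    }

    /** Update semantics: merge the field into the existing document, creating it if absent. */
    void update(String id, String field, String value) {
        docs.computeIfAbsent(id, k -> new HashMap<>()).put(field, value);
    }
}
```

Writing a label and then a comment for the same id with insert leaves a
document holding only the comment; the same two writes with update leave both
fields in place.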

On Wed, Mar 1, 2017 at 7:40 PM, anuj kumar <anuj.gandh...@gmail.com> wrote:

> Thanks Osma. I sent my previous email just a minute early. I will try your
> suggestion and if it doesn't work will send you the entire example.
>
> Thanks again.
> Anuj
>
> On 1 Mar 2017 19:36, "Osma Suominen" <osma.suomi...@helsinki.fi> wrote:
>
>> Hi Anuj!
>>
>> Generally I use assembler descriptions to configure the jena-text index.
>> An example with multiple properties (SKOS label properties) is here:
>> https://github.com/NatLibFi/Skosmos/wiki/InstallTutorial#cre
>> ating-a-text-index
>>
>> For examples on how to use assembler descriptions from Java code, take a
>> look at the jena-text unit tests. They generally contain a snippet of
>> assembler definition that configures the text index in a particular way,
>> then test that it does what it should when using that configuration.
>>
>> You didn't provide a full example. What is your data and what query did
>> you use? What results did you expect? What happened instead?
>>
>> One possible problem in your configuration is that you have set the
>> primary predicate to rdfs:label, but not set a field for it. Try adding
>> this:
>>
>> entDef.set("label", RDFS.label.asNode());
>>
>> For querying everything else but the default field, you need to specify
>> the predicate at query time. With your configuration, it should be possible
>> to query rdfs:comment values like this:
>>
>> ?s text:query (rdfs:comment "word") .
>>
>> Hope this helps!
>>
>> -Osma
>>
>> 01.03.2017, 17:33, anuj kumar kirjoitti:
>>
>>> BTW, I have one more question:
>>>
>>> How do I add more than one field to be indexed in my Index?
>>> Basically, if I want to index rdfs:label , rdfs:comment in the same index
>>> document, how do I do it?
>>>
>>> I tried :
>>>
>>> EntityDefinition entDef = new EntityDefinition(DOC_TYPE,
>>> FIELD_TO_SEARCH);
>>> entDef.setPrimaryPredicate(RDFS.label);
>>> entDef.setGraphField(GRAPH_FIELD_NAME);
>>> entDef.set("comment", RDFS.comment.asNode());
>>>
>>> But it doesn't work. Can you please point me to a way to do it? This
>>> is an important piece of functionality I need.
>>>
>>> Thanks,
>>> Anuj Kumar
>>>
>>>
>>> On Wed, Mar 1, 2017 at 3:59 PM, anuj kumar <anuj.gandh...@gmail.com>
>>> wrote:
>>>
>>> I personally have no preference as to how the code in Jena should be
>>>> structured, as long as I am able to use it :).
>>>> I have personal preference of doing it in a specific way because IMO, it
>>>> is modular which makes it much easier to maintain in the long run. But
>>>> again it may not be the quickest one.
>>>>
>>>> I already have been given a deadline, by the company to have ES
>>>> extension
>>>> implemented in the next 15 days :). What this means is that I will be
>>>> maintaining the ES code extension to Jena Text at-least locally for a
>>>> coming period of time. I would be more than happy to contribute to Jena
>>>> community whatever is required to have a proper ElasticSearch
>>>> Implementation in place, whether within jena-text module or as a
>>>> separate
>>>> module. Till the time Lucene and Solr is not upgraded to the latest
>>>> version, I will have to maintain a separate module for jena-text-es.
>>>>
>>>> Cheers!
>>>> Anuj Kumar
>>>>
>>>>
>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>>
>>>> Osma--
>>>>>
>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>> different versions of code accessible in different ways. The longer
>>>>> answer
>>>>> is that it's probably not a viable alternative for Jena for this
>>>>> problem,
>>>>> at least not without a lot of other change.
>>>>>
>>>>> You are right to point to the classloader mechanism as being at the
>>>>> heart
>>>>> of this question, but I must alter your remark just slightly. From "the

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-03-01 Thread anuj kumar
Thanks Osma. I sent my previous email just a minute early. I will try your
suggestion and if it doesn't work will send you the entire example.

Thanks again.
Anuj

On 1 Mar 2017 19:36, "Osma Suominen" <osma.suomi...@helsinki.fi> wrote:

> Hi Anuj!
>
> Generally I use assembler descriptions to configure the jena-text index.
> An example with multiple properties (SKOS label properties) is here:
> https://github.com/NatLibFi/Skosmos/wiki/InstallTutorial#cre
> ating-a-text-index
>
> For examples on how to use assembler descriptions from Java code, take a
> look at the jena-text unit tests. They generally contain a snippet of
> assembler definition that configures the text index in a particular way,
> then test that it does what it should when using that configuration.
>
> You didn't provide a full example. What is your data and what query did
> you use? What results did you expect? What happened instead?
>
> One possible problem in your configuration is that you have set the
> primary predicate to rdfs:label, but not set a field for it. Try adding
> this:
>
> entDef.set("label", RDFS.label.asNode());
>
> For querying everything else but the default field, you need to specify
> the predicate at query time. With your configuration, it should be possible
> to query rdfs:comment values like this:
>
> ?s text:query (rdfs:comment "word") .
>
> Hope this helps!
>
> -Osma
>
> 01.03.2017, 17:33, anuj kumar kirjoitti:
>
>> BTW, I have one more question:
>>
>> How do I add more than one field to be indexed in my Index?
>> Basically, if I want to index rdfs:label , rdfs:comment in the same index
>> document, how do I do it?
>>
>> I tried :
>>
>> EntityDefinition entDef = new EntityDefinition(DOC_TYPE, FIELD_TO_SEARCH);
>> entDef.setPrimaryPredicate(RDFS.label);
>> entDef.setGraphField(GRAPH_FIELD_NAME);
>> entDef.set("comment", RDFS.comment.asNode());
>>
>> But it doesn't work. Can you please point me to a way to do it? This
>> is an important piece of functionality I need.
>>
>> Thanks,
>> Anuj Kumar
>>
>>
>> On Wed, Mar 1, 2017 at 3:59 PM, anuj kumar <anuj.gandh...@gmail.com>
>> wrote:
>>
>> I personally have no preference as to how the code in Jena should be
>>> structured, as long as I am able to use it :).
>>> I have personal preference of doing it in a specific way because IMO, it
>>> is modular which makes it much easier to maintain in the long run. But
>>> again it may not be the quickest one.
>>>
>>> I already have been given a deadline, by the company to have ES extension
>>> implemented in the next 15 days :). What this means is that I will be
>>> maintaining the ES code extension to Jena Text at-least locally for a
>>> coming period of time. I would be more than happy to contribute to Jena
>>> community whatever is required to have a proper ElasticSearch
>>> Implementation in place, whether within jena-text module or as a separate
>>> module. Till the time Lucene and Solr is not upgraded to the latest
>>> version, I will have to maintain a separate module for jena-text-es.
>>>
>>> Cheers!
>>> Anuj Kumar
>>>
>>>
>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>
>>> Osma--
>>>>
>>>> The short answer is that yes, given the right tools you _can_ have
>>>> different versions of code accessible in different ways. The longer
>>>> answer
>>>> is that it's probably not a viable alternative for Jena for this
>>>> problem,
>>>> at least not without a lot of other change.
>>>>
>>>> You are right to point to the classloader mechanism as being at the
>>>> heart
>>>> of this question, but I must alter your remark just slightly. From "the
>>>> Java classloader only sees a single, flat package/class namespace and a
>>>> set
>>>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>>>> flat package/class namespace and a set of compiled classes".
>>>>
>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>> module boundaries (and even dynamic module relationships at run-time).
>>>> Each
>>>> OSGi bundle sees its own classloader, and the framework is responsible
>>>> for
>>>> connecting bundles up to ensure that every bundle has what it needs in
>>>> the
>>>> way of

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-03-01 Thread anuj kumar
I agree, Osma. If Lucene is upgraded to 6.4.1, it would be much easier for me
to integrate the Elastic Search implementation.

But I am still waiting for someone to give me a hint as to how I can
index multiple predicate values. This is the most pressing issue for me
currently.

Thanks,
Anuj Kumar

On 1 Mar 2017 19:27, "Osma Suominen" <osma.suomi...@helsinki.fi> wrote:

> Hi Anuj!
>
> I have nothing against modularity in general. However, I cannot see how
> your proposal could work in practice for the Fuseki build, due to the
> reasons I mentioned in my previous message (and Adam seemed to concur).
>
> In any case, I'll see what I can do to get the Lucene upgrade moving
> again. If all current Jena modules (ie jena-text and jena-spatial) were
> upgraded to Lucene 6.4.1, then you could just add your ES classes to
> jena-text, right? I think that would be better for everyone than having to
> maintain your own separate module.
>
> -Osma
>
> 01.03.2017, 16:59, anuj kumar kirjoitti:
>
>> I personally have no preference as to how the code in Jena should be
>> structured, as long as I am able to use it :).
>> I have personal preference of doing it in a specific way because IMO, it
>> is
>> modular which makes it much easier to maintain in the long run. But again
>> it may not be the quickest one.
>>
>> I already have been given a deadline, by the company to have ES extension
>> implemented in the next 15 days :). What this means is that I will be
>> maintaining the ES code extension to Jena Text at-least locally for a
>> coming period of time. I would be more than happy to contribute to Jena
>> community whatever is required to have a proper ElasticSearch
>> Implementation in place, whether within jena-text module or as a separate
>> module. Till the time Lucene and Solr is not upgraded to the latest
>> version, I will have to maintain a separate module for jena-text-es.
>>
>> Cheers!
>> Anuj Kumar
>>
>>
>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>
>> Osma--
>>>
>>> The short answer is that yes, given the right tools you _can_ have
>>> different versions of code accessible in different ways. The longer
>>> answer
>>> is that it's probably not a viable alternative for Jena for this problem,
>>> at least not without a lot of other change.
>>>
>>> You are right to point to the classloader mechanism as being at the heart
>>> of this question, but I must alter your remark just slightly. From "the
>>> Java classloader only sees a single, flat package/class namespace and a
>>> set
>>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>>> flat package/class namespace and a set of compiled classes".
>>>
>>> This is the fact that OSGi uses to make it possible to maintain strict
>>> module boundaries (and even dynamic module relationships at run-time).
>>> Each
>>> OSGi bundle sees its own classloader, and the framework is responsible
>>> for
>>> connecting bundles up to ensure that every bundle has what it needs in
>>> the
>>> way of types to function, based on metadata that the bundles provide to
>>> the
>>> framework. It's an incredibly powerful system (I use it every day and
>>> enjoy
>>> it enormously) but it's also very "heavy" and requires a good deal of
>>> investment to use. In particular, it's probably too large to put _inside_
>>> Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>>>
>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>> this kind, but it's really meant for the JDK itself, not application
>>> libraries. In theory, we could "roll our own" classloader management for
>>> this problem. That sounds like more than a bit of a rabbit hole to me.
>>> There might be another, more lightweight, toolkit out there to this
>>> purpose, but I'm not aware of any myself.
>>>
>>> Otherwise, yes, you get into shading and the like. We have to do that for
>>> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
>>> thing we want to do any more of than needed, I don't think.
>>>
>>> ---
>>> A. Soroka
>>> The University of Virginia Library
>>>
>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>
>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <osma.suomi...@helsinki.fi>
>>>>
>>> wrote:
>>>

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-03-01 Thread anuj kumar
BTW, I have one more question:

How do I add more than one field to be indexed in my index?
Basically, if I want to index rdfs:label and rdfs:comment in the same index
document, how do I do it?

I tried:

EntityDefinition entDef = new EntityDefinition(DOC_TYPE, FIELD_TO_SEARCH);
entDef.setPrimaryPredicate(RDFS.label);
entDef.setGraphField(GRAPH_FIELD_NAME);
entDef.set("comment", RDFS.comment.asNode());

But it doesn't work. Can you please point me to a way to do it? This
is an important piece of functionality I need.

Thanks,
Anuj Kumar


On Wed, Mar 1, 2017 at 3:59 PM, anuj kumar <anuj.gandh...@gmail.com> wrote:

> I personally have no preference as to how the code in Jena should be
> structured, as long as I am able to use it :).
> I have personal preference of doing it in a specific way because IMO, it
> is modular which makes it much easier to maintain in the long run. But
> again it may not be the quickest one.
>
> I already have been given a deadline, by the company to have ES extension
> implemented in the next 15 days :). What this means is that I will be
> maintaining the ES code extension to Jena Text at-least locally for a
> coming period of time. I would be more than happy to contribute to Jena
> community whatever is required to have a proper ElasticSearch
> Implementation in place, whether within jena-text module or as a separate
> module. Till the time Lucene and Solr is not upgraded to the latest
> version, I will have to maintain a separate module for jena-text-es.
>
> Cheers!
> Anuj Kumar
>
>
> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>
>> Osma--
>>
>> The short answer is that yes, given the right tools you _can_ have
>> different versions of code accessible in different ways. The longer answer
>> is that it's probably not a viable alternative for Jena for this problem,
>> at least not without a lot of other change.
>>
>> You are right to point to the classloader mechanism as being at the heart
>> of this question, but I must alter your remark just slightly. From "the
>> Java classloader only sees a single, flat package/class namespace and a set
>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>> flat package/class namespace and a set of compiled classes".
>>
>> This is the fact that OSGi uses to make it possible to maintain strict
>> module boundaries (and even dynamic module relationships at run-time). Each
>> OSGi bundle sees its own classloader, and the framework is responsible for
>> connecting bundles up to ensure that every bundle has what it needs in the
>> way of types to function, based on metadata that the bundles provide to the
>> framework. It's an incredibly powerful system (I use it every day and enjoy
>> it enormously) but it's also very "heavy" and requires a good deal of
>> investment to use. In particular, it's probably too large to put _inside_
>> Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>>
>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>> this kind, but it's really meant for the JDK itself, not application
>> libraries. In theory, we could "roll our own" classloader management for
>> this problem. That sounds like more than a bit of a rabbit hole to me.
>> There might be another, more lightweight, toolkit out there to this
>> purpose, but I'm not aware of any myself.
>>
>> Otherwise, yes, you get into shading and the like. We have to do that for
>> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
>> thing we want to do any more of than needed, I don't think.
>>
>> ---
>> A. Soroka
>> The University of Virginia Library
>>
>> [1] http://openjdk.java.net/projects/jigsaw/
>>
>> > On Mar 1, 2017, at 9:03 AM, Osma Suominen <osma.suomi...@helsinki.fi>
>> wrote:
>> >
>> > Hi Anuj!
>> >
>> > Thanks for the clarification.
>> >
>> > However, I'm still not sure I understand the situation completely. I
>> know Maven can perform a lot of tricks, but Maven modules are just
>> convenient ways to structure a Java project. Maven cannot change the fact
>> that at runtime, module divisions don't really matter (except that they
>> usually correspond to package sub-namespaces) and the Java classloader only
>> sees a single, flat package/class namespace and a set of compiled classes
>> (usually within JARs) in the classpath that it needs to check to find the
>> right classes, and if there are two versions of the same library (eg
>> Lucene) with overlapping class name

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-03-01 Thread anuj kumar
I personally have no preference as to how the code in Jena should be
structured, as long as I am able to use it :).
I have a personal preference for doing it in a specific way because, IMO, it is
modular, which makes it much easier to maintain in the long run. But again,
it may not be the quickest one.

I have already been given a deadline by the company to have the ES extension
implemented in the next 15 days :). What this means is that I will be
maintaining the ES code extension to Jena Text at least locally for the
coming period of time. I would be more than happy to contribute to the Jena
community whatever is required to have a proper ElasticSearch
implementation in place, whether within the jena-text module or as a separate
module. Until Lucene and Solr are upgraded to the latest
version, I will have to maintain a separate module for jena-text-es.

Cheers!
Anuj Kumar


On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:

> Osma--
>
> The short answer is that yes, given the right tools you _can_ have
> different versions of code accessible in different ways. The longer answer
> is that it's probably not a viable alternative for Jena for this problem,
> at least not without a lot of other change.
>
> You are right to point to the classloader mechanism as being at the heart
> of this question, but I must alter your remark just slightly. From "the
> Java classloader only sees a single, flat package/class namespace and a set
> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
> flat package/class namespace and a set of compiled classes".
>
> This is the fact that OSGi uses to make it possible to maintain strict
> module boundaries (and even dynamic module relationships at run-time). Each
> OSGi bundle sees its own classloader, and the framework is responsible for
> connecting bundles up to ensure that every bundle has what it needs in the
> way of types to function, based on metadata that the bundles provide to the
> framework. It's an incredibly powerful system (I use it every day and enjoy
> it enormously) but it's also very "heavy" and requires a good deal of
> investment to use. In particular, it's probably too large to put _inside_
> Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>
> Java 9 Jigsaw [1] offers some possibility for strong modularization of
> this kind, but it's really meant for the JDK itself, not application
> libraries. In theory, we could "roll our own" classloader management for
> this problem. That sounds like more than a bit of a rabbit hole to me.
> There might be another, more lightweight, toolkit out there to this
> purpose, but I'm not aware of any myself.
>
> Otherwise, yes, you get into shading and the like. We have to do that for
> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
> thing we want to do any more of than needed, I don't think.
>
> ---
> A. Soroka
> The University of Virginia Library
>
> [1] http://openjdk.java.net/projects/jigsaw/
>
> > On Mar 1, 2017, at 9:03 AM, Osma Suominen <osma.suomi...@helsinki.fi>
> wrote:
> >
> > Hi Anuj!
> >
> > Thanks for the clarification.
> >
> > However, I'm still not sure I understand the situation completely. I
> know Maven can perform a lot of tricks, but Maven modules are just
> convenient ways to structure a Java project. Maven cannot change the fact
> that at runtime, module divisions don't really matter (except that they
> usually correspond to package sub-namespaces) and the Java classloader only
> sees a single, flat package/class namespace and a set of compiled classes
> (usually within JARs) in the classpath that it needs to check to find the
> right classes, and if there are two versions of the same library (eg
> Lucene) with overlapping class names, that's going to cause trouble. The
> only way around that is to shade some of the libraries, i.e. rename them so
> that they end up in another, non-conflicting namespace. Apparently
> Elasticsearch also did some of that in the past [1] but nowadays tries to
> avoid it.
> >
> > Does your assumption 1 ("At a given point in time, only a single
> Indexing Technology is used") imply that in the assembler configuration,
> you cannot have ja:loadClass declarations for both Lucene and ES backends?
> Or how do you run something like Fuseki that contains (in a single big JAR)
> both the jena-text and jena-text-es modules with all their dependencies,
> one of which requires the Lucene 4.x classes and the other one the Lucene
> 6.4.1 classes? How do you ensure that only one of them is used at a time,
> and that the Java classloader, even though it has access to both versions
> of Lucene, only
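
The classloader point discussed above can be tried out with the JDK alone. The following is a minimal sketch, not Jena or Lucene code: `demo.Lib` is a hypothetical stand-in for a library that exists in two versions, and the strings "4.9.1" and "6.4.1" are only labels echoing the Lucene versions mentioned in the thread. It assumes a full JDK is available, since it compiles the stand-in class at run time. One flat loader that sees both copies resolves only the first one on its search path, while two separate loaders each resolve their own copy.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassLoaderIsolationSketch {

    // Compiles a stand-in class demo.Lib that reports the given version and
    // returns the URL of the temp directory holding the compiled class.
    // (demo.Lib is hypothetical; the temp directory is not cleaned up here.)
    static URL compileLib(String version) throws Exception {
        Path dir = Files.createTempDirectory("lib");
        Path src = dir.resolve("demo").resolve("Lib.java");
        Files.createDirectories(src.getParent());
        String code = "package demo; public class Lib {"
                + " public static String version() { return \"" + version + "\"; } }";
        Files.write(src, code.getBytes(StandardCharsets.UTF_8));
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("a JDK (not just a JRE) is required");
        }
        if (javac.run(null, null, null, "-d", dir.toString(), src.toString()) != 0) {
            throw new IllegalStateException("compilation of demo.Lib failed");
        }
        return dir.toUri().toURL();
    }

    // Asks one classloader which version of demo.Lib it resolves.
    static String versionSeenBy(ClassLoader loader) throws Exception {
        return (String) loader.loadClass("demo.Lib").getMethod("version").invoke(null);
    }

    // Returns { flat class path result, isolated loader A, isolated loader B }.
    public static String[] demo() {
        try {
            URL oldLib = compileLib("4.9.1");
            URL newLib = compileLib("6.4.1");
            // A null parent means only the bootstrap classes plus the given URLs are visible.
            try (URLClassLoader flat = new URLClassLoader(new URL[] {oldLib, newLib}, null);
                 URLClassLoader a = new URLClassLoader(new URL[] {oldLib}, null);
                 URLClassLoader b = new URLClassLoader(new URL[] {newLib}, null)) {
                return new String[] {versionSeenBy(flat), versionSeenBy(a), versionSeenBy(b)};
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println("flat class path: " + seen[0]);
        System.out.println("loader A:        " + seen[1]);
        System.out.println("loader B:        " + seen[2]);
    }
}
```

With a single flat loader, the copy listed first shadows the other, which is the conflict described for a single big JAR. Keeping both versions usable at once requires one loader per version, which is the job a framework such as OSGi takes on.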

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-03-01 Thread anuj kumar
Hi Osma,

I understand what you are saying. There are ways to mitigate risks and
balance the refactoring without affecting the existing modules. But I will
not delve into those now. I am not an expert in Jena to convincingly say
that it is possible, without any hiccups. But I can take a guess and say
that it is indeed possible :)

For the question: "is it even possible to mix modules that depend on
different versions of the Lucene libraries within the same project?"

I actually do not understand what you mean by mixing modules. I assume you
mean having jena-text and jena-text-es as dependencies in a build without
causing the build to conflict. If that is what you mean, then the answer is
yes, it is possible and quite simple as well. Let me explain how. But before
that, here are some assumptions which I want to call out explicitly.

*Assumptions:*
1. At a given point in time, only a single Indexing Technology is used for
text based indexing and searching via Jena. What this means is that we will
either use Lucene Implementation OR Solr Implementation OR ES
Implementation at any given point in time.
2. Fuseki build does not depend on any Lucene 4.9.1 specific classes but
only on jena-text classes, if at all.

Based on these assumptions it is possible to create a build that contains
jena-text based common classes + ES specific classes without any
compatibility issues. And it is in fact quite simple. I did it in the
current jena-text-es module and ran the entire build, which succeeded.
The key is to include the latest Lucene dependencies at the very beginning
in the pom and then include the jena-text dependency. Maven will then
automatically resolve the dependency issues by including the Lucene
libraries that we included in our ES-specific pom. Have a look at the pom of
the jena-text-es module here to see how it can be done:
https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml


Thanks,
Anuj Kumar


On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <osma.suomi...@helsinki.fi>
wrote:

> Hi Anuj,
>
> I understand your concerns. However, we also need to balance between the
> needs of individual modules/features and the whole codebase. I'm willing to
> put in the effort to keep the other modules up to date with newer Lucene
> versions. Lucene upgrade requirements are well documented, the only hitches
> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
> features that were dropped from newer versions.
>
> A perhaps stupid question to more experienced Java developers: is it even
> possible to mix modules that depend on different versions of the Lucene
> libraries within the same project? In my (quite limited) understanding of
> Java projects and libraries, this requires special arrangements (e.g.
> shading) as the Java package/class namespace is shared by all the code
> running within the same JVM.
>
> So can you create, say, a Fuseki build that contains the current jena-text
> module (depending on Lucene 4.x) and the new jena-text-es module (depending
> on Lucene 6.4.1) without any compatibility issues?
>
> -Osma
>
>
>
>
> 01.03.2017, 00:47, anuj kumar kirjoitti:
>
>> Hi,
>>
>> My 2 Cents :
>>
>>  The reason I proposed to have separate modules for Lucene, Solr and ES is
>> exactly for avoiding the "All or Nothing" approach we need to take if we
>> club them all together. If they stay together and if in the near future I
>> want to upgrade ES to another version, I also need to again upgrade Lucene
>> and Solr and possibly another implementation that may have been added
>> during the time. As we all know, this means weeks of work if not months to
>> get the changes released. This will personally de-motivate me to do
>> anything and I will probably start maintaining my version of Jena-Text as
>> that would be much simpler to do than to upgrade and test and in the
>> process own(read fix bugs) the upgrade for each and every technology.
>>
>> If they are developed as separate modules, they can evolve independently
>> of
>> each other and we can avoid situations where we cant upgrade to latest
>> version of Lucene because we do not know what effect it will have on Solr
>> Implementation.
>>
>> We can start with having a separate Module for Jena Text ES and see how
>> things go. If they go well, we could extract out Solr and Lucene out of
>> Jena Text.
>>
>> Again this is just a suggestion based on my limited industry experience.
>>
>> Thanks,
>> Anuj Kumar
>>
>>
>>
>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <osma.suomi...@helsinki.fi
>> >
>> wrote:
>>
>> 28.02.2017, 17:12, A. Soroka kirjoitti:
>>>
>>> https://lists.apache.org/thread.html/dce0d502

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-02-28 Thread anuj kumar
Hi,

My 2 cents:

The reason I proposed to have separate modules for Lucene, Solr and ES is
exactly to avoid the "All or Nothing" approach we need to take if we
club them all together. If they stay together and in the near future I
want to upgrade ES to another version, I also need to upgrade Lucene
and Solr again, and possibly any other implementation that may have been
added in the meantime. As we all know, this means weeks of work, if not
months, to get the changes released. This will personally demotivate me from
doing anything, and I will probably start maintaining my own version of
Jena-Text, as that would be much simpler than upgrading, testing and in the
process owning (read: fixing bugs in) the upgrade for each and every technology.

If they are developed as separate modules, they can evolve independently of
each other and we can avoid situations where we can't upgrade to the latest
version of Lucene because we do not know what effect it will have on the Solr
implementation.

We can start with having a separate module for Jena Text ES and see how
things go. If they go well, we could extract Solr and Lucene out of
Jena Text.

Again this is just a suggestion based on my limited industry experience.

Thanks,
Anuj Kumar



On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <osma.suomi...@helsinki.fi>
wrote:

> 28.02.2017, 17:12, A. Soroka kirjoitti:
>
>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>> ? In other words, might it be better to factor out between -text and
>> -spatial and _then_ try to upgrade the Lucene version?
>>
>
> I certainly wouldn't object to that, but somebody has to volunteer to do
> the actual work!
>
> I don't use the Solr component now, but I could easily see so doing...
>> that's pretty vague, I know, and I'm not in a position to do any work to
>> maintain it, so consider that just a very small and blurry data point. :)
>>
>
> Last time I tried it (it was a while ago) I couldn't figure out how to get
> it running... If you could just try that with some toy data, then your data
> point would be a lot less blurry :) I haven't used Solr for anything, so
> I'm not very familiar with how to set it up, and the jena-text instructions
> are pretty vague unfortunately.
>
>
> -Osma
>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suomi...@helsinki.fi
> http://www.nationallibrary.fi
>



-- 
*Anuj Kumar*


Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-02-27 Thread anuj kumar
Hi All,

*Apologies for the long email.*

As some of you know, I have been working on extending Jena to support
ElasticSearch for text indexing (in addition to Lucene and Solr).

I have come to a point where I have basic (read: non-prod) code that can
index rdfs:label text data into ElasticSearch 5.2.1.
The code is working and testable. You simply have to download ElasticSearch
5.2.1 and run it locally to execute the tests within the ES
implementation.
The code is NOT production-ready, just PoC code. You can find the
first cut of the code here: https://github.com/EaseTech/jena (look inside
the module jena-text-es)

I need feedback from the Jena maintainers and community on the
structuring of the code, as this is important for me to finalize before I
move on to implement the full-blown production-ready code for the Jena Text
ElasticSearch integration.

Here is the short description of what I did and the reasoning behind it:

1. Created a separate module, *jena-text-es*, that extends from *jena-text*
AND excludes all the Lucene-related and Solr-related dependencies. The
reason I had to do this was that the *jena-text* module depends on Lucene
version 4.9.1, whereas ElasticSearch 5.2.1 depends on Lucene 6.4.1. This
resulted in Lucene version conflicts when I created the code for
ElasticSearch support within the *jena-text* module. Thus the need to
create a separate module.
2. A side effect of creating a separate module was that I had to extend the
TextDataSetFactory.java class present in the *jena-text* module to include
methods for creating ElasticSearch index objects. I named it
ESTextDataSetFactory. At this point in time I do not know if this is the
right approach or if Jena ALWAYS instantiates index objects using the
TextDataSetFactory.java class. My initial investigation showed it is fine,
but I would like the Jena experts to please confirm.
3. I have tested a simple integration with ElasticSearch by defining a test
class under
src/test/java/org/apache/jena/query/text/TestBuildTextDataSet.java. You can
run this test by first starting an instance of ElasticSearch 5.2.1 locally.

*My Queries*
1. Is it acceptable to the Jena community that I create a separate module
for ElasticSearch support and call it *jena-text-es*?
2. Is it fine if I extend the TextDataSetFactory.java class within the
*jena-text-es* module?

*Food for Thought*

While implementing the ElasticSearch integration, I could not help but
notice that the *jena-text* module not only contains the core classes for
performing text queries, but also contains technology-specific (e.g.
Lucene and Solr) classes.
IMO, these should be separate and defined in their own modules to enable
separation of concerns.
This would also make maintenance easier and allow extensions to be added
later on.

I think we should have the following modules:

jena-text - Containing core Jena text specific classes that are technology
agnostic.
jena-text-lucene - Lucene specific implementation of Jena-Text
jena-text-solr - Solr specific implementation of Jena-Text
jena-text-es - ElasticSearch specific implementation of Jena-Text

What does everyone think?

Thanks,
Anuj Kumar


On Tue, Feb 14, 2017 at 2:27 PM, anuj kumar <anuj.gandh...@gmail.com> wrote:

> My saviour Osma. It worked :)
> Thanks for pointing that out. Really appreciate it.
> I am now to my next task. Implementing the actual code for ElasticSearch
> integration with Jena.
>
> Thanks once again.
>
> Anuj Kumar
>
> On Tue, Feb 14, 2017 at 2:22 PM, Osma Suominen <osma.suomi...@helsinki.fi>
> wrote:
>
>> 14.02.2017, 15:15, anuj kumar kirjoitti:
>>
>>> I will do it. But I need to first get the simple test working in order to
>>> move forward. I hope someone here can help me.
>>>
>>
>> Maybe you need to add an implementWith declaration to TextAssembler.java?
>>
>>
>> -Osma
>>
>> --
>> Osma Suominen
>> D.Sc. (Tech), Information Systems Specialist
>> National Library of Finland
>> P.O. Box 26 (Kaikukatu 4)
>> 00014 HELSINGIN YLIOPISTO
>> Tel. +358 50 3199529
>> osma.suomi...@helsinki.fi
>> http://www.nationallibrary.fi
>>
>
>
>
> --
> *Anuj Kumar*
>



-- 
*Anuj Kumar*


Re: Jena + visualization tool

2014-04-02 Thread Anuj Kumar
You can also give Sgvizler a try: http://dev.data2000.no/sgvizler/
It can also communicate with a SPARQL endpoint. I have used it to visualize
SPARQL queries against a DBpedia dump loaded in TDB. Works like a charm.

Regards,
Anuj


On Thu, Apr 3, 2014 at 2:28 AM, Olivier Rossel olivier.ros...@gmail.com wrote:

 Datao communicates with any SPARQL endpoint.
 So you have to make your TDB accessible via SPARQL.
 I think this is the point of the Fuseki project.




 On Wed, Apr 2, 2014 at 10:53 PM, Adeeb Noor adeeb.n...@gmail.com wrote:

  Thanks for the reply. Would you please provide me with a simple example of
  how to use it with Jena?
 
  I want an API because everything is stored in TDB.
 
 
   On Apr 2, 2014, at 14:20, Olivier Rossel olivier.ros...@gmail.com
  wrote:
  
   Maybe Datao can help:
   http://datao.net
  
  
   On Wed, Apr 2, 2014 at 8:19 PM, Adeeb Noor adeeb.n...@colorado.edu
  wrote:
  
   Hi All:
  
   I mainly use Jena to manage my Semantic Web application. I want to know if
   there is a visualization tool that integrates easily with Jena.
  
   I need to draw some graphs for my project and want something easy to pick
   up and learn due to time constraints.
  
   My TDB is almost 15 GB, so please keep that in mind.
  
   Thanks in advance.
  
   --
   Adeeb Noor
   Ph.D. Candidate
   Dept of Computer Science
   University of Colorado at Boulder
   Cell: 571-484-3303
   Email: adeeb.n...@colorado.edu