[jira] [Commented] (JENA-1305) Elastic Search Support for Apache Jena Text

Osma Suominen (JIRA) Wed, 08 Mar 2017 10:50:46 -0800

    [ https://issues.apache.org/jira/browse/JENA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901767#comment-15901767 ]


Osma Suominen commented on JENA-1305:
-------------------------------------

Hi Anuj!

Thanks for posting the code. Some quick observations:

* Could your ES unit tests be implemented using ESTestCase from the 
[Elasticsearch Java Testing Framework|https://www.elastic.co/guide/en/elasticsearch/reference/current/testing-framework.html]? 
In my understanding, this would allow running the tests using an embedded 
Elasticsearch engine, instead of having to set up an external ES instance.
* It appears that you've chosen a model where there is a single ES document for 
each subject URI. Have you thought about situations where there are multiple 
values for the same property, or many different properties for the same entity? 
What about the same subject entity being defined in multiple graphs? 
Traditionally jena-text has instead had one Lucene/Solr document per RDF triple 
(or quad when graphField is used). While that setup has its disadvantages, it 
is at least straightforward to handle situations like this.
* I noticed that you have switched the import statements to use the original 
Guava classes instead of the Jena shaded ones. I believe this will cause 
compatibility problems with Hadoop - the shading is done for a reason. I 
suggest that you switch back to the Jena shaded versions 
(org.apache.jena.ext.com.google.common.collect.*).
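
To make the second point more concrete, here is a minimal sketch in plain Java (no Jena or Elasticsearch dependencies) that contrasts the two document models. Everything in it is an assumption made for illustration: the class name IndexModelSketch, the Quad holder, and the field names uri, graph and label are hypothetical and are not taken from the patch or from the actual jena-text code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from jena-text or the patch.
class IndexModelSketch {

    /** A quad whose object is a literal value. */
    static final class Quad {
        final String graph;
        final String subject;
        final String property;
        final String literal;

        Quad(String graph, String subject, String property, String literal) {
            this.graph = graph;
            this.subject = subject;
            this.property = property;
            this.literal = literal;
        }
    }

    /** Same subject, same property, three values spread over two graphs. */
    static List<Quad> sampleQuads() {
        return Arrays.asList(
            new Quad("http://example.org/g1", "http://example.org/s1", "label", "cat"),
            new Quad("http://example.org/g1", "http://example.org/s1", "label", "feline"),
            new Quad("http://example.org/g2", "http://example.org/s1", "label", "chat"));
    }

    /** One flat document per quad: each value keeps its own graph. */
    static List<Map<String, String>> perQuadDocs(List<Quad> quads) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Quad q : quads) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("uri", q.subject);
            doc.put("graph", q.graph);
            doc.put(q.property, q.literal);
            docs.add(doc);
        }
        return docs;
    }

    /**
     * One document per subject URI: every property becomes multi-valued,
     * and this naive version no longer records which graph a value came from.
     */
    static Map<String, Map<String, List<String>>> perSubjectDocs(List<Quad> quads) {
        Map<String, Map<String, List<String>>> docs = new LinkedHashMap<>();
        for (Quad q : quads) {
            docs.computeIfAbsent(q.subject, s -> new LinkedHashMap<>())
                .computeIfAbsent(q.property, p -> new ArrayList<>())
                .add(q.literal);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Quad> quads = sampleQuads();
        System.out.println("per-quad documents:    " + perQuadDocs(quads));
        System.out.println("per-subject documents: " + perSubjectDocs(quads));
    }
}
```

With the three sample quads, the per-quad model yields three small documents, so removing one value amounts to removing one document. The per-subject model yields a single document whose label field holds all three values, so removing one value means modifying that shared document, and the graph of each value would have to be tracked in some additional way.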

I will do a more thorough review soon...

> Elastic Search Support for Apache Jena Text 
> --------------------------------------------
>
>                 Key: JENA-1305
>                 URL: https://issues.apache.org/jira/browse/JENA-1305
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: Text
>    Affects Versions: Jena 3.2.0
>            Reporter: Anuj Kumar
>            Assignee: Osma Suominen
>              Labels: elasticsearch
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> This Jira tracks the development of Jena Text ElasticSearch Implementation.
> The goal is to extend Jena Text capability to index, at scale, in 
> ElasticSearch. This implementation would be similar to the Lucene and Solr 
> implementations.
> We will use ES version 5.2.1 for the implementation.
> The following functionalities would be supported:
> * Indexing Literal values
> * Updating indexed values
> * Deleting Indexed values
> * Custom Analyzer Support
> * Configuration using Assembler as well as Java techniques.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
