[
https://issues.apache.org/jira/browse/JENA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901767#comment-15901767
]
Osma Suominen commented on JENA-1305:
-------------------------------------
Hi Anuj!
Thanks for posting the code. Some quick observations:
* Could your ES unit tests be implemented using ESTestCase from the
[Elasticsearch Java Testing
Framework|https://www.elastic.co/guide/en/elasticsearch/reference/current/testing-framework.html]?
As I understand it, this would let the tests run against an embedded
Elasticsearch engine instead of requiring an external ES instance to be set up.
* It appears that you've chosen a model where there is a single ES document for
each subject URI. Have you thought about situations where there are multiple
values for the same property, or many different properties for the same entity?
What about the same subject entity being defined in multiple graphs?
Traditionally jena-text has instead had one Lucene/Solr document per RDF triple
(or quad when graphField is used). While that setup has its disadvantages, it
is at least straightforward to handle situations like this.
* I noticed that you have switched the import statements to use the original
Guava classes instead of the Jena shaded ones. I believe this will cause
compatibility problems with Hadoop; the shading is done for a reason. I
suggest that you switch back to the Jena shaded versions
(org.apache.jena.ext.com.google.common.collect.*).
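To make the document-model question in the second point more concrete, here is a rough sketch in plain Java (no Elasticsearch or Lucene dependency) that contrasts the two layouts. Everything in it is hypothetical and only for illustration: the class DocumentModels, the Quad holder, and the field names "uri" and "graph" are my assumptions, not the actual jena-text or patch code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from jena-text or from the patch.
public class DocumentModels {

    /** Minimal quad holder: graph, subject URI, indexed field (property) and literal value. */
    public static final class Quad {
        public final String graph, subject, field, value;
        public Quad(String graph, String subject, String field, String value) {
            this.graph = graph; this.subject = subject; this.field = field; this.value = value;
        }
    }

    /** Model A: one document per subject URI; repeated properties become multi-valued fields. */
    public static Map<String, Map<String, List<String>>> perSubject(List<Quad> quads) {
        Map<String, Map<String, List<String>>> docs = new LinkedHashMap<>();
        for (Quad q : quads) {
            docs.computeIfAbsent(q.subject, s -> new LinkedHashMap<>())
                .computeIfAbsent(q.field, f -> new ArrayList<>())
                .add(q.value); // the graph of each value is not recorded in this layout
        }
        return docs;
    }

    /** Model B: one document per triple/quad, as jena-text has traditionally done. */
    public static List<Map<String, String>> perQuad(List<Quad> quads) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Quad q : quads) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("uri", q.subject);   // assumed entity field name
            doc.put("graph", q.graph);   // assumed graph field name
            doc.put(q.field, q.value);
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Quad> quads = Arrays.asList(
            new Quad("http://example/g1", "http://example/s1", "label", "cat"),
            new Quad("http://example/g1", "http://example/s1", "label", "feline"),
            new Quad("http://example/g2", "http://example/s1", "label", "cat"));
        System.out.println("per-subject documents: " + perSubject(quads));
        System.out.println("per-quad documents:    " + perQuad(quads));
    }
}
```

With the sample data, the per-subject layout collapses all three values into one document and no longer knows which graph each value came from, whereas the per-quad layout keeps one document per value with its graph. A per-subject design would need some explicit way to handle this, for example by keying documents on the graph as well as the subject.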
I will do a more thorough review soon...
> Elastic Search Support for Apache Jena Text
> --------------------------------------------
>
> Key: JENA-1305
> URL: https://issues.apache.org/jira/browse/JENA-1305
> Project: Apache Jena
> Issue Type: New Feature
> Components: Text
> Affects Versions: Jena 3.2.0
> Reporter: Anuj Kumar
> Assignee: Osma Suominen
> Labels: elasticsearch
> Original Estimate: 240h
> Remaining Estimate: 240h
>
> This Jira tracks the development of Jena Text ElasticSearch Implementation.
> The goal is to extend Jena Text capability to index, at scale, in
> ElasticSearch. This implementation would be similar to the Lucene and Solr
> implementations.
> We will use ES version 5.2.1 for the implementation.
> The following functionalities would be supported:
> * Indexing Literal values
> * Updating indexed values
> * Deleting indexed values
> * Custom Analyzer Support
> * Configuration using Assembler as well as Java techniques.