[
https://issues.apache.org/jira/browse/JENA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895788#comment-15895788
]
Osma Suominen edited comment on JENA-1305 at 3/4/17 5:45 PM:
-------------------------------------------------------------
This sounds great! From my perspective, even a smaller feature set would be
acceptable, as long as basic text indexing functionality works.
One important thing is to have unit tests from the start. Luckily ES seems to
provide good support for that in the form of a [testing
framework|https://www.elastic.co/guide/en/elasticsearch/reference/current/testing-framework.html].
I hope you can make use of that (or something similar).
Please also reuse the existing jena-text Lucene code where you can (and
possibly the Solr code as well, if it helps). In fact, I strongly suggest that
you avoid duplicating code if at all possible, and instead implement the ES
side so that it shares as much code as possible with the Lucene support. This
may require some refactoring of existing code; I'm willing to help with that.
I also hope that you can reuse the existing Lucene unit tests. In my mind, the
unit tests for a specific feature (say, deleting indexed values) should be the
same regardless of which backend (Lucene/ES) is being used. This may require
some reengineering of the test classes so that their functionality and naming
become backend-independent. The inheritance hierarchy is already quite
convoluted, though, and I'm partially responsible for that. I can help with the
tests as well.
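To illustrate what backend-independent tests could look like, here is a
minimal sketch. Everything in it is hypothetical: SimpleTextIndex,
AbstractDeleteContract and InMemoryIndex are made-up names for illustration,
not existing jena-text classes, and the in-memory index only stands in for a
real Lucene or ES backend. The feature test is written once against an
interface, and each backend supplies only a factory method.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BackendIndependentTests {

    /** Hypothetical, minimal view of a text index; NOT the real jena-text API. */
    interface SimpleTextIndex {
        void add(String entityUri, String text);
        void delete(String entityUri, String text);
        Set<String> query(String term);
    }

    /** Feature test written once; subclasses only choose the backend. */
    abstract static class AbstractDeleteContract {
        protected abstract SimpleTextIndex createIndex();

        /** Returns true if a deleted value is no longer returned by a query. */
        boolean deletedValueIsNoLongerFound() {
            SimpleTextIndex index = createIndex();
            index.add("http://example.org/a", "apache jena");
            index.add("http://example.org/b", "jena text");
            index.delete("http://example.org/a", "apache jena");
            return index.query("jena")
                        .equals(Collections.singleton("http://example.org/b"));
        }
    }

    /** Stand-in backend (assumption): a whitespace-tokenised in-memory map. */
    static class InMemoryIndex implements SimpleTextIndex {
        private final Map<String, Set<String>> termToEntities = new HashMap<>();

        @Override
        public void add(String entityUri, String text) {
            for (String term : text.toLowerCase().split("\\s+")) {
                termToEntities.computeIfAbsent(term, k -> new HashSet<>()).add(entityUri);
            }
        }

        @Override
        public void delete(String entityUri, String text) {
            // Simplification: assumes one indexed value per entity and term.
            for (String term : text.toLowerCase().split("\\s+")) {
                Set<String> entities = termToEntities.get(term);
                if (entities != null) {
                    entities.remove(entityUri);
                }
            }
        }

        @Override
        public Set<String> query(String term) {
            Set<String> hits = termToEntities.get(term.toLowerCase());
            return hits == null ? new HashSet<>() : new HashSet<>(hits);
        }
    }

    /** Backend-specific subclass: the only backend-dependent code in the test. */
    static class InMemoryDeleteTest extends AbstractDeleteContract {
        @Override
        protected SimpleTextIndex createIndex() {
            return new InMemoryIndex();
        }
    }

    public static void main(String[] args) {
        boolean ok = new InMemoryDeleteTest().deletedValueIsNoLongerFound();
        System.out.println("delete contract holds: " + ok);
    }
}
```

In this shape, a Lucene or ES variant would be one more small subclass that
overrides createIndex(), while the test logic and its naming stay the same
for every backend.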
You can base your implementation on this branch:
https://github.com/osma/jena/tree/jena-1301-drop-solr
i.e. my branch, which contains the Lucene 6 upgrade (JENA-1250/PR #219) as well
as the removal of Solr support (JENA-1301/PR #220). I expect to merge these to
Jena master soon; I just want to give people a chance to comment and perhaps do
some additional testing before merging.
Just a reminder: When the code is done, the [jena-text
documentation|https://jena.apache.org/documentation/query/text-query.html]
needs to be updated as well. There should also be example configuration files
for jena-text with ES alongside the jena-text/Lucene examples.
> Elastic Search Support for Apache Jena Text
> --------------------------------------------
>
> Key: JENA-1305
> URL: https://issues.apache.org/jira/browse/JENA-1305
> Project: Apache Jena
> Issue Type: New Feature
> Components: Text
> Affects Versions: Jena 3.2.0
> Reporter: Anuj Kumar
> Assignee: Osma Suominen
> Labels: elasticsearch
> Original Estimate: 240h
> Remaining Estimate: 240h
>
> This Jira tracks the development of the Jena Text ElasticSearch implementation.
> The goal is to extend Jena Text capability to index, at scale, in
> ElasticSearch. This implementation would be similar to the Lucene and Solr
> implementations.
> We will use ES version 5.2.1 for the implementation.
> The following functionalities would be supported:
> * Indexing Literal values
> * Updating indexed values
> * Deleting Indexed values
> * Custom Analyzer Support
> * Configuration using Assembler as well as Java techniques.