Re: [orientdb] Re: Is using ElasticSearch with OrientDB possible?

Nicolas Harraudeau Thu, 19 Mar 2015 11:05:03 -0700

Hi Enrico,

Thank you for your quick answer.


This is indeed an interesting possibility. However, I see some problems:

- If I understand correctly, there is one index per OrientDB node. 
Elasticsearch has its own replication and consistency mechanisms, so each 
change should be sent to the index only once, not once per node. This 
might also create problems with transactions.
- Does an OIndexEngine contain the full documents or only the RIDs? The 
goal of indexing in Elasticsearch is to be able to query it directly, as it 
offers additional features (such as highlighting and n-gram autocomplete).
- I'm not sure that running Elasticsearch and OrientDB in the same process 
is a good idea. Elasticsearch is known to have out-of-memory and 
split-brain problems, which could create nightmarish situations. I would 
prefer to index OrientDB periodically from a separate process.
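
A minimal sketch of what such a separate indexing process could look like, in Python. Everything in it is hypothetical: `Doc`, `make_fetcher` and `IncrementalIndexer` are illustrative names, the in-memory list and dict stand in for OrientDB and Elasticsearch, and it assumes every document carries a modification timestamp assigned at commit time, which is exactly the piece that seems to be missing (see the "checkpoint" discussion in the quoted message below).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Doc:
    rid: str            # OrientDB record id, e.g. "#12:0"
    body: str           # content to make searchable
    modified_at: float  # ASSUMPTION: timestamp assigned at commit time


def make_fetcher(store: List[Doc]) -> Callable[[float], List[Doc]]:
    """Hypothetical stand-in for a query returning documents modified since `since`."""
    def fetch(since: float) -> List[Doc]:
        return [d for d in store if d.modified_at >= since]
    return fetch


@dataclass
class IncrementalIndexer:
    fetch_modified_since: Callable[[float], Iterable[Doc]]
    index: Dict[str, str] = field(default_factory=dict)  # stand-in for the search index
    checkpoint: float = 0.0  # start time of the last completed scan

    def run_once(self, now: float) -> int:
        """Index everything modified since the checkpoint; return how many documents were sent."""
        scan_start = now  # recorded BEFORE reading, so changes made during the scan are rescanned
        sent = 0
        for doc in self.fetch_modified_since(self.checkpoint):
            self.index[doc.rid] = doc.body  # keyed by RID, so indexing a document twice is harmless
            sent += 1
        self.checkpoint = scan_start
        return sent
```

The first run, with the checkpoint at 0.0, acts as the full scan; later runs only pick up what changed. Because the new checkpoint is the start of the scan and not its end, a document modified while a scan is running is simply indexed again on the next run.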

Am I making wrong assumptions?

Regards,

On Thursday, March 19, 2015 at 5:44:18 PM UTC+1, Enrico Risa wrote:
>
> Hi Guys,
>
> I'm the maintainer of the Lucene plugin, for which I implemented a custom 
> index engine.
> You can see some documentation here:
>
> http://www.orientechnologies.com/docs/2.0/orientdb.wiki/Custom-Index-Engine.html
>
> The integration should not be too hard. Once it is implemented, you could 
> create an Elasticsearch index directly with OrientDB SQL syntax, 
> like
>
> 'create index Foo.bar on Foo (bar) FULLTEXT ENGINE ELASTICSEARCH'
>
> It could be a really good project :D
> I really don't have time now, but I can help with some code if someone is 
> interested.
>
> Enrico
>
> 2015-03-19 17:37 GMT+01:00 Nicolas Harraudeau <nicolas.h...@gmail.com>:
>
>> Hi Patrick,
>> I have looked for a way to do it myself but didn't find a correct one. 
>> Here is what I found:
>>
>> Having worked on indexing problems before with another search engine and 
>> other sources, I have found that there are always two different jobs:
>> - The first one does a full scan of the source. With OrientDB this is 
>> possible using a simple JDBC driver and a few queries. OrientDB can be 
>> completely scanned using pagination: 
>> http://www.orientechnologies.com/docs/last/Pagination.html
>> - The second job is more complex. It has to fetch only the modified 
>> documents, as often as needed to keep the results up to date.
>>
>> When fetching updates you want to scan from the start date of the last 
>> scan because modifications can happen during the scan itself. Let's name 
>> this start date "checkpoint".
>>
>> My first thought was to save the last modification timestamp in the 
>> OrientDB documents, but I didn't find any way to generate it at commit 
>> time. It MUST NOT be generated by the application, as this would produce 
>> timestamps that are generated BEFORE the checkpoint but saved AFTER it. 
>> Think of your application making a modification that spans the start of 
>> the update scan: such a modification can be missed by the update scan.
>>
>> The second approach would be to create a "Modifications to scan" vertex 
>> and link to it every modified document. This would not scale as it would 
>> conflict more and more during transactions.
>>
>> The third approach is to use hooks which would mark documents as 
>> modified. However, the documentation on hooks is rather poor. In order to 
>> be used by an update scan, hook registration needs to be transactional. I 
>> asked here whether adding a hook invalidates the running transactions (
>> https://groups.google.com/forum/#!topic/orient-database/FBHiZg68b1s) but 
>> did not receive any answer. I tested it myself and found that it does not 
>> work as I would like (
>> https://github.com/orientechnologies/orientdb/issues/3763). There is 
>> still no information on how it SHOULD work: no specification.
>>
>> Maybe one of these features will make a correct update stream possible:
>> https://github.com/orientechnologies/orientdb/issues/2652
>> https://github.com/orientechnologies/orientdb/issues/1227
>>
>> In the meantime, I don't see any way to index OrientDB correctly. If 
>> someone has succeeded at indexing OrientDB, I am interested too.
>>
>> OrientDB-Lucene is promising but it is too limited for me right now. I 
>> cannot work without features like highlights or complex scoring.
>>
>> On Monday, March 16, 2015 at 4:41:36 PM UTC+1, Kevin I wrote:
>>>
>>> I can see that OrientDB lucene indices can be done through 
>>> orientdb-lucene <https://github.com/orientechnologies/orientdb-lucene>, 
>>> but is there a way to use ElasticSearch in OrientDB? In TitanDB, 
>>> ElasticSearch support is built in. It would be great if OrientDB had that 
>>> too.
>>>
>>> If not, can I make the two work together out of the box? I haven't used 
>>> ElasticSearch before, so it would be of great help if anyone can help me 
>>> out with this.
>>>
>>> Thanks.
>>>
>>  -- 
>>
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "OrientDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to orient-databa...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
