Hi Kevin,

There might be a way to mark documents as updated. It is not an easy solution and I haven't tried it yet. It relies on MVCC and optimistic transactions (you can read more about them here: http://www.orientechnologies.com/docs/2.0/orientdb.wiki/Transactions.html).

Let's say you have your application on one side, adding, deleting and updating documents in OrientDB. On the other side you have your replication process, which reads from OrientDB and writes to Elasticsearch.
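
Since the idea relies on optimistic (MVCC) transactions, here is a minimal in-memory model of the checkpoint approach. It is written in Python only because it is short. It is a sketch and not the OrientDB API. `Transaction`, `start_scan`, `app_modify`, `app_delete`, `changed_since`, the `"checkpoint"` record id and the `"deleted:"` tombstone prefix are all hypothetical names.

In the model, the replication process replaces a checkpoint record holding its scan start date, and the application copies that date onto every document it writes. The model assumes the commit rejects a transaction if any record it merely read (here the checkpoint) has changed. You would need to verify whether OrientDB's commit also checks records that were only read, and not just the ones written, before relying on this.

```python
from dataclasses import dataclass

# HYPOTHETICAL in-memory model, not the OrientDB API. This toy commit()
# validates records that were only *read* (the checkpoint); whether OrientDB
# does the same is an ASSUMPTION to verify.
CHECKPOINT = "checkpoint"  # hypothetical id of the unique checkpoint vertex


class ConcurrentModificationError(Exception):
    """Raised at commit when a record read by the transaction has changed."""


@dataclass
class Record:
    data: dict
    version: int = 0


class Transaction:
    """Optimistic transaction: buffers writes, validates read versions at commit."""

    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # record id -> version seen at first read
        self.writes = {}         # record id -> new data, or None for a delete

    def read(self, rid):
        rec = self.store[rid]
        self.read_versions.setdefault(rid, rec.version)
        return dict(rec.data)

    def write(self, rid, data):
        self.writes[rid] = data

    def commit(self):
        # Fail if any record we read was changed by someone else meanwhile.
        for rid, seen in self.read_versions.items():
            rec = self.store.get(rid)
            if rec is None or rec.version != seen:
                raise ConcurrentModificationError(rid)
        for rid, data in self.writes.items():
            if data is None:
                self.store.pop(rid, None)
            else:
                old = self.store.get(rid)
                self.store[rid] = Record(data, old.version + 1 if old else 0)


def start_scan(store, now):
    """Replication side: replace the checkpoint; return the previous date."""
    tx = Transaction(store)
    previous = tx.read(CHECKPOINT)["start"] if CHECKPOINT in store else None
    tx.write(CHECKPOINT, {"start": now})
    tx.commit()
    return previous


def app_modify(store, rid, fields):
    """Application side: stamp the document with the current checkpoint date."""
    tx = Transaction(store)
    stamp = tx.read(CHECKPOINT)["start"]
    tx.write(rid, {**fields, "modified": stamp})
    return tx  # the caller commits, so interleavings can be simulated


def app_delete(store, rid):
    """Deletes leave a tombstone, since a deleted record carries no date."""
    tx = Transaction(store)
    stamp = tx.read(CHECKPOINT)["start"]
    tx.write(rid, None)
    tx.write("deleted:" + rid, {"deleted": rid, "modified": stamp})
    return tx


def changed_since(store, since):
    """Update scan: every record stamped at or after the previous checkpoint."""
    return sorted(rid for rid, rec in store.items()
                  if "modified" in rec.data and rec.data["modified"] >= since)


store = {}
start_scan(store, now=100)                          # first scan (full scan in practice)
app_modify(store, "doc1", {"title": "a"}).commit()  # stamped 100

late = app_modify(store, "doc2", {"title": "b"})    # reads checkpoint 100...
prev = start_scan(store, now=200)                   # ...then a new scan starts
try:
    late.commit()                                   # stale stamp is rejected
except ConcurrentModificationError:
    app_modify(store, "doc2", {"title": "b"}).commit()  # retry: stamped 200

app_delete(store, "doc1").commit()                  # tombstone stamped 200
print(prev, changed_since(store, prev))  # 100 ['deleted:doc1', 'doc2']
```

A scan that starts while an application transaction is in flight bumps the checkpoint's version. The stale stamp therefore never gets committed; only the retry, stamped with the new date, does. This is why the update scan can select everything stamped at or after the previous checkpoint without missing modifications that span the start of a scan. In this model, retrying is left to the caller.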

When your replication process starts scanning OrientDB, it first creates (or replaces) a unique vertex, let's call it the "checkpoint vertex", which contains the start date of the scan. Each time your application modifies OrientDB, it reads the checkpoint vertex in the same transaction and sets the modification date of each indexed vertex/edge it modifies to the checkpoint's date. If a scan starts while the modification is in progress, the checkpoint vertex has been changed in the meantime and the transaction should fail. For deletes, a vertex describing the delete has to be created, since the deleted vertex itself can no longer carry a modification date.

This has some drawbacks:
- the application either has to know what is indexed in ES, or it has to set a date on every vertex/edge;
- you must use transactions even when you only want to modify one vertex/edge.

I don't like this solution very much, but it might be OK for you. You could also use a file or something else as a modification log, but then you can't back up the modification log and the OrientDB graph at the same point in time.

Regards,

On Thursday, March 19, 2015 at 5:37:18 PM UTC+1, Nicolas Harraudeau wrote:
>
> Hi Patrick,
> I have searched for a way to do it myself but didn't find a correct one. Here is what I found:
>
> Having worked on indexing problems before, with another search engine and other sources, I have found that there are always two different jobs:
> - The first one does a full scan of the source. With OrientDB this is possible using a simple JDBC driver and a few queries. OrientDB can be completely scanned using pagination:
> http://www.orientechnologies.com/docs/last/Pagination.html
> - The second job is more complex. It has to fetch only the modified documents, as often as you need in order to keep results up to date.
>
> When fetching updates, you want to scan from the start date of the last scan, because modifications can happen during the scan itself. Let's name this start date the "checkpoint".
>
> My first thought was that I could save the last modification timestamp in the OrientDB documents. But I didn't find any way to generate it during commit.
> It MUST NOT be generated by the application, as this would add dates which are generated BEFORE the checkpoint but saved AFTER that same checkpoint. Think of your application making a modification that spans the start of the update scan.
>
> The second approach would be to create a "Modifications to scan" vertex and link every modified document to it. This would not scale, as it would cause more and more conflicts during transactions.
>
> The third approach is to use hooks which would mark documents as modified. However, the documentation on them is rather poor. In order to be used by an update scan, hook registration needs to be transactional. I asked here whether adding a hook invalidates running transactions (https://groups.google.com/forum/#!topic/orient-database/FBHiZg68b1s) but did not receive any answer. I tested it myself and found that it does not work as I would like (https://github.com/orientechnologies/orientdb/issues/3763). There is still no information about how it SHOULD work. No specifications.
>
> Maybe one of these features will make a correct update stream possible:
> https://github.com/orientechnologies/orientdb/issues/2652
> https://github.com/orientechnologies/orientdb/issues/1227
>
> In the meantime, I don't see any way to index OrientDB correctly. If someone has succeeded at indexing OrientDB, I am interested too.
>
> OrientDB-Lucene is promising, but it is too limited for me right now. I cannot work without features like highlighting or complex scoring.
>
> On Monday, March 16, 2015 at 4:41:36 PM UTC+1, Kevin I wrote:
>>
>> I can see that OrientDB Lucene indices can be created through orientdb-lucene <https://github.com/orientechnologies/orientdb-lucene>, but is there a way to use ElasticSearch in OrientDB? In TitanDB, ElasticSearch support was built in. It would be great if OrientDB had that too.
>>
>> If not, can I make the two work together out of the box?
>> I haven't used ElasticSearch before, so it would be of great help if anyone can help me out with this.
>>
>> Thanks.

-- 
You received this message because you are subscribed to the Google Groups "OrientDB" group.
