Thanks for your quick reply. I need some clarification about what you meant by "delete the river", "delete the _river index", and "this state is useful for flow control".
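To be concrete about the two requests I am comparing, here is a minimal Python sketch. The host (localhost:9200) and the river name "documents" are assumptions from our setup, and river_delete_request is a hypothetical helper of mine, not part of ElasticSearch or the JDBC river. It only builds the requests and does not send anything.

```python
from urllib.request import Request

# Assumption: a local node on the default HTTP port.
ES_HOST = "http://localhost:9200"


def river_delete_request(river_name=None, host=ES_HOST):
    """Build (but do not send) a DELETE request against the _river index.

    With a river name:    DELETE /_river/<name>  (a single river)
    Without a river name: DELETE /_river         (the whole _river index)
    """
    path = "/_river" if river_name is None else "/_river/" + river_name
    return Request(host + path, method="DELETE")


# "Delete the river": targets only the "documents" river.
delete_one_river = river_delete_request("documents")

# "Delete the _river index": targets the data of every river at once.
delete_river_index = river_delete_request()

print(delete_one_river.get_method(), delete_one_river.full_url)
print(delete_river_index.get_method(), delete_river_index.full_url)
```

Neither request touches the "documents" index itself, which is where the imported data lives in our case.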
From what I have understood from your reply, and supposing that I have imported data into a "documents" river using the JDBC river:

- "Delete the river" means "DELETE _river/documents" (and does not mean "DELETE documents"):
    - This does not affect the already imported data.
    - The data is not reimported into ElasticSearch at restart.
    - Everything is fine for our use case.
- "Delete the _river index" means "DELETE _river":
    - This does not affect the already imported data.
    - The data is not reimported into ElasticSearch at restart.
    - This should not be done because it affects all the rivers at the same time (for the documents river, it is equivalent to doing "DELETE _river/documents").
- "This state is useful for flow control" means that:
    - The state keeps track of what data is already imported, so that the same raw data (left untouched in ElasticSearch) is not reimported multiple times?
    - OR the state keeps a trace of the SQL query so that, in case of an error during a node start/stop, the river can be automatically replayed?

Thanks again,
Stéphane.

On Wednesday, June 25, 2014 6:08:52 PM UTC+2, Jörg Prante wrote:
>
> Because each river can freely implement the data fetch, ES does not offer
> river monitoring.
>
> For JDBC river, I implemented some primitive river state query commands
> that allow polling for river state changes.
>
> Jörg
>
>
> On Wed, Jun 25, 2014 at 6:00 PM, Tanguy Bernard <bernardt...@gmail.com>
> wrote:
>
>> Hello,
>> This post interested me.
>> Is there a way to know when indexing is finished, and thus when to
>> trigger the XDELETE _river?
>>
>> On Wednesday, June 25, 2014 17:54:01 UTC+2, Jörg Prante wrote:
>>>
>>> It is up to the river implementation how the data import is handled.
>>>
>>> The JDBC river, in the "simple" strategy, imports data when the river is
>>> started, regardless of existing cluster or index. It is possible to
>>> implement other strategies, for example, a strategy that performs a check
>>> before indexing.
>>>
>>> There is no support for river implementations about node start/stop
>>> control and how to behave. JDBC river tries to compensate for this by
>>> persisting a JDBC river specific state. This state is useful for flow
>>> control.
>>>
>>> If you no longer need the river, you can delete the river with curl
>>> -XDELETE; this shuts down river instance threads gracefully and releases
>>> resources.
>>>
>>> If you delete the _river index with curl -XDELETE, you wipe all data
>>> that is used by rivers. Active river instances are not stopped and are not
>>> aware of what happened, so this is an unfriendly way to terminate river
>>> runs; all kinds of river errors may occur.
>>>
>>> Jörg
>>>
>>>
>>> On Wed, Jun 25, 2014 at 5:38 PM, Stéphane Seng <seng.s...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a question about the fact that, when rivers are used to import
>>>> data into ElasticSearch, rivers also reimport the data at each
>>>> ElasticSearch restart.
>>>>
>>>> In our project, what we are doing is as follows:
>>>>
>>>>    - Raw data is imported into ElasticSearch from a MySQL database
>>>>    using the JDBC river
>>>>    (https://github.com/jprante/elasticsearch-river-jdbc);
>>>>    - Some updates are executed directly on the newly imported data in
>>>>    ElasticSearch using POST requests;
>>>>    - In the end, the final data stored in ElasticSearch is not the
>>>>    same as the imported raw data.
>>>>
>>>> The problem we are facing is that when ElasticSearch is restarted, the
>>>> JDBC river reimports the raw data, thus overriding the transformations
>>>> made.
>>>> We suppose that this is an intentional behavior of ElasticSearch
>>>> rivers.
>>>> One solution to avoid the reimporting of data is to delete the
>>>> corresponding _river index, which is supposed to store the state of the
>>>> rivers.
>>>>
>>>> Our questions are as follows:
>>>>
>>>>    - Is the reimporting of data from rivers at each restart a
>>>>    standard use case?
>>>>    Is it useful for some applications?
>>>>    - What is the point of the _river index state saving?
>>>>    - Is there a way to avoid the reimporting of data without having
>>>>    to delete the corresponding _river index?
>>>>    - Are there any downsides (for our use case) to deleting the
>>>>    corresponding _river index?
>>>>
>>>> Thanks,
>>>> Stéphane.
>>>>

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1a91a264-f53a-49c7-91f4-1438b9de3e91%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.