Re: Decoupling Data and indexing

joergpra...@gmail.com Wed, 12 Nov 2014 00:24:03 -0800

There is currently no way to redirect indexing to a preparer index for
delayed indexing while searching stays enabled.


With rivers, you can close the _river index. Some rivers (not all) may
take this as a signal to stop indexing until the _river index is
reopened. I consider this a workaround, not a feature.
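
For illustration only, the close/reopen step could be scripted roughly as
below. This is a minimal sketch: it assumes a node reachable at
localhost:9200 (an assumption, adjust as needed) and uses the generic
open/close index API on the _river index. The helper name
river_index_request is made up for this example, and whether a given river
actually pauses when _river is closed depends on that river.

```python
from urllib.request import Request

# Assumption: a local node on the default HTTP port.
ES_URL = "http://localhost:9200"


def river_index_request(action):
    """Build (but do not send) a POST request that closes or opens _river.

    `action` must be "close" or "open". Sending it would be done with
    urllib.request.urlopen(request).
    """
    if action not in ("close", "open"):
        raise ValueError("action must be 'close' or 'open'")
    url = "%s/_river/_%s" % (ES_URL, action)
    # An empty body is enough; the open/close index API takes no payload.
    return Request(url, data=b"", method="POST")


if __name__ == "__main__":
    for action in ("close", "open"):
        request = river_index_request(action)
        print(request.get_method(), request.full_url)
```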

From my understanding, the preferred way to implement delayed indexing
today is to set up a durable message queue (such as RabbitMQ with
logstash) to persist documents outside ES. By stopping, starting, and
reconfiguring the message queue, you can index the data wherever you like.
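
To make the idea concrete, here is a rough sketch of the suspend/resume
behaviour a queue in front of ES gives you. It is not how RabbitMQ or
logstash are implemented: the class DelayedIndexer, the type name "doc" and
the index names are all made up for this example, and the buffer lives in
memory, whereas a real queue would persist messages durably. The only ES
specific part is the output, which follows the newline-delimited format of
the bulk API (one action line, then one source line per document).

```python
import json
from collections import deque


class DelayedIndexer:
    """In-memory stand-in for a durable queue in front of Elasticsearch."""

    def __init__(self, target_index, doc_type="doc"):
        self.target_index = target_index  # hypothetical index name
        self.doc_type = doc_type          # hypothetical type name
        self.suspended = False
        self._pending = deque()

    def receive(self, doc_id, source):
        """Accept a document at any time, whether or not indexing runs."""
        self._pending.append((doc_id, source))

    def suspend(self):
        """Stop handing documents to ES; incoming data keeps queueing up."""
        self.suspended = True

    def resume(self, target_index=None):
        """Resume indexing, optionally redirecting to another index."""
        if target_index is not None:
            self.target_index = target_index
        self.suspended = False

    def drain(self, batch_size=500):
        """Return one bulk request body, or "" if suspended or empty."""
        if self.suspended or not self._pending:
            return ""
        lines = []
        for _ in range(min(batch_size, len(self._pending))):
            doc_id, source = self._pending.popleft()
            action = {"index": {"_index": self.target_index,
                                "_type": self.doc_type,
                                "_id": doc_id}}
            lines.append(json.dumps(action, sort_keys=True))
            lines.append(json.dumps(source, sort_keys=True))
        # The bulk format requires a trailing newline after the last line.
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    indexer = DelayedIndexer("search-index1")
    indexer.suspend()
    indexer.receive("1", {"title": "first"})   # still accepted while suspended
    print(repr(indexer.drain()))               # nothing is sent to ES yet
    indexer.resume(target_index="search-index2")
    print(indexer.drain())                     # body to POST to /_bulk
```

The body returned by drain() would be sent to the _bulk endpoint by whatever
client you use. Searching is never touched, since the buffering happens
entirely outside ES.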

If you would like to see delayed indexing as a core feature in ES rather
than as a plugin, you should open an issue with the suggestion. To be
honest, I assume it would be rejected in favor of a queue in front of ES,
as described in this blog post:

http://dopey.io/logstash-rabbitmq-tuning.html

Jörg


On Tue, Nov 11, 2014 at 11:40 PM, Amish Asthana <asthanaam...@gmail.com>
wrote:

> Thanks Jörg, makes sense.
> A few minor questions:
> a) With the current ES architecture, is this the best/recommended way?
> b) Is there any project on the roadmap to provide more support for it?
>
> regards and thanks
> amish
>
> On Tuesday, November 11, 2014 12:08:24 PM UTC-8, Jörg Prante wrote:
>>
>> FAST stored the source data on distributed machines; only the control API
>> was not distributed (similar to ES HTTP curl requests, which also connect
>> to one host only).
>>
>> Of course you could index raw JSON into a preparer index with a single
>> field, with _all disabled and the field set to "not indexed", so there is
>> no Lucene activity on it. This preparer index could also hold mappings in
>> special documents for the indexing runs.
>>
>> The data duplication factor depends on the complexity of the mapping(s),
>> and the characteristics of the data (dictionary size, analyzer / tokenizer
>> output, norms etc.)
>>
>> A plugin would do no magic at all. It could bundle the calls that a
>> client would otherwise have to execute remotely, and add some
>> convenience commands for managing the prepare stage (e.g. suspend/resume)
>> and for showing the current state of indexing.
>>
>> If redundant data is a no-go, then the whole approach is counterintuitive.
>>
>> Jörg
>>
>>
>> On Tue, Nov 11, 2014 at 7:46 PM, Amish Asthana <asthan...@gmail.com>
>> wrote:
>>
>>> With existing Elasticsearch, I can think of an architecture like this.
>>>
>>> Index: indexForDataDump. No mapping (is that possible?) or a minimal
>>> mapping. Used only to dump data from the external system. There is some
>>> primary key.
>>>
>>> There are different search indexes with different mappings:
>>> search-index1, search-index2, etc.
>>> These indexes get populated from indexForDataDump using the technique
>>> mentioned here:
>>> <http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/>
>>> This way I can drop a search index as desired and create a new one
>>> with a new mapping.
>>> Any pros/cons or issues with this approach? There will be data
>>> duplication, but I am hoping it is minimal. (Is there any way to quantify it?)
>>>
>>> regards and thanks
>>> amish
>>>
>>>
>>> On Tuesday, November 11, 2014 10:02:46 AM UTC-8, Amish Asthana wrote:
>>>>
>>>> I am not aware of FAST, but the idea looks promising.
>>>> However, it might not be that easy to just have a plugin for ES, as the
>>>> data itself is distributed across different machines.
>>>> So it will not be possible to have just one server with the data, as it
>>>> would become a single point of failure.
>>>> regards and thanks
>>>> amish
>>>>
>>>> On Tuesday, November 11, 2014 1:21:53 AM UTC-8, Jörg Prante wrote:
>>>>>
>>>>> I know that the FAST search engine, ten years ago, had a two-phase
>>>>> commit for distributed search and indexing. One server could listen on the
>>>>> API and keep the (compressed) input stored, and all the other indexing
>>>>> servers were supplied with this input in another phase to create binary
>>>>> indexes, either automatically or by a manual operation called the
>>>>> "suspend/resume indexing API".
>>>>>
>>>>> The advantage was that data could be received continuously via the API
>>>>> while FAST indexing could be stopped temporarily in order to balance
>>>>> indexing and search performance on limited hardware.
>>>>>
>>>>> Are you thinking of something like that for Elasticsearch too? This
>>>>> architecture could be implemented as a plugin.
>>>>>
>>>>> Jörg
>>>>>
>>>>> On Mon, Nov 10, 2014 at 10:13 PM, Amish Asthana <asthan...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> Is there a way to decouple data from its associated mapping/indexing
>>>>>> in Elasticsearch itself?
>>>>>> Basically, store the raw data as source (JSON or some other format),
>>>>>> so that various mappings/indexes can be used on top of it.
>>>>>> I understand that one can use an outside database or file system, but
>>>>>> can it be achieved natively in ES itself?
>>>>>>
>>>>>> Basically, we are trying to see how our ES instance will work when we
>>>>>> have to change the mapping of existing and continuously incoming data
>>>>>> without any downtime for the end user.
>>>>>> We have an added wrinkle: our indexing has to be edit-aware for
>>>>>> versioning purposes, unlike ES, where each edit is a new record.
>>>>>> regards and thanks
>>>>>> amish
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "elasticsearch" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/elasticsearch/0bb1f5ef-3991-4568-9891-018baf79ebae%40googlegroups.com
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
