Re: Decoupling Data and indexing

joergpra...@gmail.com Tue, 11 Nov 2014 12:08:47 -0800

FAST stored the source data on distributed machines; only the control API
was not distributed (similar to ES HTTP curl requests, which also connect
to one host only).


Of course you could index raw JSON into a preparer index with a single field,
with _all disabled and the field set to "not indexed", so there is no Lucene
activity on it. This preparer index could also hold mappings in special
documents for the indexing runs.
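
As a rough sketch of such a preparer index, assuming ES 1.x-style mapping
syntax: the type name "raw", the field name "payload" and the helper
functions below are made up for illustration, they are not part of any ES
client. The raw document is serialized into the single unindexed field, so
only _source carries it.

```python
import json


def preparer_index_body(doc_type: str = "raw", field: str = "payload") -> dict:
    """Build an index-creation body for a hypothetical preparer index.

    Assumption: ES 1.x mapping syntax, where a string field with
    "index": "no" is kept in _source but never analyzed or indexed.
    """
    return {
        "mappings": {
            doc_type: {
                "_all": {"enabled": False},  # no catch-all field
                "properties": {
                    field: {"type": "string", "index": "no"},
                },
            }
        }
    }


def wrap_raw(doc: dict, field: str = "payload") -> dict:
    """Serialize the original JSON document into the single raw field."""
    return {field: json.dumps(doc, sort_keys=True)}


def unwrap_raw(stored: dict, field: str = "payload") -> dict:
    """Recover the original document for a later indexing run."""
    return json.loads(stored[field])
```

An indexing run would read documents back with unwrap_raw() and send them
to whichever search index carries the mapping of the day.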

The data duplication factor depends on the complexity of the mapping(s)
and the characteristics of the data (dictionary size, analyzer / tokenizer
output, norms etc.).
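
One rough way to put a number on it, assuming you collect the on-disk sizes
yourself (for example from the index stats): compare the total footprint of
the preparer index plus all search indexes against the preparer index alone.
The function name and the sample numbers are hypothetical.

```python
from typing import Iterable


def duplication_factor(raw_bytes: int, search_index_bytes: Iterable[int]) -> float:
    """Total on-disk size relative to the raw (preparer) index alone.

    A value of 1.0 would mean no extra storage; each search index built
    from the raw data adds its own size on top.
    """
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return (raw_bytes + sum(search_index_bytes)) / raw_bytes


# Made-up sizes: 100 bytes of raw data, two search indexes of 150 and 50 bytes.
factor = duplication_factor(100, [150, 50])
```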

A plugin would do no magic at all; it could bundle the calls that a client
would otherwise have to execute remotely, and add some convenience
commands for managing the prepare stage (e.g. suspend/resume) and showing
the current state of indexing.
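
To illustrate the suspend/resume idea, here is a minimal in-memory sketch.
It is not an ES plugin and uses no ES API; the class PrepareStage and its
methods are hypothetical names. Input is always accepted, while forwarding
to the indexer happens only when the stage is not suspended.

```python
from collections import deque
from typing import Callable, Deque, Dict


class PrepareStage:
    """Hypothetical prepare stage: always accepts input, indexes only when resumed."""

    def __init__(self, index_fn: Callable[[dict], None]) -> None:
        self._pending: Deque[dict] = deque()
        self._index_fn = index_fn  # stand-in for the real indexing call
        self._suspended = False
        self._indexed = 0

    def accept(self, doc: dict) -> None:
        """Store incoming data regardless of the indexing state."""
        self._pending.append(doc)

    def suspend(self) -> None:
        self._suspended = True

    def resume(self) -> None:
        self._suspended = False

    def drain(self, max_docs: int = 1000) -> int:
        """Forward up to max_docs pending documents; do nothing while suspended."""
        if self._suspended:
            return 0
        done = 0
        while self._pending and done < max_docs:
            self._index_fn(self._pending.popleft())
            done += 1
        self._indexed += done
        return done

    def state(self) -> Dict[str, object]:
        """Report the current state of indexing."""
        return {
            "suspended": self._suspended,
            "pending": len(self._pending),
            "indexed": self._indexed,
        }
```

A real plugin would keep the pending input in the preparer index rather
than in memory, so that it survives restarts.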

If redundant data is a no-go, then the whole approach is counterintuitive.

Jörg


On Tue, Nov 11, 2014 at 7:46 PM, Amish Asthana <asthanaam...@gmail.com>
wrote:

> With existing Elasticsearch I can think of an architecture like this.
>
> Index: indexForDataDump, with no mapping (is that possible?) or a minimal
> mapping. It is used only to dump data from the external system. There is
> some primary key.
>
> There are different search indexes with different mappings: search-index1,
> search-index2 etc.
> These indexes get populated from indexForDataDump using the technique
> mentioned here
> <http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/>.
> This way I can drop a search index as desired and create a new one with a
> new mapping.
> Any pros/cons or issues with this approach? There will be data duplication,
> but I am hoping it is minimal. (Any way to quantify it?)
>
> regards and thanks
> amish
>
>
> On Tuesday, November 11, 2014 10:02:46 AM UTC-8, Amish Asthana wrote:
>>
>> I am not aware of FAST but the idea looks promising.
>> However, it might not be that easy to just have a plugin for ES, as the data
>> itself is distributed on different machines.
>> So it will not be possible to have just one server with the data, as it
>> will become a single point of failure.
>> regards and thanks
>> amish
>>
>> On Tuesday, November 11, 2014 1:21:53 AM UTC-8, Jörg Prante wrote:
>>>
>>> I know from the FAST search engine ten years ago that there was a two-phase
>>> commit for distributed search and indexing. One server could listen on the
>>> API and keep the (compressed) input stored, and all the other indexing
>>> servers were supplied with this input in another phase to create binary
>>> indexes, either automatically or by manual operation, called the
>>> "suspend/resume indexing API".
>>>
>>> The advantage was that data could be received continuously via the API while
>>> FAST indexing could be stopped temporarily in order to balance between
>>> indexing and search performance on limited hardware.
>>>
>>> Are you thinking of something like that for Elasticsearch too? This
>>> architecture could be implemented as a plugin.
>>>
>>> Jörg
>>>
>>> On Mon, Nov 10, 2014 at 10:13 PM, Amish Asthana <asthan...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>> Is there a way we can decouple the data and the associated mapping/indexing
>>>> in Elasticsearch itself?
>>>> Basically, store the raw data as source (JSON or some other format) so that
>>>> various mappings/indexes can be used on top of that.
>>>> I understand that one can use an outside database or file system, but
>>>> can it be achieved natively in ES itself?
>>>>
>>>> Basically we are trying to see how our ES instance will work when we
>>>> have to change the mapping of existing and continuously incoming data
>>>> without any downtime for the end user.
>>>> We have an added wrinkle: our indexing has to be edit-aware for
>>>> versioning purposes, unlike ES, where each edit is a new record.
>>>> regards and thanks
>>>> amish
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to elasticsearc...@googlegroups.com.
>>>> To view this discussion on the web visit https://groups.google.com/d/
>>>> msgid/elasticsearch/0bb1f5ef-3991-4568-9891-018baf79ebae%
>>>> 40googlegroups.com
>>>> <https://groups.google.com/d/msgid/elasticsearch/0bb1f5ef-3991-4568-9891-018baf79ebae%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>

