Re: Insert later feature

2014-02-23 Thread Michael Sick
Also, if no other clients need a faster refresh, you can set
index.refresh_interval to a value higher than the 1s default, either in
general for your index or only during the times when you're doing your
bulk updates.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html
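A minimal sketch of that settings change (the index name my_index is a
placeholder, and the 30s value is just an example):

```shell
# Raise the refresh interval from the 1s default to 30s on one index.
# "my_index" is a placeholder; substitute your own index name.
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index" : { "refresh_interval" : "30s" }
}'
```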


On Sun, Feb 23, 2014 at 8:28 AM, joergpra...@gmail.com wrote:

 The best way to achieve this would be to implement it in front of ES, so
 that the bulk indexing client runs only at the times it should run.

 For the gathering plugin I am working on, I plan to separate the two
 phases of gathering documents and indexing documents. By offering a
 scheduling option, it will be possible to index (or even reindex) gathered
 documents at a later time. For example, documents could be continuously
 collected from various sources (JDBC, the web, or the file system) and
 then indexed at some later time (for example, at night). Such collected
 documents will be stored in an archive format at each gatherer node, like
 the archive formats supported in the knapsack plugin.

 Jörg



 On Sun, Feb 23, 2014 at 6:52 AM, vineeth mohan vm.vineethmo...@gmail.com wrote:

 Hi,

 I am doing a lot of bulk inserts into Elasticsearch and, at the same time,
 doing lots of reads on another index.

 Because of the bulk inserts, my searches on the other index are slow.

 It is not very urgent that these bulk inserts actually get indexed and
 become immediately searchable.

 Is there any way I can ask Elasticsearch to receive the bulk inserts but
 do the actual indexing (which should be the CPU-consuming part) later?

 I figured out that Elasticsearch waits for 1 second before making the
 documents searchable.
 What is it waiting for here? Is it to index the document, or to reopen
 the IndexWriter?
 Would it help if I could change this 1 second to 1 hour?
 If so, which parameter should I tweak?

 Kindly let me know if there are any other similar features out there
 that could help.

 Thanks
   Vineeth

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kwxwB%2Bi%3DHZDS1y%2B6Ad-VTax8hLSpgSVaSNH7CbzagB3Q%40mail.gmail.com
 .
 For more options, visit https://groups.google.com/groups/opt_out.




Re: Insert later feature

2014-02-23 Thread vineeth mohan
Hello Michael - Thanks for the configuration.

Hello Jörg - I was thinking more along the lines of the translog -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-translog.html

I believe the index operation is first written to the translog (which I am
not sure is part of Lucene) and then written to Lucene later.
If we can ask ES to accumulate a large number of feeds and index them
later, would that do the trick?

Thanks
 Vineeth




Re: Insert later feature

2014-02-23 Thread joergpra...@gmail.com
Yes, it is possible to disable the translog sync (the component through
which operations are passed from ES to Lucene) with
index.gateway.local.flush: -1, and use the flush action for a manual
commit instead.

I have never tried that in practice, though.

Jörg




Re: Insert later feature

2014-02-23 Thread joergpra...@gmail.com
Oops, the correct parameter is index.translog.disable_flush: true

index.gateway.local.flush: -1 controls the gateway instead.

Jörg
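A hedged sketch of the corrected setting plus a manual commit (the index
name my_index is a placeholder, and whether disable_flush takes effect
dynamically may depend on the ES version, as the rest of the thread shows):

```shell
# Disable automatic translog flushing on one index (placeholder "my_index") ...
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index" : { "translog" : { "disable_flush" : true } }
}'

# ... run the bulk inserts here ...

# ... then trigger a commit manually with the flush API when ready.
curl -XPOST 'localhost:9200/my_index/_flush'
```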



Re: Insert later feature

2014-02-23 Thread vineeth mohan
Hello Joerg,

So if I disable it, ES won't write the feeds to Lucene until I make a
manual flush.
I believe the translog is written to a file and is not resident in memory.
This also means that translogs are maintained across restarts, so we will
never lose data.

If all of the above is right, then this might be a good candidate for my
purpose.

Thanks
   Vineeth



Re: Insert later feature

2014-02-23 Thread vineeth mohan
Hello Joerg ,

Your config doesn't seem to work.
I set the following parameter, and while I was doing some inserts there
was no unusual behavior: the head plugin showed the total number of
documents I had inserted, and they were searchable.

index.translog.disable_flush : true

ES version - 0.90.9

Is there something I missed?

Thanks
Vineeth


On Mon, Feb 24, 2014 at 1:12 AM, vineeth mohan vm.vineethmo...@gmail.com wrote:

 Hello Joerg,

 I was still wondering how well this would handle a case where I have, say,
 10 million documents sitting in the translog and I ask ES to index them
 all in a single flush.
 Is a heap dump (out-of-memory error) likely?

 Thanks
    Vineeth



Re: Insert later feature

2014-02-23 Thread vineeth mohan
Hi ,

I tried the following too, without any luck -

curl -XPUT 'localhost:9200/documents/_settings' -d '{
  "index" : {
    "translog" : {
      "disable_flush" : true
    }
  }
}'

Thanks
   Vineeth


On Mon, Feb 24, 2014 at 1:42 AM, vineeth mohan vm.vineethmo...@gmail.comwrote:

 Hello Joerg ,

 Your config doesnt seem to work.
 I gave the following parameter and while i  was doing some inserts , there
 was no unusual behavior. The head showed the total number of documents i
 had inserted and it was searchable.

 index.translog.disable_flush : true

 ES version - 0.90.9

 Is there something i missed out ?

 Thanks
 Vineeth


 On Mon, Feb 24, 2014 at 1:12 AM, vineeth mohan 
 vm.vineethmo...@gmail.comwrote:

 Hello Joerg ,

 I was still thinking how well will this handle cases where i have like 10
 Million to insert in the translog and  i ask ES to index them all in a
 single flush.
 Is a heap dump likely to happen.

 Thanks
 Vineeth


 On Mon, Feb 24, 2014 at 1:08 AM, vineeth mohan vm.vineethmo...@gmail.com
  wrote:

 Hello Joerg ,

 So if i disable it , ES wont write the feeds to lucene until i make a
 manual flush...
 I believe translog is written to a file and its not resident in the
 memory.
 This also means that translogs are maintained between restarts and we
 will never loose data.

 If all the above are right , then this might be a good candidate for my
 purpose.

 Thanks
Vineeth


 On Mon, Feb 24, 2014 at 12:54 AM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 Oops, the correct parameter is index.translog.disable_flush : true

 index.gateway.local.flush: -1 is controlling the gateway.

 Jörg


 On Sun, Feb 23, 2014 at 8:21 PM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 Yes, it is possible to disable the translog sync (the component where
 the operations are passed from ES to Lucene) with
 index.gateway.local.flush: -1 and use the flush action for manual commit
 instead.

 I have never done that practically, though.

 Jörg



 On Sun, Feb 23, 2014 at 5:42 PM, vineeth mohan 
 vm.vineethmo...@gmail.com wrote:

 Hello Michael - Thanks for the configuration.

 Hello Jörg - I was thinking more in lines of translog -
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-translog.html

 I believe the index operation is first written to translog ( Which i
 am not sure if is a part of lucene ) and then written to lucene later.
 Here if we can ask ES , to accumulate a huge amount of feeds to index
 and index it later , will that do the trick ?

 Thanks
  Vineeth


 On Sun, Feb 23, 2014 at 7:03 PM, Michael Sick 
 michael.s...@serenesoftware.com wrote:

 Also, if there are no other clients wanting a faster refresh, you
 can set index.refresh_interval to a higher value than the 1s default 
 either
 in general for your index or just during the times when you're doing 
 your
 bulk updates.
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html


 On Sun, Feb 23, 2014 at 8:28 AM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 Best method to achieve this would be to implement this in front of
 ES so the bulk indexing client runs only at the time it should run.

 For the gathering plugin which I am working on, I plan to separate
 the two phases of gathering documents and indexing documents. So, by 
 giving
 a scheduling option, it will be possible to index (or even reindex)
 gathered documents at a later time, for example, documents are 
 continuously
 collected from various sources, like JDBC, web, or file system, and 
 then
 indexed at some later time (for example at night). Such collected 
 documents
 will be stored in an archive format at each gatherer node, like the 
 archive
 formats supported in the knapsack plugin.

 Jörg



 On Sun, Feb 23, 2014 at 6:52 AM, vineeth mohan 
 vm.vineethmo...@gmail.com wrote:

 Hi,

 I am doing a lot of bulk inserts into Elasticsearch and, at the same
 time, lots of reads on another index.

 Because of the bulk inserts, my searches on the other index are slow.

 It is not very urgent that these bulk inserts actually get indexed and
 become immediately searchable.

 Is there any way I can ask Elasticsearch to receive the bulk inserts but
 do the actual indexing (which should be the CPU-consuming part) later?

 I figured out that Elasticsearch waits for 1 second before making the
 documents searchable.
 What is it waiting for here? Is it to index the document or to reopen the
 IndexWriter?
 Will it help me if I can change this 1 second to 1 hour?
 If so, which parameter should I tweak?

 Kindly let me know if there are any other similar features out there
 which can be of any help.

 Thanks
   Vineeth


Re: Insert later feature

2014-02-23 Thread joergpra...@gmail.com
It's not a dynamic setting, AFAIK.

Sorry, I don't know for sure whether the translog can grow indefinitely.

For my purposes, I decided to handle the challenge in front of ES, with
better timing control and archive files for replay that I can also use
outside of ES.
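A minimal sketch of that spool-and-replay idea, assuming ES on localhost:9200; the spool directory, file naming, and the idea of replaying from a nightly cron job are all illustrative, not a description of the gatherer plugin itself:

```shell
#!/bin/sh
# Sketch: spool bulk payloads to files as they arrive, replay them at night.
SPOOL_DIR=./spool
mkdir -p "$SPOOL_DIR"

spool() {
  # Called with a bulk payload on stdin instead of posting directly to ES;
  # each payload is written to a uniquely named file in the spool directory.
  cat > "$SPOOL_DIR/$(date +%s)-$$.bulk"
}

replay() {
  # Run later (e.g. from cron at night): post each spooled file to the
  # _bulk endpoint and delete it only if the POST succeeded.
  for f in "$SPOOL_DIR"/*.bulk; do
    [ -e "$f" ] || continue
    curl -s -XPOST 'localhost:9200/_bulk' --data-binary "@$f" && rm -f "$f"
  done
}
```

Because the payloads sit in plain files until replay time, they survive client restarts and can be inspected or re-run outside of ES, which is the point Jörg makes about archive files.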

Jörg



On Sun, Feb 23, 2014 at 9:21 PM, vineeth mohan vm.vineethmo...@gmail.comwrote:

 Hi ,

 I tried the following too, without any luck:

 curl -XPUT 'localhost:9200/documents/_settings' -d '{
   "index" : {
     "translog" : {
       "disable_flush" : true
     }
   }
 }'
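Whether the setting was actually applied can be checked by reading the live settings back (the index name is taken from the example above; this assumes the same localhost:9200 node):

```shell
# Read back the index settings to see whether disable_flush took effect.
curl -XGET 'localhost:9200/documents/_settings?pretty'
```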

 Thanks
Vineeth


 On Mon, Feb 24, 2014 at 1:42 AM, vineeth mohan 
 vm.vineethmo...@gmail.comwrote:

 Hello Joerg ,

 Your config doesn't seem to work.
 I set the following parameter, and while I was doing some inserts there
 was no unusual behavior. The head plugin showed the total number of
 documents I had inserted, and they were searchable.

 index.translog.disable_flush : true

 ES version - 0.90.9

 Is there something I missed?

 Thanks
 Vineeth


 On Mon, Feb 24, 2014 at 1:12 AM, vineeth mohan vm.vineethmo...@gmail.com
  wrote:

 Hello Joerg ,

 I was still wondering how well this will handle cases where I have, say,
 10 million documents in the translog and I ask ES to index them all in a
 single flush.
 Is an OutOfMemoryError (heap dump) likely to happen?

 Thanks
 Vineeth


 On Mon, Feb 24, 2014 at 1:08 AM, vineeth mohan 
 vm.vineethmo...@gmail.com wrote:

 Hello Joerg ,

 So if I disable it, ES won't write the feeds to Lucene until I do a
 manual flush.
 I believe the translog is written to a file and is not resident in
 memory.
 This also means that translogs are maintained across restarts, and we
 will never lose data.

 If all of the above is right, then this might be a good candidate for my
 purpose.

 Thanks
Vineeth


 On Mon, Feb 24, 2014 at 12:54 AM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 Oops, the correct parameter is index.translog.disable_flush : true

 index.gateway.local.flush: -1 controls the gateway.
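Putting the corrected parameter together with the manual flush action, the flow might look like the sketch below. The index name is illustrative, and as noted earlier in the thread the setting may not be dynamically updatable on 0.90.x, in which case it would have to go in elasticsearch.yml instead:

```shell
# Disable automatic translog flushing, bulk-index, then commit manually.
curl -XPUT 'localhost:9200/documents/_settings' -d '{
  "index" : { "translog" : { "disable_flush" : true } }
}'
# ... bulk inserts accumulate in the translog ...
curl -XPOST 'localhost:9200/documents/_flush'
# Re-enable automatic flushing afterwards.
curl -XPUT 'localhost:9200/documents/_settings' -d '{
  "index" : { "translog" : { "disable_flush" : false } }
}'
```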

 Jörg


 On Sun, Feb 23, 2014 at 8:21 PM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 Yes, it is possible to disable translog flushing (the translog is the
 component through which operations are passed from ES to Lucene) with
 index.gateway.local.flush: -1, and to use the flush action for manual
 commits instead.

 I have never done that practically, though.

 Jörg



Insert later feature

2014-02-22 Thread vineeth mohan
Hi,

I am doing a lot of bulk inserts into Elasticsearch and, at the same time,
lots of reads on another index.

Because of the bulk inserts, my searches on the other index are slow.

It is not very urgent that these bulk inserts actually get indexed and
become immediately searchable.

Is there any way I can ask Elasticsearch to receive the bulk inserts but do
the actual indexing (which should be the CPU-consuming part) later?

I figured out that Elasticsearch waits for 1 second before making the
documents searchable.
What is it waiting for here? Is it to index the document or to reopen the
IndexWriter?
Will it help me if I can change this 1 second to 1 hour?
If so, which parameter should I tweak?

Kindly let me know if there are any other similar features out there which
can be of any help.

Thanks
  Vineeth

-- 
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kwxwB%2Bi%3DHZDS1y%2B6Ad-VTax8hLSpgSVaSNH7CbzagB3Q%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.