Re: Insert later feature

2014-02-23 Thread Michael Sick
Also, if no other clients need a faster refresh, you can set
index.refresh_interval to a value higher than the 1s default, either in
general for your index or only during the times when you're doing your
bulk updates.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html
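A minimal sketch of that settings change (the index name my_index is a
placeholder, and the 30s value is just an example):

```shell
# Raise the refresh interval from the 1s default to 30s on one index.
# "my_index" is a placeholder; substitute your own index name.
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index" : { "refresh_interval" : "30s" }
}'
```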


On Sun, Feb 23, 2014 at 8:28 AM, joergpra...@gmail.com wrote:

 The best way to achieve this would be to implement it in front of ES, so
 that the bulk indexing client runs only at the times it should run.

 For the gathering plugin I am working on, I plan to separate the two
 phases of gathering documents and indexing documents. By offering a
 scheduling option, it will be possible to index (or even reindex) gathered
 documents at a later time. For example, documents could be continuously
 collected from various sources (JDBC, the web, or the file system) and
 then indexed at some later time (for example, at night). Such collected
 documents will be stored in an archive format at each gatherer node, like
 the archive formats supported in the knapsack plugin.

 Jörg



 On Sun, Feb 23, 2014 at 6:52 AM, vineeth mohan vm.vineethmo...@gmail.com wrote:

 Hi,

 I am doing a lot of bulk inserts into Elasticsearch and, at the same time,
 doing lots of reads on another index.

 Because of the bulk inserts, my searches on the other index are slow.

 It is not very urgent that these bulk inserts actually get indexed and
 become immediately searchable.

 Is there any way I can ask Elasticsearch to receive the bulk inserts but
 do the actual indexing (which should be the CPU-consuming part) later?

 I figured out that Elasticsearch waits for 1 second before making the
 documents searchable.
 What is it waiting for here? Is it to index the document, or to reopen
 the IndexWriter?
 Would it help if I could change this 1 second to 1 hour?
 If so, which parameter should I tweak?

 Kindly let me know if there are any other similar features out there
 that could help.

 Thanks
   Vineeth

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kwxwB%2Bi%3DHZDS1y%2B6Ad-VTax8hLSpgSVaSNH7CbzagB3Q%40mail.gmail.com
 .
 For more options, visit https://groups.google.com/groups/opt_out.




Re: Insert later feature

2014-02-23 Thread vineeth mohan
Hello Michael - Thanks for the configuration.

Hello Jörg - I was thinking more along the lines of the translog -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-translog.html

I believe the index operation is first written to the translog (which I am
not sure is part of Lucene) and then written to Lucene later.
If we can ask ES to accumulate a large number of feeds and index them
later, would that do the trick?

Thanks
 Vineeth




Re: Insert later feature

2014-02-23 Thread joergpra...@gmail.com
Yes, it is possible to disable the translog sync (the component through
which operations are passed from ES to Lucene) with
index.gateway.local.flush: -1, and use the flush action for a manual
commit instead.

I have never tried that in practice, though.

Jörg




Re: Insert later feature

2014-02-23 Thread joergpra...@gmail.com
Oops, the correct parameter is index.translog.disable_flush: true

index.gateway.local.flush: -1 controls the gateway instead.

Jörg
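A hedged sketch of the corrected setting plus a manual commit (the index
name my_index is a placeholder, and whether disable_flush takes effect
dynamically may depend on the ES version, as the rest of the thread shows):

```shell
# Disable automatic translog flushing on one index (placeholder "my_index") ...
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index" : { "translog" : { "disable_flush" : true } }
}'

# ... run the bulk inserts here ...

# ... then trigger a commit manually with the flush API when ready.
curl -XPOST 'localhost:9200/my_index/_flush'
```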



Re: Insert later feature

2014-02-23 Thread vineeth mohan
Hello Joerg,

So if I disable it, ES won't write the feeds to Lucene until I make a
manual flush.
I believe the translog is written to a file and is not resident in memory.
This also means that translogs are maintained across restarts, so we will
never lose data.

If all of the above is right, then this might be a good candidate for my
purpose.

Thanks
   Vineeth



Re: Insert later feature

2014-02-23 Thread vineeth mohan
Hello Joerg ,

Your config doesn't seem to work.
I set the following parameter, and while I was doing some inserts there
was no unusual behavior: the head plugin showed the total number of
documents I had inserted, and they were searchable.

index.translog.disable_flush : true

ES version - 0.90.9

Is there something I missed?

Thanks
Vineeth


On Mon, Feb 24, 2014 at 1:12 AM, vineeth mohan vm.vineethmo...@gmail.com wrote:

 Hello Joerg,

 I was still wondering how well this would handle a case where I have, say,
 10 million documents sitting in the translog and I ask ES to index them
 all in a single flush.
 Is a heap dump (out-of-memory error) likely?

 Thanks
    Vineeth



Re: Insert later feature

2014-02-23 Thread vineeth mohan
Hi ,

I tried the following too, without any luck -

curl -XPUT 'localhost:9200/documents/_settings' -d '{
  "index" : {
    "translog" : {
      "disable_flush" : true
    }
  }
}'

Thanks
   Vineeth


On Mon, Feb 24, 2014 at 1:42 AM, vineeth mohan vm.vineethmo...@gmail.comwrote:

 Hello Joerg ,

 Your config doesnt seem to work.
 I gave the following parameter and while i  was doing some inserts , there
 was no unusual behavior. The head showed the total number of documents i
 had inserted and it was searchable.

 index.translog.disable_flush : true

 ES version - 0.90.9

 Is there something i missed out ?

 Thanks
 Vineeth


 On Mon, Feb 24, 2014 at 1:12 AM, vineeth mohan 
 vm.vineethmo...@gmail.comwrote:

 Hello Joerg ,

 I was still thinking how well will this handle cases where i have like 10
 Million to insert in the translog and  i ask ES to index them all in a
 single flush.
 Is a heap dump likely to happen.

 Thanks
 Vineeth


 On Mon, Feb 24, 2014 at 1:08 AM, vineeth mohan vm.vineethmo...@gmail.com
  wrote:

 Hello Joerg ,

 So if i disable it , ES wont write the feeds to lucene until i make a
 manual flush...
 I believe translog is written to a file and its not resident in the
 memory.
 This also means that translogs are maintained between restarts and we
 will never loose data.

 If all the above are right , then this might be a good candidate for my
 purpose.

 Thanks
Vineeth


 On Mon, Feb 24, 2014 at 12:54 AM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 Oops, the correct parameter is index.translog.disable_flush : true

 index.gateway.local.flush: -1 is controlling the gateway.

 Jörg


 On Sun, Feb 23, 2014 at 8:21 PM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 Yes, it is possible to disable the translog sync (the component where
 the operations are passed from ES to Lucene) with
 index.gateway.local.flush: -1 and use the flush action for manual commit
 instead.

 I have never done that practically, though.

 Jörg



 On Sun, Feb 23, 2014 at 5:42 PM, vineeth mohan 
 vm.vineethmo...@gmail.com wrote:

 Hello Michael - Thanks for the configuration.

 Hello Jörg - I was thinking more in lines of translog -
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-translog.html

 I believe the index operation is first written to translog ( Which i
 am not sure if is a part of lucene ) and then written to lucene later.
 Here if we can ask ES , to accumulate a huge amount of feeds to index
 and index it later , will that do the trick ?

 Thanks
  Vineeth


 On Sun, Feb 23, 2014 at 7:03 PM, Michael Sick 
 michael.s...@serenesoftware.com wrote:

 Also, if there are no other clients wanting a faster refresh, you
 can set index.refresh_interval to a higher value than the 1s default 
 either
 in general for your index or just during the times when you're doing 
 your
 bulk updates.
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html


 On Sun, Feb 23, 2014 at 8:28 AM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 Best method to achieve this would be to implement this in front of
 ES so the bulk indexing client runs only at the time it should run.

 For the gathering plugin which I am working on, I plan to separate
 the two phases of gathering documents and indexing documents. So, by 
 giving
 a scheduling option, it will be possible to index (or even reindex)
 gathered documents at a later time, for example, documents are 
 continuously
 collected from various sources, like JDBC, web, or file system, and 
 then
 indexed at some later time (for example at night). Such collected 
 documents
 will be stored in an archive format at each gatherer node, like the 
 archive
 formats supported in the knapsack plugin.

 Jörg



 On Sun, Feb 23, 2014 at 6:52 AM, vineeth mohan 
 vm.vineethmo...@gmail.com wrote:

 Hi,

 I am doing a lot of bulk inserts into Elasticsearch and, at the same
 time, lots of reads on another index.

 Because of the bulk inserts, my searches on the other index are slow.

 It is not very urgent that these bulk inserts actually get indexed and
 become immediately searchable.

 Is there any way I can ask Elasticsearch to receive the bulk inserts but
 do the actual indexing (which should be the CPU-consuming part) later?

 I figured out that Elasticsearch waits for 1 second before making the
 documents searchable.
 What is it waiting for here? Is it to index the document or to reopen the
 IndexWriter?
 Will it help me if I can change this 1 second to 1 hour?
 If so, which parameter should I tweak?

 Kindly let me know if there are any other similar features out there
 which can be of any help.

 Thanks
   Vineeth


Re: Insert later feature

2014-02-23 Thread joergpra...@gmail.com
It's not a dynamic setting, AFAIK.

Sorry, I don't know for sure whether the translog can grow indefinitely.

For my purposes, I decided to handle the challenge in front of ES, with
better timing control and archive files for replay that I can also use
outside of ES.
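A minimal sketch of that spool-and-replay idea, assuming ES on localhost:9200; the spool directory, file naming, and the idea of replaying from a nightly cron job are all illustrative, not a description of the gatherer plugin itself:

```shell
#!/bin/sh
# Sketch: spool bulk payloads to files as they arrive, replay them at night.
SPOOL_DIR=./spool
mkdir -p "$SPOOL_DIR"

spool() {
  # Called with a bulk payload on stdin instead of posting directly to ES;
  # each payload is written to a uniquely named file in the spool directory.
  cat > "$SPOOL_DIR/$(date +%s)-$$.bulk"
}

replay() {
  # Run later (e.g. from cron at night): post each spooled file to the
  # _bulk endpoint and delete it only if the POST succeeded.
  for f in "$SPOOL_DIR"/*.bulk; do
    [ -e "$f" ] || continue
    curl -s -XPOST 'localhost:9200/_bulk' --data-binary "@$f" && rm -f "$f"
  done
}
```

Because the payloads sit in plain files until replay time, they survive client restarts and can be inspected or re-run outside of ES, which is the point Jörg makes about archive files.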

Jörg



On Sun, Feb 23, 2014 at 9:21 PM, vineeth mohan vm.vineethmo...@gmail.comwrote:

 Hi ,

 I tried the following too, without any luck:

 curl -XPUT 'localhost:9200/documents/_settings' -d '{
   "index" : {
     "translog" : {
       "disable_flush" : true
     }
   }
 }'
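Whether the setting was actually applied can be checked by reading the live settings back (the index name is taken from the example above; this assumes the same localhost:9200 node):

```shell
# Read back the index settings to see whether disable_flush took effect.
curl -XGET 'localhost:9200/documents/_settings?pretty'
```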

 Thanks
Vineeth


 On Mon, Feb 24, 2014 at 1:42 AM, vineeth mohan 
 vm.vineethmo...@gmail.comwrote:

 Hello Joerg ,

 Your config doesn't seem to work.
 I set the following parameter, and while I was doing some inserts there
 was no unusual behavior. The head plugin showed the total number of
 documents I had inserted, and they were searchable.

 index.translog.disable_flush : true

 ES version - 0.90.9

 Is there something I missed?

 Thanks
 Vineeth


 On Mon, Feb 24, 2014 at 1:12 AM, vineeth mohan vm.vineethmo...@gmail.com
  wrote:

 Hello Joerg ,

 I was still wondering how well this will handle cases where I have, say,
 10 million documents in the translog and I ask ES to index them all in a
 single flush.
 Is an OutOfMemoryError (heap dump) likely to happen?

 Thanks
 Vineeth


 On Mon, Feb 24, 2014 at 1:08 AM, vineeth mohan 
 vm.vineethmo...@gmail.com wrote:

 Hello Joerg ,

 So if I disable it, ES won't write the feeds to Lucene until I do a
 manual flush.
 I believe the translog is written to a file and is not resident in
 memory.
 This also means that translogs are maintained across restarts, and we
 will never lose data.

 If all of the above is right, then this might be a good candidate for my
 purpose.

 Thanks
Vineeth


 On Mon, Feb 24, 2014 at 12:54 AM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 Oops, the correct parameter is index.translog.disable_flush : true

 index.gateway.local.flush: -1 controls the gateway.
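Putting the corrected parameter together with the manual flush action, the flow might look like the sketch below. The index name is illustrative, and as noted earlier in the thread the setting may not be dynamically updatable on 0.90.x, in which case it would have to go in elasticsearch.yml instead:

```shell
# Disable automatic translog flushing, bulk-index, then commit manually.
curl -XPUT 'localhost:9200/documents/_settings' -d '{
  "index" : { "translog" : { "disable_flush" : true } }
}'
# ... bulk inserts accumulate in the translog ...
curl -XPOST 'localhost:9200/documents/_flush'
# Re-enable automatic flushing afterwards.
curl -XPUT 'localhost:9200/documents/_settings' -d '{
  "index" : { "translog" : { "disable_flush" : false } }
}'
```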

 Jörg


 On Sun, Feb 23, 2014 at 8:21 PM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 Yes, it is possible to disable translog flushing (the translog is the
 component through which operations are passed from ES to Lucene) with
 index.gateway.local.flush: -1, and to use the flush action for manual
 commits instead.

 I have never done that practically, though.

 Jörg



Insert later feature

2014-02-22 Thread vineeth mohan
Hi,

I am doing a lot of bulk inserts into Elasticsearch and, at the same time,
lots of reads on another index.

Because of the bulk inserts, my searches on the other index are slow.

It is not very urgent that these bulk inserts actually get indexed and
become immediately searchable.

Is there any way I can ask Elasticsearch to receive the bulk inserts but do
the actual indexing (which should be the CPU-consuming part) later?

I figured out that Elasticsearch waits for 1 second before making the
documents searchable.
What is it waiting for here? Is it to index the document or to reopen the
IndexWriter?
Will it help me if I can change this 1 second to 1 hour?
If so, which parameter should I tweak?

Kindly let me know if there are any other similar features out there which
can be of any help.

Thanks
  Vineeth

-- 
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kwxwB%2Bi%3DHZDS1y%2B6Ad-VTax8hLSpgSVaSNH7CbzagB3Q%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.