Re: Question about time based indexes/rolling indexes and eviction policies?

2014-05-26 Thread Magnus Bäck
On Friday, May 23, 2014 at 20:13 CEST,
 John Smith java.dev@gmail.com wrote:

 #1
 I have been reading around, and some people suggest splitting the
 index by time when doing log analytics.
 Is this built into Elasticsearch, or do I have to do it manually?

I don't believe Elasticsearch itself understands date-based indices,
but Logstash does.

 If manual
 PUT http://myhost:9200/myindex-(get-current-date-here)/SomeDoc/Id
 I'm pulling my data from SQL Server and going to use either ETL or
 the JDBC gatherer. I suppose the ETL process needs to consider the
 date when it does its index PUT, and roll over to a new date so that
 a new index gets created?

Yes.
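
To illustrate the manual approach, here is a minimal Python sketch of how an
ETL job might derive the index name from the current date. The helper names
(daily_index_name, document_url) are made up for this example, and the dotted
date suffix simply mirrors the naming Logstash uses for its daily indices.
With default settings Elasticsearch should create a missing index on the
first write, so the rollover needs no extra step, but verify that against
your own configuration.

```python
from datetime import date, datetime, timezone


def daily_index_name(prefix, day):
    """Return a per-day index name such as 'myindex-2014.05.26'."""
    return f"{prefix}-{day:%Y.%m.%d}"


def document_url(host, prefix, doc_type, doc_id, day=None):
    """Build the PUT URL for a document in the index for the given day.

    When no day is passed, the current UTC date is used, so a long-running
    ETL job rolls over to a new index at midnight UTC.
    """
    if day is None:
        day = datetime.now(timezone.utc).date()
    index = daily_index_name(prefix, day)
    return f"http://{host}:9200/{index}/{doc_type}/{doc_id}"


if __name__ == "__main__":
    print(document_url("myhost", "myindex", "SomeDoc", "1"))
```

Using the timestamp of the record itself instead of the current date keeps
late-arriving rows in the index for the day they belong to.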

 And my queries need to consider this also so they know that on each
 day they need to search the new index?

Yes, unless you search through an index alias or the special _all index
name, which covers all indices. That obviously has performance
implications, though, and partly voids the benefits of multiple indices.
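
A middle ground is to search only the daily indices that cover the period of
interest. Elasticsearch accepts a comma-separated list of index names (and
wildcards such as myindex-*) in the search URL. The sketch below builds such
a URL; the function names and the myindex-YYYY.MM.DD naming scheme are
assumptions made for the example.

```python
from datetime import date, timedelta


def indices_for_range(prefix, start, end):
    """List the daily index names covering start..end, both inclusive."""
    names = []
    day = start
    while day <= end:
        names.append(f"{prefix}-{day:%Y.%m.%d}")
        day += timedelta(days=1)
    return names


def search_url(host, prefix, start, end):
    """Build a _search URL restricted to the indices in the date range."""
    joined = ",".join(indices_for_range(prefix, start, end))
    return f"http://{host}:9200/{joined}/_search"


if __name__ == "__main__":
    today = date.today()
    # Search the last three days, including today.
    print(search_url("myhost", "myindex", today - timedelta(days=2), today))
```

If a day in the range has no index yet, the request may fail unless the
search is told to ignore missing indices, so check how your version handles
that case.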

 #2 is there such a thing as eviction policies?
 Basically, is there a way to check whether we are running out of disk
 space and then either remove entries from the index or, in the above
 case, delete/archive indexes older than a few days?

If disk space is your limiting factor you should find the curator
script useful. You could also set the _ttl value of messages to have
them automatically expire after a set time.

https://github.com/elasticsearch/curator
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-ttl-field.html
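
For a rough idea of what age-based cleanup involves, the following Python
sketch selects the daily indices that have fallen out of a retention window.
It is not how curator is implemented; the function name, the keep_days
parameter and the myindex-YYYY.MM.DD naming scheme are assumptions made for
illustration. The function only returns names. Each one could then be
removed with an HTTP DELETE on that index, ideally after a person has
reviewed the list.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, prefix, today, keep_days):
    """Return the daily indices whose date is older than the retention window.

    Names that do not match '<prefix>-YYYY.MM.DD' are skipped, so unrelated
    indices are never selected.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue
        suffix = name[len(prefix) + 1:]
        try:
            day = datetime.strptime(suffix, "%Y.%m.%d").date()
        except ValueError:
            continue  # not a date-suffixed index
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["myindex-2014.05.20", "myindex-2014.05.25", "other-2014.01.01"]
    print(expired_indices(names, "myindex", date(2014, 5, 26), keep_days=3))
```

Dropping a whole index is much cheaper than deleting individual documents
from it, which is one of the main reasons to split indices by time.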

-- 
Magnus Bäck| Software Engineer, Development Tools
magnus.b...@sonymobile.com | Sony Mobile Communications

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/20140526063906.GB16396%40seldlx20533.corpusers.net.
For more options, visit https://groups.google.com/d/optout.


Re: Question about time based indexes/rolling indexes and eviction policies?

2014-05-26 Thread joergpra...@gmail.com
1. I will add a timeseries mode to my JDBC plugin soon. Right now you can
create a timestamp with bash (or your favorite shell) and append it as a
suffix to the index name in the river/feeder creation call, but this can
be automated. No ETA yet.

2. This is also a nifty feature. I will experiment with the JDBC plugin to
see whether I can estimate the data volume to index (probably from the data
volume of previous runs) or make an educated guess about data growth in the
ES data folders, and have it refuse to continue if a limit is exceeded.
Index data volume can fluctuate due to segment creation and merging, so
this would have to include an optimization strategy, or I rely on the JDBC
source.
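
As a rough illustration of the "refuse to continue if a limit is exceeded"
idea, here is a Python sketch of a free-space check that a feeder could run
before each batch. It is not code from the JDBC plugin. The function names
and the 15% threshold are assumptions, and the check only makes sense when
the feeder runs on the same host as the Elasticsearch data directory.

```python
import shutil


def has_headroom(total_bytes, free_bytes, min_free_fraction):
    """Return True if the free share of the disk meets the minimum."""
    if total_bytes <= 0:
        return False
    return free_bytes / total_bytes >= min_free_fraction


def check_data_path(path, min_free_fraction=0.15):
    """Raise if the filesystem holding `path` is too full to keep indexing."""
    usage = shutil.disk_usage(path)
    if not has_headroom(usage.total, usage.free, min_free_fraction):
        raise RuntimeError(
            f"only {usage.free} of {usage.total} bytes free on {path}; "
            "refusing to continue indexing"
        )
    return usage.free


if __name__ == "__main__":
    # "." stands in for the Elasticsearch data directory.
    print(check_data_path(".", min_free_fraction=0.0))
```

Because merges temporarily need extra space, a threshold like this should
leave generous headroom rather than aim for a nearly full disk.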

Eviction is a harder topic, since I hesitate to create a plugin that can
delete data without user interaction. Even eviction rules in a plugin
configuration may contain mistakes and are risky. But I also see the
usefulness of retiring indexed data by dropping it regularly. I don't
want to take responsibility for this in the JDBC plugin, so this may just
become a separate plugin.

Jörg






Re: Question about time based indexes/rolling indexes and eviction policies?

2014-05-26 Thread John Smith
Thanks!







Question about time based indexes/rolling indexes and eviction policies?

2014-05-23 Thread John Smith
#1
I have been reading around, and some people suggest splitting the index by
time when doing log analytics.
Is this built into Elasticsearch, or do I have to do it manually?

If manual

PUT http://myhost:9200/myindex-(get-current-date-here)/SomeDoc/Id

I'm pulling my data from SQL Server and going to use either ETL or the JDBC
gatherer. I suppose the ETL process needs to consider the date when it does
its index PUT, and roll over to a new date so that a new index gets created?
And my queries need to consider this also so they know that on each day 
they need to search the new index?

#2 is there such a thing as eviction policies?
Basically, is there a way to check whether we are running out of disk space
and then either remove entries from the index or, in the above case,
delete/archive indexes older than a few days?




