inconsistent paging

2014-08-18 Thread Ron Sher
Hi,

We've noticed a strange behavior in elasticsearch during paging.

In one case we use a paging size of 60 and we have 63 documents. So the
first page is using size 60 and offset 0. The second page is using size 60
and offset 60. What we see is that the result is inconsistent. Meaning, on
the 2nd page, we sometimes get results that were before in the 1st page.

The query we use orders by a numeric field for which many documents share
the same value (0).
It looks like the ordering between documents that share that value isn't
consistent.

Did anyone encounter such behavior? Any suggestions on resolving this?

We're using version 1.3.1.

Thanks,
Ron

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKHuyJpcYKepYzh%2BBU2MSD2RQ19zjHYiXgf3anWBL9esq9fkGQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: inconsistent paging

2014-08-18 Thread David Pilato
You need to use scroll if you have that requirement.

See: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html#search-request-scroll
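For illustration, a scrolled search in 1.x looks roughly like this (index name, sort field, and the scroll id are placeholders):

```shell
# Open a scroll; each subsequent call returns the next page from a
# consistent point-in-time view of the index
curl -XGET 'localhost:9200/myindex/_search?scroll=1m' -d '{
  "query": { "match_all": {} },
  "sort": [ { "somefield": "asc" } ],
  "size": 60
}'
# Fetch the next page by passing back the _scroll_id from the previous response
curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d '<_scroll_id from previous response>'
```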

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs




Re: inconsistent paging

2014-08-18 Thread Adrien Grand
Hi Ron,

The cause of this issue is that Elasticsearch uses Lucene's internal doc
IDs as tie-breakers. Internal doc IDs might be completely different across
replicas of the same data, so this explains why documents that have the
same sort values are not consistently ordered.

There are two potential ways to fix this problem:
 1. Use scroll, as David mentioned. It creates a context around your
request and makes sure that the same shards are used for all pages.
However, it also provides another guarantee, namely that the same
point-in-time view of the index is used for each page, and this is
expensive to maintain.
 2. Use a custom string value as a preference in order to always hit the
same shards for a given session[1]. This helps with always hitting the
same shards, similar to option 1, but without the additional cost of a
scroll.

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-preference.html
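For example, using a session identifier as the preference value (index name, sort field, and session string are illustrative):

```shell
# Passing the same preference string for every page of a session routes
# all of that session's requests to the same shard copies
curl -XGET 'localhost:9200/myindex/_search?preference=session-42' -d '{
  "query": { "match_all": {} },
  "sort": [ { "somefield": "asc" } ],
  "from": 60,
  "size": 60
}'
```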





-- 
Adrien Grand



Re: Help with the percentiles aggregation

2014-08-18 Thread Adrien Grand
Hi John,

You should be able to do something like:

{
  "aggs": {
    "verb": {
      "terms": {
        "field": "verb"
      },
      "aggs": {
        "load_time_outliers": {
          "percentiles": {
            "field": "responsetime"
          }
        }
      }
    }
  }
}

This will first break down your documents according to the http verb that
is being used and then compute percentiles separately for each unique verb.
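Put together with the curl invocation from the original message, the full request would look something like:

```shell
curl -XPOST 'http://localhost:9200/_search?search_type=count&pretty=true' -d '{
  "aggs": {
    "verb": {
      "terms": { "field": "verb" },
      "aggs": {
        "load_time_outliers": {
          "percentiles": { "field": "responsetime" }
        }
      }
    }
  }
}'
```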



On Fri, Aug 15, 2014 at 11:23 AM, John Ogden johnog65...@gmail.com wrote:

 Hi,

 Am trying to run a single command which calculates percentiles for
 multiple search queries.
 The data for this is an Apache log file, and I want to get the percentile
 response times for the gets, posts, heads (etc) in one go

 If I run this:
 curl -XPOST 'http://localhost:9200/_search?search_type=count&pretty=true' -d '{
   "facets": {
     "0": {"query": {"term": {"verb": "get"}}},
     "1": {"query": {"term": {"verb": "post"}}}
   },
   "aggs": {"load_time_outlier": {"percentiles": {"field": "responsetime"}}}
 }'

 The response I get back has the counts for each subquery but only does the
 aggregations for the overall dataset:

   "facets" : {
     "0" : {
       "_type" : "query",
       "count" : 5678
     },
     "1" : {
       "_type" : "query",
       "count" : 1234
     }
   },
   "aggregations" : {
     "load_time_outlier" : {
       "values" : {
         "1.0" : 0.0,
         ...
         "99.0" : 1234
       }
     }
   }

 I can't figure out how to structure the request so that I get the
 percentiles separately for each of the queries.

 Could someone point me in the right direction please.

 Many thanks
 John





-- 
Adrien Grand



Re: impact of stored fields on performance

2014-08-18 Thread Adrien Grand
Hi Ashish,

On Thu, Aug 14, 2014 at 12:35 AM, Ashish Mishra laughingbud...@gmail.com
wrote:

 That sounds possible.  We are using spindle disks.  I have ~36Gb free for
 the filesystem cache, and the previous data size (without the added field)
 was 60-65Gb per node.  So it's likely that 50% of queries were previously
 addressed out of the FS cache, even more if queries are unevenly
 distributed.
 Data size is now 200Gb/node.  So only ~18% of queries could hit the cache
 and the rest would incur seek times.

 Hmm... given this knowledge, is there a way to mitigate the effect without
 moving everything to SSD?  Only a minority of queries return the stored
 field and it is not indexed.  Ideally, it would be stored in separate
 (colocated) files from the indexed fields.  That way, most queries would be
 unaffected and only those returning the value incur the seek cost.

 I imagine indexes with _source enabled would see similar effects.

 Is a parent-child relationship a good way to achieve the scenario above?
  The parent can contain indexed fields and the child has stored fields.
 Not sure if this just introduces new problems.


I think that you don't even need parent/child relations for this. If you
identify a few large stored fields that you rarely need, you could store
them in a different index with the same _id and only GET them on demand.
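A sketch of that approach (index, type, and field names are hypothetical):

```shell
# Index the frequently queried fields in one index...
curl -XPUT 'localhost:9200/docs/doc/1' -d '{ "title": "foo", "body": "searchable text" }'
# ...and the rarely needed large stored field in a second index under the same id
curl -XPUT 'localhost:9200/docs_blobs/doc/1' -d '{ "payload": "large, rarely used value" }'
# Searches only touch the small index; fetch the big field on demand with a GET
curl -XGET 'localhost:9200/docs_blobs/doc/1'
```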


-- 
Adrien Grand



Re: accessing field data faster in script

2014-08-18 Thread Adrien Grand
Script filters are inherently slow due to the fact that they cannot
leverage the inverted index in order to skip efficiently over non-matching
documents. Even if they were written in assembly, this would likely still
be slow.

What kind of filtering are you trying to do with scripts?


On Thu, Aug 14, 2014 at 8:42 AM, avacados kotadia.ak...@gmail.com wrote:

 How can I access field data faster from a native (Java) script? Should I
 enable 'doc values'?

 I am already using doc().getField() and casting to long. It is a date
 field type. But whenever my argument to the script changes, the search
 query performs poorly. A subsequent call with the same argument performs
 well (might be because _cache is true for that script filter).

 Thanks.






-- 
Adrien Grand



Re: Return selected fields from aggregation?

2014-08-18 Thread Adrien Grand
Can you elaborate more on what you are after?


On Wed, Aug 13, 2014 at 5:16 PM, project2501 darreng5...@gmail.com wrote:

 The old facet DSL was very nice and easy to understand. I could declare
 only which fields I wanted returned.

 How is this done with aggregations? The docs do not say.

 I am only interested in the aggregation metrics, not all the document
 results.

 I tried setting "size": 0 but that DOES NOT EVEN WORK.

 Any help appreciated.

 Thank you,
 D





-- 
Adrien Grand



inconsistent paging

2014-08-18 Thread ronsher
We've noticed a strange behavior in elasticsearch during paging.

In one case we use a paging size of 60 and we have 63 documents. So the
first page is using size 60 and offset 0. The second page is using size 60
and offset 60.

What we see is that the result is inconsistent. Meaning, on the 2nd page, we
sometimes get results that were before in the 1st page.

The query we use has an order by some numeric field that has many documents
with the same value (0).
It looks like the ordering between documents according to the same value,
which is 0, isn't consistent.

Did anyone encounter such behavior? Any suggestions on resolving this?

We're using version 1.3.1.

Thanks,
Ron



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/inconsistent-paging-tp4061986.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Access to AbstractAggregationBuilder.name

2014-08-18 Thread Adrien Grand
Hi Phil,

We would indeed consider a PR for that change if it makes things easier for
you. Feel free to ping me when you open it so that I don't miss it.


On Wed, Aug 13, 2014 at 3:55 PM, Phil Wills otherp...@gmail.com wrote:

 Hello,

 In the Java API AbstractAggregationBuilder's name property is protected.
 Is there a particular reason it can't be public, or have an accessor added,
 or is this something you'd consider a PR for?

 Not having access is making things more complicated than I'd like.

 Thanks,

 Phil





-- 
Adrien Grand



Re: inconsistent paging

2014-08-18 Thread vineeth mohan
You have asked the same question from another Gmail ID.
Please refer to the answers over there.

Thanks
   Vineeth





Re: accessing field data faster in script

2014-08-18 Thread avacados
Thanks, Adrien, for the reply.


My script filter was:
===
{
  "script": {
    "script": "xyz",
    "params": {
      "startRange": 1407939675,   // timestamp in milliseconds ... keeps changing across queries
      "endRange": 1410531675      // timestamp in milliseconds ... keeps changing across queries
    },
    "lang": "native",
    "_cache": true   // I removed this caching and I found a significant performance improvement... do you know why? :-)
  }
},

===
My native (Java) script code   // returns true if the date ranges overlap

===

ScriptDocValues XsDocValue = (ScriptDocValues) doc().get("start_time");
long XsLong = 0L;
if (XsDocValue != null && !XsDocValue.isEmpty()) {
    XsLong = ((ScriptDocValues.Longs) doc().get("start_time")).getValue();
}

ScriptDocValues XeDocValue = (ScriptDocValues) doc().get("end_time");
long XeLong = 0L;
if (XeDocValue != null && !XeDocValue.isEmpty()) {
    XeLong = ((ScriptDocValues.Longs) doc().get("end_time")).getValue();
}

if ((endRange >= XsLong) && (startRange <= XeLong)) {
    return true;
}

===




Using a char_filter in combination with a lowercase filter

2014-08-18 Thread Matthias Hogerheijde
Hi,

We're using Elasticsearch with an analyzer that maps the `y` character to
`ij` (a *char_filter* named char_mapper), since in Dutch these two are
somewhat interchangeable. We're also using a *lowercase filter*.

This is the configuration:

{
  "analysis": {
    "analyzer": {
      "index": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "synonym_twoway",
          "standard",
          "asciifolding"
        ],
        "char_filter": [
          "char_mapper"
        ]
      },
      "index_prefix": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "synonym_twoway",
          "standard",
          "asciifolding",
          "prefixes"
        ],
        "char_filter": [
          "char_mapper"
        ]
      },
      "search": {
        "alias": [
          "default"
        ],
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "synonym",
          "synonym_twoway",
          "standard",
          "asciifolding"
        ],
        "char_filter": [
          "char_mapper"
        ]
      },
      "postal_code": {
        "tokenizer": "keyword",
        "filter": [
          "lowercase"
        ]
      }
    },
    "tokenizer": {
      "standard": {
        "stopwords": []
      }
    },
    "filter": {
      "synonym": {
        "type": "synonym",
        "synonyms": [
          "st => sint",
          "jp => jan pieterszoon",
          "mh => maarten harpertszoon"
        ]
      },
      "synonym_twoway": {
        "type": "synonym",
        "synonyms": [
          "den haag, s gravenhage",
          "den bosch, s hertogenbosch"
        ]
      },
      "prefixes": {
        "type": "edgeNGram",
        "side": "front",
        "min_gram": 1,
        "max_gram": 30
      }
    },
    "char_filter": {
      "char_mapper": {
        "type": "mapping",
        "mappings": [
          "y => ij"
        ]
      }
    }
  }
}

When indexing cities, we're using this mapping:

{
  "properties": {
    "city": {
      "type": "multi_field",
      "fields": {
        "city": {
          "type": "string"
        },
        "prefix": {
          "type": "string",
          "boost": 0.5,
          "index_analyzer": "index_prefix"
        }
      }
    },
    "province_code": {
      "type": "string"
    },
    "unique_name": {
      "type": "boolean"
    },
    "point": {
      "type": "geo_point"
    },
    "search_terms": {
      "type": "multi_field",
      "fields": {
        "search_terms": {
          "type": "string"
        },
        "prefix": {
          "boost": 0.5,
          "index_analyzer": "index_prefix",
          "type": "string"
        }
      }
    }
  },
  "search_analyzer": "search",
  "index_analyzer": "index"
}

When we index all the (Dutch) cities from our data-source, there are cities
starting with both `IJ` and `Y`. (For example, these city names exist:
*IJssel*, *IJsselstein*, *Yerseke* and *Ysselsteyn*.) It seems that these
characters are not lowercased before the char_mapping is applied.

Querying the index results in:

/top/city/_search?q=ijsselstein - works, returns the document for IJsselstein
/top/city/_search?q=Ijsselstein - works, returns the document for IJsselstein
/top/city/_search?q=yerseke - *doesn't* work, returns nothing
/top/city/_search?q=Yerseke - *does* work, returns the document for Yerseke
/top/city/_search?q=YsselsteYn - *doesn't* work, returns nothing
/top/city/_search?q=Ysselsteyn - *does* work, returns the document for Ysselsteyn

Changing the case of any other letter doesn't affect the results.

I've worked around this issue by adding the mapping "Y => ij", i.e.:

"char_filter": {
  "char_mapper": {
    "type": "mapping",
    "mappings": [
      "y => ij",
      "Y => ij"
    ]
  }
}

This solves the problem, but I'd rather see that the lowercase filter is 
applied before the mapping, or, that I can make the order explicit. Is 
there any stance on this issue? Or is this intended behaviour?
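One way to observe the effect directly is the _analyze API, which shows the tokens an analyzer produces (assuming the analyzer above is installed on the `top` index):

```shell
# Compare the tokens produced for the two spellings; with only the one-way
# "y => ij" mapping, "Yerseke" keeps its uppercase Y through the char_filter
curl -XGET 'localhost:9200/top/_analyze?analyzer=index&pretty' -d 'Yerseke'
curl -XGET 'localhost:9200/top/_analyze?analyzer=index&pretty' -d 'yerseke'
```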

Regards,
Matthias Hogerheijde





Re: accessing field data faster in script

2014-08-18 Thread Adrien Grand
Your filter would be faster if you used range filters on the start/end
dates instead of using a script.

On Mon, Aug 18, 2014 at 10:52 AM, avacados kotadia.ak...@gmail.com wrote:

 "_cache": true   // I removed this caching and I found significant
 performance improvement... do you know why? :-)


Yes: when caching a filter, it needs to be evaluated over all documents of
your index in order to be loaded into a bit set. On the other hand, when a
script filter is not cached it will typically only be evaluated on
documents that match the query.
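A sketch of the equivalent range-filter version (index name hypothetical, field names and timestamps as in the original script filter):

```shell
# Two ranges [startRange, endRange] and [start_time, end_time] overlap iff
# start_time <= endRange AND end_time >= startRange
curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "range": { "start_time": { "lte": 1410531675 } } },
            { "range": { "end_time":   { "gte": 1407939675 } } }
          ]
        }
      }
    }
  }
}'
```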

-- 
Adrien Grand



Exception in suggester response

2014-08-18 Thread makr
Hi!
I'm trying out the elasticsearch suggester, but I get a strange error.

user@user:/user/esconfig # curl -X POST 'localhost:9200/dwh_direct/_suggest?pretty' -d @suggester
{
  "_shards" : {
    "total" : 5,
    "successful" : 0,
    "failed" : 5,
    "failures" : [ {
      "index" : "dwh_direct",
      "shard" : 0,
      "reason" : "BroadcastShardOperationFailedException[[dwh_direct][0] ]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
    }, {
      "index" : "dwh_direct",
      "shard" : 1,
      "reason" : "BroadcastShardOperationFailedException[[dwh_direct][1] ]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
    }, {
      "index" : "dwh_direct",
      "shard" : 2,
      "reason" : "BroadcastShardOperationFailedException[[dwh_direct][2] ]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
    }, {
      "index" : "dwh_direct",
      "shard" : 3,
      "reason" : "BroadcastShardOperationFailedException[[dwh_direct][3] ]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
    } ]
  }
}

user@user:/user/esconfig # more suggester
{
  "my-suggest" : {
    "text" : "co",
    "completion" : {
      "field" : "name"
    }
  }
}

Is this a bug in elasticsearch, or did I make a mistake in the configuration or query?
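For what it's worth, the ClassCastException suggests the `name` field is mapped as a plain string rather than as a completion field; the completion suggester requires a mapping along these lines (type name hypothetical):

```shell
curl -XPUT 'localhost:9200/dwh_direct/mytype/_mapping' -d '{
  "mytype": {
    "properties": {
      "name": {
        "type": "completion"
      }
    }
  }
}'
```

Note that an existing field cannot be changed from string to completion in place; the index would have to be recreated (or a separate suggest field added) and the data reindexed.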


--

Maxim Krasovsky



Enhancing perf for my cluster

2014-08-18 Thread Pierrick Boutruche
Hi everyone!

I'm currently working on a tool with *ES and the Twitter Streaming API*, in
which I try to find interesting profiles on Twitter, based on what they
tweet, RT, and which of their interactions are shared/RT'd.

Anyway, I use ES to index and search among tweets. To do that, I take the
Twitter stream data and put it in a *single index with users & tweets (2
types)*, linked by the user id via a parent-child relation. I thought about
my indexing a lot, and this is the best way to do it:
- I need to update users very often (because I score them and because they
update their profiles quite often), so nesting the user in the tweet is not
an option (too many duplicated copies).
- I could put a user's tweets directly in the user object, but I would end
up with huge objects and I don't really want that.

I work on a SoYouStart server: 4c/4t 3.2 GHz, 32 GB RAM, 4 TB HDD.

My settings for the index are:

settings = {
    "index": {
        "number_of_replicas": 0,
        "refresh_interval": "10s",
        "routing.allocation.disable_allocation": False
    },
    "analysis": {
        "analyzer": {
            "snowFrench": {
                "type": "snowball",
                "language": "French"
            },
            "snowEnglish": {
                "type": "snowball",
                "language": "English"
            },
            "snowGerman": {
                "type": "snowball",
                "language": "German"
            },
            "snowRussian": {
                "type": "snowball",
                "language": "Russian"
            },
            "snowSpanish": {
                "type": "snowball",
                "language": "Spanish"
            },
            "snowJapanese": {
                "type": "snowball",
                "language": "Japanese"
            },
            "edgeNGramAnalyzer": {
                "tokenizer": "myEdgeNGram"
            },
            "name_analyzer": {
                "tokenizer": "whitespace",
                "type": "custom",
                "filter": ["lowercase", "multi_words", "name_filter"]
            },
            "city_analyzer": {
                "type": "snowball",
                "language": "English"
            }
        },
        "tokenizer": {
            "myEdgeNGram": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 5
            },
            "name_tokenizer": {
                "type": "edgeNGram",
                "max_gram": 100,
                "min_gram": 4
            }
        },
        "filter": {
            "multi_words": {
                "type": "shingle",
                "min_shingle_size": 2,
                "max_shingle_size": 10
            },
            "name_filter": {
                "type": "edgeNGram",
                "max_gram": 100,
                "min_gram": 4
            }
        }
    }
}


And my mappings are:

tweet_mapping = {
    "_all": {"enabled": False},
    "_ttl": {"enabled": True, "default": "400d"},
    "_parent": {"type": "user"},
    "properties": {
        "textfr": {"type": "string", "_analyzer": "snowFrench", "copy_to": "text"},
        "texten": {"type": "string", "_analyzer": "snowEnglish", "copy_to": "text"},
        "textde": {"type": "string", "_analyzer": "snowGerman", "copy_to": "text"},
        "textja": {"type": "string", "_analyzer": "snowJapanese", "copy_to": "text"},
        "textru": {"type": "string", "_analyzer": "snowRussian", "copy_to": "text"},
        "textes": {"type": "string", "_analyzer": "snowSpanish", "copy_to": "text"},
        "text": {"type": "string", "null_value": "", "index": "analyzed", "store": "yes"},
        "entities": {
            "type": "object",
            "index": "analyzed",
            "store": "yes",
            "properties": {
                "hashtags": {
                    "index": "analyzed",
                    "store": "yes",
                    "type": "string",
                    "_analyzer": "edgeNGramAnalyzer"
                },
                "mentions": {
                    "index": "not_analyzed",
                    "store": "yes",
                    "type": "long",
                    "precision_step": 64
                }
            }
        },
        "lang": {"index": "not_analyzed", "store": "yes", "type": "string"},
        "created_at": {
            "index": "not_analyzed",
            "store": "yes",
            "type": "date",
            "format": "dd-MM- HH:mm:ss"
        }
    }
}

user_mapping = {
    "_all": {"enabled": False},
    "_ttl": {"enabled": True, "default": "600d"},
    "properties": {
        "lang": {"index": "not_analyzed", "store": "yes", "type": "string"},
        "name": {
            "index": "analyzed",
            "store": "yes",
            "type": "string",
            "_analyzer": "edgeNGramAnalyzer"
        },
        "screen_name": {
            "index": "analyzed",
            "store": "yes",
            "type": "string",
            "_analyzer": "edgeNGramAnalyzer"
        },
        "descfr": {"type": "string", "_analyzer": "snowFrench", "copy_to": "description"},
        "descen": {"type": "string", "_analyzer": "snowEnglish", "copy_to": "description"},
        "descde": {"type": "string", "_analyzer": "snowGerman", "copy_to": "description"},
        "descja": {"type": "string", "_analyzer": "snowJapanese", "copy_to": "description"},
        "descru": {"type": "string", "_analyzer": "snowRussian", "copy_to": "description"},
        "desces": {"type": "string", "_analyzer": "snowSpanish", "copy_to": "description"},
        "description": {"type": "string", "null_value": "", "index": "analyzed", "store": "yes"},
        "created_at": {
            "index": "not_analyzed",
            "store": "yes",
            "type": "date",

elasticsearch php api with multiple hosts

2014-08-18 Thread Niv Penso


I followed this link to create a two-node elasticsearch cluster on Azure:
http://thomasardal.com/running-elasticsearch-in-a-cluster-on-azure/

The installation and configuration went fine.

When I started to check the cluster I found some strange behaviour in the
php client.

I declared 2 hosts in the client:

$ELSEARCH_SERVER = array("dns1:9200", "dns2:9200");
$params = array();
$params['hosts'] = $ELSEARCH_SERVER;
$dstEl = new Elasticsearch\Client($params);

The expected behaviour is that the client will try to insert the documents into
dns1 and, if that fails, *automatically* switch to dns2. But for some
reason, when one of the servers is down during insertion, the php client
simply throws an exception saying it couldn't connect to the host.

Is there any way to cause the client automatically choose an online server?
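For what it's worth, the failover pattern being described — try each configured host before giving up — can be sketched in a few lines. This is plain Python with a stand-in send() function, not the actual PHP client API:

```python
def send_with_failover(hosts, payload, send):
    # Try each host in turn; give up only when every host has failed.
    errors = {}
    for host in hosts:
        try:
            return send(host, payload)
        except ConnectionError as exc:
            errors[host] = exc  # remember the failure and move on
    raise ConnectionError("all hosts failed: %s" % sorted(errors))

def fake_send(host, payload):
    # Stand-in transport: pretend dns1 is down.
    if host == "dns1:9200":
        raise ConnectionError("connection refused")
    return "indexed on " + host

print(send_with_failover(["dns1:9200", "dns2:9200"], {"doc": 1}, fake_send))
# indexed on dns2:9200
```

As far as I know the Elasticsearch-PHP client is supposed to do this internally through its connection pool, so if it gives up after the first host it may be worth checking how many retries it is configured to perform.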

thnx, Niv

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7489eb44-bff3-41d1-baa1-da70b508ef66%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


How to update nest from 0.12 to 1.0

2014-08-18 Thread Dmitriy Bashkalin
Hello. Does someone use NEST for .NET?
Please help me.
Some time ago I asked how to get part of a text field. I wanted to do it with
the Highlight param no_match_size, but it's only supported since NEST version
1.0RC1. After updating nest.dll from 0.12 to 1.0 I found that nothing
works. Looking through the changelog on GitHub didn't help.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5952eaf6-7d31-4682-9789-bf4a720768ee%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: A few questions about node types + usage

2014-08-18 Thread Alex
Hello again Mark, 

Thanks for your response. Your answers really are very helpful.

As with our previous conversation 
https://groups.google.com/d/topic/elasticsearch/ZouS4NVsTJw/discussion I 
am confused about how to make a client node also be master eligible. This 
is what I posted there, I would really like some help understanding this:

I've done more investigating and it seems that a Client (AKA Query) node 
cannot also be a Master node. As it says here 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election

*Nodes can be excluded from becoming a master by setting node.master to 
false. Note, once a node is a client node (node.client set to true), it 
will not be allowed to become a master (node.master is automatically set to 
false).*

And from the elasticsearch.yml config file it says:


# 2. You want this node to only serve as a master: to not store any data and
#    to have free resources. This will be the "coordinator" of your cluster.
#
# node.master: true
# node.data: false
#
# 3. You want this node to be neither master nor data node, but
#    to act as a "search load balancer" (fetching data from nodes,
#    aggregating results, etc.)
#
# node.master: false
# node.data: false

So I'm wondering how exactly you set up your client nodes to also be master 
nodes. It seems like a master node can only either be purely a master or 
master + data.

Perhaps you could show the relevant parts of one of your client node's 
config?

Many thanks, Alex

On Saturday, 16 August 2014 01:04:37 UTC+1, Mark Walkom wrote:

 1 - Up to you. We use the http output and then just use a round robin A 
 record to our 3 masters.
 2 - They are routed but it makes more sense to specify.
 3 - You're right, but most people only use 1 or 2 masters which is why 
 they get recommended to have at least 3.
 4 - That sounds like a lot. We use masters that double as clients and they 
 only have 8GB, our use sounds similar and we don't have issues.

 I wouldn't bother with 3 client only nodes to start, use them as master 
 and client and then if you find you are hitting memory issues due to 
 queries you can re-evaluate things.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 15 August 2014 20:11, Alex alex@gmail.com wrote:

 Bump. Any help? Thanks

 On Wednesday, 13 August 2014 12:10:14 UTC+1, Alex wrote:

 Hello I would like some clarification about node types and their usage. 

 We will have 3 client nodes and 6 data nodes. The 6 1TB data nodes can 
 also be masters (discovery.zen.minimum_master_nodes set to 4). We will 
 use Logstash and Kibana. Kibana will be used 24/7 by between a couple and 
 handfuls of people.

 Some questions:

1. Should incoming Logstash write requests be sent to the cluster in
general (using the *cluster* setting in the *elasticsearch* output), or
specifically to the client nodes, or to the data nodes (via load balancer)?
I am unsure what kind of node is best for handling writes.

2. If client nodes exist in the cluster, are Kibana requests
automatically routed to them? Do I need to somehow specify to Kibana which
nodes to contact?

3. I have heard different information about master nodes and the
minimum_master_nodes setting. I've heard that you should have an odd number
of master nodes, but I fail to see why the parity of the number of masters
matters as long as minimum_master_nodes is set to at least N/2 + 1. Does it
really need to be odd?

4. I have been advised that the client nodes will use a huge amount of
memory (which makes sense due to the nature of the Kibana facet queries).
64GB per client node was recommended but I have no idea if that sounds
right or not. I don't have the ability to actually test it right now, so any
more guidance on that would be helpful.
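On question 3, the quorum arithmetic is easy to check directly: with minimum_master_nodes set to a strict majority (N/2 + 1 using integer division), an even number of masters tolerates no more failures than the next-lower odd number, which is why odd counts are usually recommended. A small illustrative sketch:

```python
def quorum(n_masters):
    # Strict majority: the safe minimum_master_nodes value.
    return n_masters // 2 + 1

def tolerated_failures(n_masters):
    # How many masters can fail while a majority can still be formed.
    return n_masters - quorum(n_masters)

for n in range(3, 8):
    print(n, quorum(n), tolerated_failures(n))
# 5 and 6 masters both tolerate 2 failures: the 6th master buys nothing.
```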

 I'd be so grateful to hear from you even if you only know something 
 about one of my queries.

 Thank you for your time,
 Alex

  -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/70b16a1e-319c-4f7c-b129-b68258b3652f%40googlegroups.com
  
 https://groups.google.com/d/msgid/elasticsearch/70b16a1e-319c-4f7c-b129-b68258b3652f%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web 

river-csv plugin

2014-08-18 Thread HansPeterSloot
Hi, 

This is for elasticsearch: elasticsearch-1.3.2-1.noarch.
There are 2 nodes in the cluster.
I have installed the river-csv plugin.

When loading a file with 5 million rows, loading stops after 477,400 rows.

I load with :
curl -XPUT localhost:9200/_river/my_csv_river/_meta -d '
{
    "type" : "csv",
    "csv_file" : {
        "folder" : "/u01/app/div",
        "first_line_is_header" : "true"
    }
}'

In the logfile I see :
[2014-08-18 14:44:53,216][INFO 
][org.agileworks.elasticsearch.river.csv.CSVRiver] [Stanley] 
[csv][my_csv_river] Going to execute new bulk composed of 100 actions
[2014-08-18 14:44:53,275][INFO 
][org.agileworks.elasticsearch.river.csv.CSVRiver] [Stanley] 
[csv][my_csv_river] Executed bulk composed of 100 actions
[2014-08-18 14:44:53,280][INFO 
][org.agileworks.elasticsearch.river.csv.CSVRiver] [Stanley] 
[csv][my_csv_river] Going to execute new bulk composed of 100 actions
[2014-08-18 14:44:53,299][INFO 
][org.agileworks.elasticsearch.river.csv.CSVRiver] [Stanley] 
[csv][my_csv_river] Executed bulk composed of 100 actions
[2014-08-18 14:44:53,385][INFO 
][org.agileworks.elasticsearch.river.csv.CSVRiver] [Stanley] 
[csv][my_csv_river] Executed bulk composed of 100 actions

./es -v indices
status name pri rep size bytes   docs
green  _river 1   1  15452  2
green  my_csv_river   5   1  296047073 477400

Am I doing something wrong?

Regards HansP

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/76cafcc4-9966-4c0b-b891-b18b9376a74f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Exception in suggester response

2014-08-18 Thread vineeth mohan
Hello Maxim,

Can you show the schema and some sample data that you have indexed?

Thanks
  Vineeth
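Given the ClassCastException in the response below (StringFieldMapper cannot be cast to CompletionFieldMapper), my guess is that the "name" field is mapped as a plain string, while the completion suggester requires a field of type "completion". A sketch of such a mapping (the type name "your_type" is a placeholder):

```
{
  "mappings": {
    "your_type": {
      "properties": {
        "name": {
          "type": "completion"
        }
      }
    }
  }
}
```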


On Mon, Aug 18, 2014 at 3:31 PM, m...@ciklum.com wrote:

 Hi!
 I'm trying to test the elasticsearch suggester, but I got a strange error.

 user@user:/user/esconfig # curl -X POST
 'localhost:9200/dwh_direct/_suggest?pretty' -d @suggester
  {
    "_shards" : {
      "total" : 5,
      "successful" : 0,
      "failed" : 5,
      "failures" : [ {
        "index" : "dwh_direct",
        "shard" : 0,
        "reason" : "BroadcastShardOperationFailedException[[dwh_direct][0]]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
      }, {
        "index" : "dwh_direct",
        "shard" : 1,
        "reason" : "BroadcastShardOperationFailedException[[dwh_direct][1]]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
      }, {
        "index" : "dwh_direct",
        "shard" : 2,
        "reason" : "BroadcastShardOperationFailedException[[dwh_direct][2]]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
      }, {
        "index" : "dwh_direct",
        "shard" : 3,
        "reason" : "BroadcastShardOperationFailedException[[dwh_direct][3]]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
      } ]
    }
  }

 user@user:/user/esconfig # more suggester
  {
    "my-suggest" : {
      "text" : "co",
      "completion" : {
        "field" : "name"
      }
    }
  }

 Is this a bug in elasticsearch, or did I make a mistake in the configuration or the query?


 --

 Maxim Krasovsky

  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/465403c8-6f5a-4151-9c0f-e6e490fdfe13%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/465403c8-6f5a-4151-9c0f-e6e490fdfe13%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5%3DMQmX788p%3Db0G%2Bqk_z6wwsA9HBtLu66fLRoKxvupRo%3Dw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: EsRejectedExecutionException: rejected execution (queue capacity 1000)

2014-08-18 Thread Sávio S . Teles de Oliveira
You can set *threadpool.search.type: cached* in elasticsearch.yml to get an
unbounded queue for reads.


2014-08-10 9:52 GMT-03:00 James digital...@gmail.com:

  On Sat, 2014-08-09 at 23:53 -0700, Deep wrote:

 Hi,



  Elasticsearch internally has thread pools, and a queue size is associated
 with each pool. You can have pools for search threads, index threads, etc.
 Please see the elasticsearch documentation at
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
 . I think it is possible to override these properties in the
 elasticsearch.yml configuration file.



  Regards,

  Ishwardeep

 On Saturday, 9 August 2014 00:54:02 UTC+5:30, digit...@gmail.com wrote:

 So I've seen a few posts on this, but I've not seen any solutions posted.
 I've been log monitoring and I was trying to determine how to fix the
 error below... any information would be great, thank you.

 [2014-08-08 19:14:12,578][DEBUG][action.search.type   ] [Jericho
 Drumm] [bro-201408032100][2], node[fgjxNK0cQ3O5Usn7wyjaMA], [P],
 s[STARTED]: Failed to execute
 [org.elasticsearch.action.search.SearchRequest@126067b7] lastShard [true]
 org.elasticsearch.common.util.concurrent.EsRejectedExecutionException:
 rejected execution (queue capacity 1000) on
 org.elasticsearch.search.action.SearchServiceTransportAction$23@5a879352
 at
 org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
 at
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
 at
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
 at
 org.elasticsearch.search.action.SearchServiceTransportAction.execute(SearchServiceTransportAction.java:509)
 at
 org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:203)
 at
 org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
 at
 org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:171)
 at
 org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:153)
 at
 org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
 at
 org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
 at
 org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
 at
 org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:101)
 at
 org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
 at
 org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
 at
 org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
 at
 org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
 at
 org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:75)
 at
 org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
 at
 org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
 at
 org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
 at
 org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
 at
 org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:294)
 at
 org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:44)
 at
 org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
 at
 org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
 at
 org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
 at
 org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
 at
 

Re: Query problem

2014-08-18 Thread Luc Evers
 David hi,

  How can I configure the mapping so that the default analyzer will be the
whitespace one?







On Wed, Aug 13, 2014 at 2:46 PM, David Pilato da...@pilato.fr wrote:

 Getting no answer at all is not good; I think something is going wrong here.
 Maybe you will see something in the logs.

 That said, if you don't want to break your string as tokens at index time,
 you could set index:not_analyzed for fields you don't want to analyze.

 But, you should read this part of the book:
 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/analysis-intro.html#analysis-intro

 --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr
 https://twitter.com/elasticsearchfr


 Le 13 août 2014 à 14:39:20, Luc Evers (lucev...@gmail.com) a écrit:


   I'd like to use elasticsearch as a NoSQL database + search engine for data
 coming from text files (router configs) and databases.
   First I converted a router config to a JSON file, which I indexed.

   Mapping:

 {
   "configs" : {
     "mappings" : {
       "test" : {
         "properties" : {
           "ConfLength" : {
             "type" : "string"
           },
           "NVRAM" : {
             "type" : "string"
           },
           "aaa" : {
             "type" : "string"
           },
           "enable" : {
             "type" : "string"
           },
           "hostname" : {
             "type" : "string"
           },
           "lastChange" : {
             "type" : "string"
           },
           "logging" : {
             "type" : "string"
           },
           "model" : {
             "type" : "string"
           },
           "policy-map" : {
             "type" : "string"
           }
         }
       }
     }
   }
 }


 Document:

 {
   "_index" : "configs",
   "_type" : "test",
   "_id" : "7",
   "_score" : 1,
   "_source" : {
     "hostname" : [
       "hostname test-1234"
     ]
   }
 },


 Example of a simple search: search a hostname.

 If I start a query:

 curl -XGET 'http://127.0.0.1:9200/configs/_search?q="hostname test-1234"'
 curl: (52) Empty reply from server

 No response.

 If I start a second query, without "hostname", I get an answer:

 curl -XGET 'http://127.0.0.1:9200/configs/_search?q=test-1234'
 OKE

 Analyser: standard

 Why can a search find "test-1234" but not "hostname test-1234"?













  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.

 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/b25127bb-2dca-440c-a7b3-937b5ddccd6d%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/b25127bb-2dca-440c-a7b3-937b5ddccd6d%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.

  --
 You received this message because you are subscribed to a topic in the
 Google Groups elasticsearch group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/elasticsearch/xOrC6RMG_nw/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/etPan.53eb5e1b.3804823e.18f0%40MacBook-Air-de-David.local
 https://groups.google.com/d/msgid/elasticsearch/etPan.53eb5e1b.3804823e.18f0%40MacBook-Air-de-David.local?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAA0yNqLNnQK0%2BtJgRkOqwgJawqngMjmWJfXDgijpcuEbQYbyZw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


ThreadPool reject_policy

2014-08-18 Thread Sávio S . Teles de Oliveira
How does the threadpool *caller* reject_policy work?

Can I catch the EsRejectedExecutionException exception (using the Java API)
during heavy writes?
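If it helps, *caller* corresponds to a caller-runs policy: instead of rejecting the task when the bounded queue is full (the *abort* behaviour behind EsRejectedExecutionException), the submitting thread executes the task itself, which naturally throttles producers. A language-neutral sketch of the two behaviours (plain Python, an analogy rather than ES internals):

```python
import queue

def submit(q, task, policy="abort"):
    # Enqueue a task; on a full queue either reject (abort) or have the
    # calling thread run the task itself (caller-runs).
    try:
        q.put_nowait(task)
        return "queued"
    except queue.Full:
        if policy == "caller":
            return task()  # the caller does the work, throttling itself
        raise RuntimeError("rejected execution (queue capacity %d)" % q.maxsize)

q = queue.Queue(maxsize=1)
print(submit(q, lambda: "ran"))                   # queued
print(submit(q, lambda: "ran", policy="caller"))  # ran (queue already full)
```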

-- 
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da Computação - UFG
Arquiteto de Software
CUIA Internet Brasil

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAFKmhPtnm9xV21nhvtE%3D0hv4GoLXhugNpkJXqC9Mec93892USg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Query problem

2014-08-18 Thread David Pilato
I think this could help you:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html
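If the goal is simply to make whitespace the index-wide default analyzer, defining an analyzer named "default" in the index settings should do it. A sketch against the 1.x settings API (the index name "configs" is taken from the earlier mail):

```
curl -XPUT 'http://127.0.0.1:9200/configs' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "whitespace"
        }
      }
    }
  }
}'
```

Note this only applies to fields without an explicit analyzer in the mapping, and it has to be set when the index is created.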

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 18 août 2014 à 15:39:36, Luc Evers (lucev...@gmail.com) a écrit:

 David hi,

  How can I configure the mapping so that the default analyzer will be the 
whitespace one?
  
 





On Wed, Aug 13, 2014 at 2:46 PM, David Pilato da...@pilato.fr wrote:
Getting no answer at all is not good; I think something is going wrong here.
Maybe you will see something in the logs.

That said, if you don't want to break your string as tokens at index time, you 
could set index:not_analyzed for fields you don't want to analyze.

But, you should read this part of the book: 
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/analysis-intro.html#analysis-intro

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 13 août 2014 à 14:39:20, Luc Evers (lucev...@gmail.com) a écrit:

  I'd like to use elasticsearch as a NoSQL database + search engine for data coming
from text files (router configs) and databases.
  First I converted a router config to a JSON file, which I indexed.

  Mapping:

{
  "configs" : {
    "mappings" : {
      "test" : {
        "properties" : {
          "ConfLength" : {
            "type" : "string"
          },
          "NVRAM" : {
            "type" : "string"
          },
          "aaa" : {
            "type" : "string"
          },
          "enable" : {
            "type" : "string"
          },
          "hostname" : {
            "type" : "string"
          },
          "lastChange" : {
            "type" : "string"
          },
          "logging" : {
            "type" : "string"
          },
          "model" : {
            "type" : "string"
          },
          "policy-map" : {
            "type" : "string"
          }
        }
      }
    }
  }
}


Document:

{
  "_index" : "configs",
  "_type" : "test",
  "_id" : "7",
  "_score" : 1,
  "_source" : {
    "hostname" : [
      "hostname test-1234"
    ]
  }
},


Example of a simple search: search a hostname.

If I start a query:

curl -XGET 'http://127.0.0.1:9200/configs/_search?q="hostname test-1234"'
curl: (52) Empty reply from server

No response.

If I start a second query, without "hostname", I get an answer:

curl -XGET 'http://127.0.0.1:9200/configs/_search?q=test-1234'
OKE

Analyser: standard

Why can a search find "test-1234" but not "hostname test-1234"?













--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.

To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b25127bb-2dca-440c-a7b3-937b5ddccd6d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in the Google 
Groups elasticsearch group.
To unsubscribe from this topic, visit 
https://groups.google.com/d/topic/elasticsearch/xOrC6RMG_nw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to 
elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/etPan.53eb5e1b.3804823e.18f0%40MacBook-Air-de-David.local.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAA0yNqLNnQK0%2BtJgRkOqwgJawqngMjmWJfXDgijpcuEbQYbyZw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/etPan.53f202c7.2eb141f2.132%40MacBook-Air-de-David.local.
For more options, visit https://groups.google.com/d/optout.


How to normalize score when combining regular query and function_score?

2014-08-18 Thread JohnnyM
First of all, kudos on the awesome job everyone here is doing!

I was wondering if you guys can help me solve this puzzle
(also available on Stack Overflow:
http://stackoverflow.com/questions/25361795/elasticsearch-how-to-normalize-score-when-combining-regular-query-and-function):

Ideally what I am trying to achieve is to assign weights to queries such
that query1 constitutes 30% of the final score and query2 constitutes the other
70%, so to achieve the maximum score a document has to have the highest
possible score on both query1 and query2. My study of the documentation did not
yield any hints as to how to achieve this, so let's try to solve a simpler
problem.

Consider a query in following form:

{
    "query": {
        "bool": {
            "should": [
                {
                    "function_score": {
                        "query": {"match_all": {}},
                        "script_score": {
                            "script": "some_script"
                        }
                    }
                },
                {
                    "match": {
                        "message": "this is a test"
                    }
                }
            ]
        }
    }
}

The script can return an arbitrary number (think: it can return something
like 12392002).

How do I make sure that the result from the script will not dominate the
overall score? (My experiments using explain show that this can indeed
happen very often.)

Is there any way to normalize it? For example, instead of the script score,
return the ratio to a max_script_score (the score achieved by the document with
the highest score)?
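For what it's worth, until something like max_script_score exists, the 30/70 weighting can be approximated client-side by rescoring the top-N hits yourself: fetch both raw scores, max-normalize each list into [0, 1], then take the weighted sum. A sketch with synthetic scores (not an ES API):

```python
def max_normalize(scores):
    # Scale raw scores into [0, 1] by dividing by the maximum.
    top = max(scores.values())
    return {doc: s / top for doc, s in scores.items()} if top else scores

def combine(script_scores, text_scores, w_script=0.7, w_text=0.3):
    # Weighted sum of per-query max-normalized scores; weights sum to 1.
    a, b = max_normalize(script_scores), max_normalize(text_scores)
    return {d: w_script * a.get(d, 0.0) + w_text * b.get(d, 0.0)
            for d in set(a) | set(b)}

ranked = combine({"d1": 12392002, "d2": 5000000}, {"d1": 0.2, "d2": 0.9})
print(sorted(ranked, key=ranked.get, reverse=True))  # ['d1', 'd2']
```

The normalization caps each query's contribution at its weight, so a huge raw script score can no longer drown out the text match.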

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2179ed93-575c-47d5-a13a-42d1e2244baa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


help with a grok filter

2014-08-18 Thread Kevin M
Could someone help me write a grok filter for this log real quick? Here is
what the log looks like:

Aug 18 09:40:39 server01 webmin_log: 172.16.16.96 - username
[18/Aug/2014:09:40:39 -0400] "GET /right.cgi?open=system&open=status
HTTP/1.1" 200 3228

Here is what I have so far:

match = [ "message", "%{SYSLOGTIMESTAMP:timestamp} %{WORD:Server}
webmin_log: %{IP:IP_Address} - %{USERNAME:username} (stuck at this middle
part: [18/Aug/2014:09:40:39 -0400]) %{WORD:method}
%{URIPATHPARAM:request} HTTP/1.1 %{NUMBER:bytes} %{NUMBER:duration}" ]
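For the middle part, grok's built-in HTTPDATE pattern should match — something like \[%{HTTPDATE:access_time}\] with the square brackets escaped (access_time is just an example field name). The underlying regex idea can be sanity-checked in plain Python:

```python
import re

# Rough equivalent of grok's \[%{HTTPDATE}\]: dd/Mon/yyyy:HH:MM:SS zone
httpdate = r"\[(?P<ts>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]"

line = ('Aug 18 09:40:39 server01 webmin_log: 172.16.16.96 - username '
        '[18/Aug/2014:09:40:39 -0400] "GET /right.cgi?open=system&open=status '
        'HTTP/1.1" 200 3228')

m = re.search(httpdate, line)
print(m.group("ts"))  # 18/Aug/2014:09:40:39 -0400
```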

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4784c4b4-65ab-4894-8a1b-a8ab0fba0ed6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Help with the percentiles aggregation

2014-08-18 Thread John Ogden
That's spot on. Thanks!
On 18 Aug 2014 09:08, Adrien Grand adrien.gr...@elasticsearch.com wrote:

 Hi John,

 You should be able to do something like:

  {
    "aggs": {
      "verb": {
        "terms": {
          "field": "verb"
        },
        "aggs": {
          "load_time_outliers": {
            "percentiles": {
              "field": "responsetime"
            }
          }
        }
      }
    }
  }

 This will first break down your documents according to the http verb that
 is being used and then compute percentiles separately for each unique verb.
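Conceptually, the nested aggregation is a group-by followed by a percentile per group; a plain-Python sketch of what it computes (synthetic data, nearest-rank percentile rather than the TDigest approximation ES uses):

```python
from collections import defaultdict

def percentile(values, p):
    # Nearest-rank percentile on a sorted copy (simplified vs. TDigest).
    vs = sorted(values)
    k = max(0, min(len(vs) - 1, round(p / 100 * (len(vs) - 1))))
    return vs[k]

docs = [("get", 120), ("get", 80), ("get", 450), ("post", 300), ("post", 310)]
by_verb = defaultdict(list)
for verb, rt in docs:
    by_verb[verb].append(rt)  # the terms agg: one bucket per unique verb

for verb, times in sorted(by_verb.items()):
    print(verb, percentile(times, 50), percentile(times, 99))
```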



 On Fri, Aug 15, 2014 at 11:23 AM, John Ogden johnog65...@gmail.com
 wrote:

 Hi,

 I am trying to run a single command which calculates percentiles for
 multiple search queries.
 The data for this is an Apache log file, and I want to get the percentile
 response times for the gets, posts, heads (etc.) in one go.

 If I run this:
 curl -XPOST 'http://localhost:9200/_search?search_type=count&pretty=true'
 -d '{
 "facets": {
     "0": {"query" : {"term" : { "verb" : "get"  }}},
     "1": {"query" : {"term" : { "verb" : "post" }}}
 },
 "aggs" : {"load_time_outlier" : {"percentiles" : {"field" :
 "responsetime"}}}
 }'

 The response I get back has the counts for each subquery but only does
 the aggregations for the overall dataset:

   "facets" : {
     "0" : {
       "_type" : "query",
       "count" : 5678
     },
     "1" : {
       "_type" : "query",
       "count" : 1234
     }
   },
   "aggregations" : {
     "load_time_outlier" : {
       "values" : {
         "1.0" : 0.0,
         ...
         "99.0" : 1234
       }
     }
   }

 I can't figure out how to structure the request so that I get the
 percentiles separately for each of the queries.

 Could someone point me in the right direction please.

 Many thanks
 John





 --
 Adrien Grand



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGfq%3DRjVu58Jetkgf%3DGvJ4BkLjhWYPvm789UGPrr0U%2BOiA_Wxg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


ES ignores index.query.bool.max_clause_count in elasticsearch.yml

2014-08-18 Thread l . daedelow
It seems to me that ES ignores the index.query.bool.max_clause_count 
setting in elasticsearch.yml.

Setting index.query.bool.max_clause_count: 5000 results in the following 
error:

Caused by: org.apache.lucene.search.BooleanQuery$TooManyClauses: 
maxClauseCount is set to 1024

Any idea what's going wrong here?
 



Re: Help with the percentiles aggregation

2014-08-18 Thread John Ogden
Slight follow on - do you know if returning this sort of stuff via Kibana 
is on the cards?
Just looking for an easy way to graph the results.

Thanks.








Help with multiple data ranges in a single query

2014-08-18 Thread John Ogden
I've been given a requirement to produce a single kibana dashboard showing 
app response times for multiple date ranges, and am stumped at how to 
proceed.
The user wants to see today's graph, along with the previous working day, 
day -7, day -28 and day -364 on the same screen - ideally all 4 metrics in 
the same histogram. If they select another date range, they want that to 
show the day -1, day -7 (etc.) results too.

The only thing I've been able to come up with so far is pushing each source 
event into Elasticsearch 4 times (once with the right timestamp, once with 
+1 day, once with +7 days, once with +28 days, etc.) and writing separate 
queries for each, but this just feels wrong.
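
For what it's worth, the shifted-copy idea described above can be sketched in
Python (the field names @timestamp, offset_days and resp_ms are purely
illustrative, not from the thread):

```python
from datetime import datetime, timedelta

# One copy of each event per comparison offset, with the timestamp moved
# forward so that "today" lines up with last week / last month / last year.
OFFSETS_DAYS = [0, 1, 7, 28, 364]

def shifted_copies(event):
    """Yield one copy of the event per offset, with @timestamp moved forward."""
    ts = datetime.strptime(event["@timestamp"], "%Y-%m-%dT%H:%M:%S")
    for days in OFFSETS_DAYS:
        copy = dict(event)
        copy["@timestamp"] = (ts + timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%S")
        copy["offset_days"] = days  # lets a dashboard query filter one series
        yield copy

docs = list(shifted_copies({"@timestamp": "2014-08-18T09:40:39", "resp_ms": 120}))
print(len(docs))  # one document per offset
```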

Any ideas how else the requirement could be met?


Many thanks.




Re: Help with the percentiles aggregation

2014-08-18 Thread Adrien Grand
Support for aggregations is indeed something that is on the roadmap for the
next version of Kibana (Kibana 4), see this message from Rashid:
https://groups.google.com/forum/?utm_medium=emailutm_source=footer#!msg/elasticsearch/I7um1mX4GSk/aUsT2EmyxysJ


On Mon, Aug 18, 2014 at 4:33 PM, John Ogden johnog65...@gmail.com wrote:

 Slight follow on - do you know if returning this sort of stuff via Kibana
 is on the cards?
 Just looking for an easy way to graph the results.

 Thanks.










-- 
Adrien Grand



Aggregates - include source data

2014-08-18 Thread John D. Ament
Hi,

From looking at the docs, it didn't seem overly clear. Is it possible to 
include the source data in an aggregate, or is it counts only?

John



Re: Aggregates - include source data

2014-08-18 Thread Adrien Grand
Aggregations only report counts or various metrics (see the metrics
aggregations: stats, min, max, sum, percentiles, cardinality, top_hits,
...). Maybe top_hits is what you are looking for?

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html


On Mon, Aug 18, 2014 at 5:34 PM, John D. Ament john.d.am...@gmail.com
wrote:

 Hi,

 From looking at the docs, didn't seem overly clear.  Is it possible to
 include the data in an aggregate, or is it counts only?

 John





-- 
Adrien Grand



Re: Optimization Questions

2014-08-18 Thread Andrew Selden
Hi Greg,

I believe max_num_segments is technically a hint that can be overridden by the 
merge algorithm if it decides to. You might try simply re-running the optimize 
again to get from ~25 down closer to 1. Sorry but I don't know of any way to 
see when the optimize is finished - it's really just forcing a merge so looking 
at merge stats is what you want.

Hope that helps.
Andrew


On Aug 15, 2014, at 8:01 PM, Gregory Sutcliffe gsutcli...@publishthis.com 
wrote:

 Hey Guys, 
 We were doing some updates to our ES (1.3.1) clusters recently and had some 
 questions about _optimize.  We optimized with max_num_segments 1 and we're 
 still seeing ~25 segments per shard.  The index that was optimized had no 
 writes going to it during that time; it was actually freshly re-opened after 
 an upgrade.  Also, are there any tricks to seeing when an optimize is done 
 other than watching merge stats and disk IO?  Maybe some data in Marvel? 
 
 Thanks for your assistance, 
 Greg
 



Re: Enhancing perf for my cluster

2014-08-18 Thread Pierrick Boutruche
Hey guys,

Finally I changed all my queries to constant score queries. It's way better, 
but still, certain pages take a lot of time... I don't understand 
why, and I don't have anything in my ES logs... 

Now the average time to search 20 users and their mentions/timeline and 
score them is about 4s (and almost 4s of that is the search). 
But when it's slow, it's still 60s for 1 page! 

I tried reading the explain data I get back from the query, but there's no 
response time in it. How can I understand why certain queries take so 
much time?

Thanks!

Le lundi 18 août 2014 12:29:10 UTC+2, Pierrick Boutruche a écrit :

 Hi everyone !

 I'm currently working on a tool with *ES and the Twitter Streaming API*, in 
 which I try to find interesting profiles on Twitter, based on what they 
 tweet, what they RT, and which of their interactions are shared/RT'd.

 Anyway, I use ES to index and search among tweets. To do that, I take the 
 Twitter stream data and put it in a *single index with users & tweets (2 
 types)*, linked by the user id via a parent-child relation. Actually, I 
 thought about my indexing a lot and this is the best way to do it: 
 - I need to update users very often (because I score them and because they 
 update their profile quite often), so nesting the user in each tweet is 
 not an option (too many copies). 
 - I could put a user's tweets directly in the user object, but I would have 
 huge objects and I don't really want that.

 I work on a SoYouStart server, 4c/4t 3.2GHz, 32 GB RAM, 4 TB HDD.

 My settings for the index are :

 settings = {
     "index": {
         "number_of_replicas": 0,
         "refresh_interval": "10s",
         "routing.allocation.disable_allocation": False
     },
     "analysis": {
         "analyzer": {
             "snowFrench":   {"type": "snowball", "language": "French"},
             "snowEnglish":  {"type": "snowball", "language": "English"},
             "snowGerman":   {"type": "snowball", "language": "German"},
             "snowRussian":  {"type": "snowball", "language": "Russian"},
             "snowSpanish":  {"type": "snowball", "language": "Spanish"},
             "snowJapanese": {"type": "snowball", "language": "Japanese"},
             "edgeNGramAnalyzer": {"tokenizer": "myEdgeNGram"},
             "name_analyzer": {
                 "type": "custom",
                 "tokenizer": "whitespace",
                 "filter": ["lowercase", "multi_words", "name_filter"]
             },
             "city_analyzer": {"type": "snowball", "language": "English"}
         },
         "tokenizer": {
             "myEdgeNGram":    {"type": "edgeNGram", "min_gram": 2, "max_gram": 5},
             "name_tokenizer": {"type": "edgeNGram", "min_gram": 4, "max_gram": 100}
         },
         "filter": {
             "multi_words": {"type": "shingle", "min_shingle_size": 2, "max_shingle_size": 10},
             "name_filter": {"type": "edgeNGram", "min_gram": 4, "max_gram": 100}
         }
     }
 }


 And my mappings are :

 tweet_mapping = {
     "_all": {"enabled": False},
     "_ttl": {"enabled": True, "default": "400d"},
     "_parent": {"type": "user"},
     "properties": {
         "textfr": {"type": "string", "_analyzer": "snowFrench",   "copy_to": "text"},
         "texten": {"type": "string", "_analyzer": "snowEnglish",  "copy_to": "text"},
         "textde": {"type": "string", "_analyzer": "snowGerman",   "copy_to": "text"},
         "textja": {"type": "string", "_analyzer": "snowJapanese", "copy_to": "text"},
         "textru": {"type": "string", "_analyzer": "snowRussian",  "copy_to": "text"},
         "textes": {"type": "string", "_analyzer": "snowSpanish",  "copy_to": "text"},
         "text": {"type": "string", "null_value": "", "index": "analyzed", "store": "yes"},
         "entities": {
             "type": "object", "index": "analyzed", "store": "yes",
             "properties": {
                 "hashtags": {"type": "string", "index": "analyzed", "store": "yes",
                              "_analyzer": "edgeNGramAnalyzer"},
                 "mentions": {"type": "long", "index": "not_analyzed", "store": "yes",
                              "precision_step": 64}
             }
         },
         "lang": {"type": "string", "index": "not_analyzed", "store": "yes"},
         "created_at": {"type": "date", "index": "not_analyzed", "store": "yes",
                        "format": "dd-MM- HH:mm:ss"}
     }
 }
 user_mapping = {
     "_all": {"enabled": False},
     "_ttl": {"enabled": True, "default": "600d"},
     "properties": {
         "lang": {"type": "string", "index": "not_analyzed", "store": "yes"},
         "name": {"type": "string", "index": "analyzed", "store": "yes",
                  "_analyzer": "edgeNGramAnalyzer"},
         "screen_name": {"type": "string", "index": "analyzed", "store": "yes",
                         "_analyzer": "edgeNGramAnalyzer"},
         "descfr": {"type": "string", "_analyzer":

indexing problem when using logstash

2014-08-18 Thread vitaly . bulgakov
I am using the following config file:

filter {
  grok {
    match => [
      "message",
      "(?:\?|\&)C\=%{DATA:kw}\&%{DATA}\sT\s%{DATA:town}\sS\s%{WORD:state}\s%{DATA}%{IP:ip}"
    ]
  }
  grok {
    match => [
      "message",
      "(?:\?|\&)SRC\=%{DATA:src}(?:\&|$)"
    ]
  }
}
output {
  elasticsearch {
    host => "localhost"
  }
  stdout { codec => rubydebug }
}
And I thought kw, town, state, etc. would become fields in Elasticsearch. 
But trying

http://localhost:9200/_search?q=town:* AND state:*

I am getting

{"took":5,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
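
One thing worth ruling out first: the query string above is not URL-encoded,
and an unencoded space in the ?q= parameter is often swallowed by the shell
or HTTP client before it reaches Elasticsearch. A small Python sketch of
encoding it (local host/port assumed):

```python
from urllib.parse import quote

# Percent-encode the Lucene query string before putting it in ?q=
lucene_query = "town:* AND state:*"
url = "http://localhost:9200/_search?q=" + quote(lucene_query)
print(url)
```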

 



Re: help with a grok filter

2014-08-18 Thread vitaly

On Monday, August 18, 2014 9:57:41 AM UTC-4, Kevin M wrote:

 Could someone help me write a grok filter for this log real quick here is 
 what the log looks like:


 Aug 18 09:40:39 server01 webmin_log: 172.16.16.96 - username *[18/Aug/2014:09:40:39 
 -0400]* "GET /right.cgi?open=system&open=status HTTP/1.1" 200 3228

 here is what I have so far:

 match => [ "message", "%{SYSLOGTIMESTAMP:timestamp} %{WORD:Server} 
 webmin_log: %{IP:IP_Address} - %{USERNAME:username} *[ stuck at this 
 middle part [18/Aug/2014:09:40:39 -0400] ]* %{WORD:method} 
 %{URIPATHPARAM:request} HTTP/1.1 %{NUMBER:bytes} %{NUMBER:duration}" ]

 
It is just a sequence of regular expressions catching fields one by one. 
Look, e.g at my post.   



Re: help with a grok filter

2014-08-18 Thread Kevin M
I don't see your post - what I am stuck with is whenever the date changes on 
that log, for example:


*[18/Aug/2014:09:40:39 -0400]*

*[20/Aug/2014:11:40:39 -0104]*
*[19/Aug/2014:08:40:39 -0500]*

the filter will not match it
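
For reference, that bracketed timestamp is the standard Apache/HTTPD date
layout, and grok ships an %{HTTPDATE} pattern for it (written as
\[%{HTTPDATE:timestamp}\] since the brackets are not part of the pattern).
A quick check of a roughly equivalent raw regex in Python against the three
samples above:

```python
import re

# Rough regex equivalent of grok's HTTPDATE, wrapped in literal brackets:
# [dd/Mon/yyyy:HH:MM:SS +ZZZZ]
HTTPDATE = r"\[\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]"

samples = [
    "[18/Aug/2014:09:40:39 -0400]",
    "[20/Aug/2014:11:40:39 -0104]",
    "[19/Aug/2014:08:40:39 -0500]",
]
matches = [bool(re.fullmatch(HTTPDATE, s)) for s in samples]
print(matches)
```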






[ANN] Experimental Highlighter 0.0.11 Released

2014-08-18 Thread Nikolas Everett
I released version 0.0.11 of the Experimental Highlighter
https://github.com/wikimedia/search-highlighter that we've been using. It's
compatible with Elasticsearch 1.3.x and has a few new features:
1.  Conditional highlighting - skip highlighting fields you aren't going to
use!  Save time and IO bandwidth!
2.  Regular expressions - now you have two problems!

Read more at the link above if you are interested.

It's in use on our beta site
http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page so you can try it and
verify that it doesn't crash and stuff.

Cheers,


Nik



[ANN] Elasticsearch Mapper Attachment plugin 2.2.1 released

2014-08-18 Thread David Pilato
Heya,

We are pleased to announce the release of the Elasticsearch Mapper Attachment 
plugin, version 2.2.1.

The mapper attachments plugin adds the attachment type to Elasticsearch using 
Apache Tika.
Release Notes - Version 2.2.1

Earlier today there was an Apache POI release to address a security 
vulnerability.  For some document types, the attachment mapper plugin will 
indirectly use POI. This attachment mapper plugin release forces an update to 
Apache POI and is a response to the POI issue. Previously, the attachment 
mapper did not have an explicit dependency on POI. With this release, we have 
added a direct dependency and set it to the recent versions of POI. This will 
help users of the attachment mapper, who might be unaware of these 
vulnerabilities, avoid them.

You can read more about the reported issues in CVE-2014-3529 and CVE-2014-3574  
 

We encourage anyone using the attachment mapper plugin with untrusted documents 
to update the plugin. 
Update

[80] - Update a few dependencies
Doc

[79] - Docs: make the welcome page more obvious
Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-mapper-attachments project repository!

For questions or comments around this plugin, feel free to use elasticsearch 
mailing list!

Enjoy,

- The Elasticsearch team



[ANN] Elasticsearch Mapper Attachment plugin 2.3.1 released

2014-08-18 Thread David Pilato
Heya,
We are pleased to announce the release of the Elasticsearch Mapper Attachment 
plugin, version 2.3.1.

The mapper attachments plugin adds the attachment type to Elasticsearch using 
Apache Tika.
Release Notes - Version 2.3.1

Earlier today there was an Apache POI release to address a security 
vulnerability.  For some document types, the attachment mapper plugin will 
indirectly use POI. This attachment mapper plugin release forces an update to 
Apache POI and is a response to the POI issue. Previously, the attachment 
mapper did not have an explicit dependency on POI. With this release, we have 
added a direct dependency and set it to the recent versions of POI. This will 
help users of the attachment mapper, who might be unaware of these 
vulnerabilities, avoid them.

You can read more about the reported issues in CVE-2014-3529 and CVE-2014-3574  
 

We encourage anyone using the attachment mapper plugin with untrusted documents 
to update the plugin. 
Update

[80] - Update a few dependencies
Doc

[79] - Docs: make the welcome page more obvious
Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-mapper-attachments project repository!

For questions or comments around this plugin, feel free to use elasticsearch 
mailing list!

Enjoy,

- The Elasticsearch team



Unassigned Node and shards

2014-08-18 Thread IronMan2014
I've seen this problem twice now.
I start with a green two-node cluster, default 5 shards/index. I index about 
50,000 docs, and the shards/replicas look great and well balanced across the 2 
nodes.

I try the same test with 8 million docs. I come back when it's done, and I 
see all primary shards on node1, 2 replicas on node2, and three 
unassigned replicas on a third, unassigned node.

I will look through the logs, but I was wondering if anyone has seen 
something similar or has any idea where/why this is coming from before I 
dig?



[ANN] swift-repository-plugin v0.5 released

2014-08-18 Thread Chad Horohoe
Hi all,

Just released to Central v0.5 of the swift-repository plugin.
It mainly contains documentation updates, but is also built
against 1.3.2 instead of 1.1.0.

https://github.com/wikimedia/search-repository-swift

-Chad



Re: How to increase memory

2014-08-18 Thread joergpra...@gmail.com
What version of ES do you use?

Jörg


On Mon, Aug 18, 2014 at 9:42 PM, rookie7799 pavelbara...@gmail.com wrote:

 Hello there,

 We are having the same exact problem with a really resource hungry query:
 5 nodes with 16GB ES_HEAP_SIZE
 1.2 Billion records inside 1 index with 5 shards

  Whenever we start running an aggregate query, the whole cluster breaks and
  disconnects. Why can't it just not return results and simply give an error
  without actually killing the entire cluster?

 Cheers!


 On Saturday, February 9, 2013 1:05:54 PM UTC-5, Igor Motov wrote:

 ES_HEAP_SIZE ES_MAX_MEM ES_MIN_MEM are environment variables. They need
 to be specified on the command line. For example:

 ES_HEAP_SIZE=4g bin/elasticsearch -f

 To get JVM stats, you need to set jvm=true on stats request:

  curl -XGET 'http://localhost:9200/_cluster/nodes/stats?jvm=true&pretty=true'

 To understand how much memory you need, give it as much as you can, put
 some load and monitor jvm.mem.heap_used in the output of the stats
 command above. If this number ever goes and stays above 90%
 of available heap it's typically a good indicator that you need more.

  There is a small Russian elasticsearch forum - 
  https://groups.google.com/forum/?fromgroups=#!forum/elasticsearch-ru

 On Saturday, February 9, 2013 12:57:04 PM UTC-5, Николай Измайлов wrote:

  In continuation of the topic 
  https://github.com/elasticsearch/elasticsearch/issues/2636#issuecomment-13332877

  On the page 
  http://www.elasticsearch.org/guide/reference/setup/installation.html it is 
  said that it is necessary to increase ES_HEAP_SIZE, ES_MAX_MEM and 
  ES_MIN_MEM, but I have not found this setting in 
  /etc/elasticsearch/elasticsearch.yml. Here are my cluster stats:

  {
    "cluster_name" : "elasticsearch",
    "nodes" : {
      "VPjABUm-REmy24NQ_AkXDQ" : {
        "timestamp" : 1360432148849,
        "name" : "Sin",
        "transport_address" : "inet[/ip:9300]",
        "hostname" : "Ubuntu-1204-precise-64-minimal",
        "indices" : {
          "store" : { "size" : "34.6gb", "size_in_bytes" : 37221752556,
                      "throttle_time" : "0s", "throttle_time_in_millis" : 0 },
          "docs" : { "count" : 58480, "deleted" : 4759 },
          "indexing" : { "index_total" : 20, "index_time" : "1.7s",
                         "index_time_in_millis" : 1748, "index_current" : 0,
                         "delete_total" : 0, "delete_time" : "0s",
                         "delete_time_in_millis" : 0, "delete_current" : 0 },
          "get" : { "total" : 2, "time" : "5ms", "time_in_millis" : 5,
                    "exists_total" : 0, "exists_time" : "0s",
                    "exists_time_in_millis" : 0, "missing_total" : 2,
                    "missing_time" : "5ms", "missing_time_in_millis" : 5,
                    "current" : 0 },
          "search" : { "query_total" : 1726375, "query_time" : "7.7m",
                       "query_time_in_millis" : 462631, "query_current" : 0,
                       "fetch_total" : 61663, "fetch_time" : "20.9s",
                       "fetch_time_in_millis" : 20955, "fetch_current" : 0 },
          "cache" : { "field_evictions" : 0, "field_size" : "0b",
                      "field_size_in_bytes" : 0, "filter_count" : 5896,
                      "filter_evictions" : 0, "filter_size" : "511.6kb",
                      "filter_size_in_bytes" : 523944, "bloom_size" : "22.1kb",
                      "bloom_size_in_bytes" : 22640, "id_cache_size" : "0b",
                      "id_cache_size_in_bytes" : 0 },
          "merges" : { "current" : 0, "current_docs" : 0, "current_size" : "0b",
                       "current_size_in_bytes" : 0, "total" : 0,
                       "total_time" : "0s", "total_time_in_millis" : 0,
                       "total_docs" : 0, "total_size" : "0b",
                       "total_size_in_bytes" : 0 },
          "refresh" : { "total" : 15, "total_time" : "143ms",
                        "total_time_in_millis" : 143 },
          "flush" : { "total" : 25, "total_time" : "3.2s",
                      "total_time_in_millis" : 3205 }
        }
      }
    }
  }


  I'd like to understand how much memory I need to allocate for Elasticsearch, 
  and in general a description of each of the parameters.

  Is there a Russian community?


Top hits aggregation default sort

2014-08-18 Thread Dan Tuffery
I'm using the top hits aggregation with a has_child query. The top_hits 
aggregation documentation says '*By default the hits are sorted by the 
score of the main query*', but I'm not seeing that in the results for my 
query:

{
  "from": 0,
  "size": 3,
  "query": {
    "has_child": {
      "score_mode": "max",
      "type": "child_type",
      "query": {
        "match": {
          "myField": {
            "query": "some text"
          }
        }
      }
    }
  },
  "aggs": {
    "replies": {
      "terms": {
        "field": "parent_type_id",
        "size": 3
      },
      "aggs": {
        "topChildren": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}

the has_child query returns three parent results with the following scores.

   - doc 1 = 0.83619833
   - doc 2 = 0.7210085
   - doc 3 = 0.7210085

The score for the top hits aggregations are:

   - first top hit aggregation = 0.29160267
   - second top hit aggregation  = 0.83619833
   - third top hit aggregation = 0.58320534

Shouldn't the 'second top hit aggregation' be returned first, followed by the 
aggregations with score 0.7210085?
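If the default ordering is misbehaving, one workaround is to make the ordering explicit: top_hits accepts a sort option. A sketch against the aggregation above (same field names; sorting on _score descending should reproduce the documented default):

```json
"aggs": {
  "topChildren": {
    "top_hits": {
      "size": 1,
      "sort": [
        { "_score": { "order": "desc" } }
      ]
    }
  }
}
```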





-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0b6849ad-4308-4afe-a76b-80153620f74b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to increase memory

2014-08-18 Thread rookie7799
Hi, it's 1.3.2

On Monday, August 18, 2014 5:49:03 PM UTC-4, Jörg Prante wrote:

 What version of ES do you use?

 Jörg


 On Mon, Aug 18, 2014 at 9:42 PM, rookie7799 pavelb...@gmail.com wrote:

 Hello there,

 We are having the same exact problem with a really resource-hungry query:
 5 nodes with 16GB ES_HEAP_SIZE
 1.2 billion records in 1 index with 5 shards

 Whenever we start running an aggregate query, the whole cluster breaks and 
 disconnects. Why can't it just return no results and simply give an error 
 without actually killing the entire cluster?

 Cheers!


 On Saturday, February 9, 2013 1:05:54 PM UTC-5, Igor Motov wrote:

 ES_HEAP_SIZE ES_MAX_MEM ES_MIN_MEM are environment variables. They need 
 to be specified on the command line. For example:

 ES_HEAP_SIZE=4g bin/elasticsearch -f

 To get JVM stats, you need to set jvm=true on stats request:

 curl -XGET 'http://localhost:9200/_cluster/nodes/stats?jvm=true&pretty=true'

 To understand how much memory you need, give it as much as you can, put 
 some load and monitor jvm.mem.heap_used in the output of the stats 
 command above. If this number ever goes and stays above 90% 
 of available heap it's typically a good indicator that you need more.
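
 Igor's 90% rule of thumb can be checked mechanically from the stats output. A minimal sketch (field names follow the jvm.mem section of the nodes-stats response; the numbers here are made up):

```python
def heap_used_ratio(node_stats):
    """Fraction of the max heap currently in use, for one node's stats."""
    mem = node_stats["jvm"]["mem"]
    return mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]

# Hypothetical values; in practice parse the JSON returned by the
# nodes stats API call shown above.
stats = {"jvm": {"mem": {"heap_used_in_bytes": 3_800_000_000,
                         "heap_max_in_bytes": 4_000_000_000}}}
if heap_used_ratio(stats) > 0.90:
    print("heap consistently above 90% -- consider a larger ES_HEAP_SIZE")
```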

 There is a small Russian elasticsearch forum - 
 https://groups.google.com/forum/?fromgroups=#!forum/elasticsearch-ru

 On Saturday, February 9, 2013 12:57:04 PM UTC-5, Николай Измайлов wrote:

 In continuation of the topic: https://github.com/elasticsearch/elasticsearch/issues/2636#issuecomment-13332877
 On the page http://www.elasticsearch.org/guide/reference/setup/installation.html 
 it is said that it is necessary to increase ES_HEAP_SIZE, ES_MAX_MEM and 
 ES_MIN_MEM, but I have not found these settings in 
 /etc/elasticsearch/elasticsearch.yml. Here's my cluster:

 {
   "cluster_name" : "elasticsearch",
   "nodes" : {
     "VPjABUm-REmy24NQ_AkXDQ" : {
       "timestamp" : 1360432148849,
       "name" : "Sin",
       "transport_address" : "inet[/ip:9300]",
       "hostname" : "Ubuntu-1204-precise-64-minimal",
       "indices" : {
         "store" : {
           "size" : "34.6gb",
           "size_in_bytes" : 37221752556,
           "throttle_time" : "0s",
           "throttle_time_in_millis" : 0
         },
         "docs" : {
           "count" : 58480,
           "deleted" : 4759
         },
         "indexing" : {
           "index_total" : 20,
           "index_time" : "1.7s",
           "index_time_in_millis" : 1748,
           "index_current" : 0,
           "delete_total" : 0,
           "delete_time" : "0s",
           "delete_time_in_millis" : 0,
           "delete_current" : 0
         },
         "get" : {
           "total" : 2,
           "time" : "5ms",
           "time_in_millis" : 5,
           "exists_total" : 0,
           "exists_time" : "0s",
           "exists_time_in_millis" : 0,
           "missing_total" : 2,
           "missing_time" : "5ms",
           "missing_time_in_millis" : 5,
           "current" : 0
         },
         "search" : {
           "query_total" : 1726375,
           "query_time" : "7.7m",
           "query_time_in_millis" : 462631,
           "query_current" : 0,
           "fetch_total" : 61663,
           "fetch_time" : "20.9s",
           "fetch_time_in_millis" : 20955,
           "fetch_current" : 0
         },
         "cache" : {
           "field_evictions" : 0,
           "field_size" : "0b",
           "field_size_in_bytes" : 0,
           "filter_count" : 5896,
           "filter_evictions" : 0,
           "filter_size" : "511.6kb",
           "filter_size_in_bytes" : 523944,
           "bloom_size" : "22.1kb",
           "bloom_size_in_bytes" : 22640,
           "id_cache_size" : "0b",
           "id_cache_size_in_bytes" : 0
         },
         "merges" : {
           "current" : 0,
           "current_docs" : 0,
           "current_size" : "0b",
           "current_size_in_bytes" : 0,
           "total" : 0,
           "total_time" : "0s",
           "total_time_in_millis" : 0,
           "total_docs" : 0,
           "total_size" : "0b",
           "total_size_in_bytes" : 0
         },
         "refresh" : {
           "total" : 15,
           "total_time" : "143ms",
           "total_time_in_millis" : 143
         },
         "flush" : {
           "total" : 25,
           "total_time" : "3.2s",
           "total_time_in_millis" : 3205
         }
       }
     }
   }
 }


 How do I work out how much memory I need to allocate for Elasticsearch, and 
 where can I find a general description of each of these parameters?

 Is there a Russian community?


How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain
Hi,

I have an Elasticsearch cluster of 2 nodes. I have configured them to store 
data at /auto/share. I want to point one of the two nodes in the cluster to 
another location, say /auto/foo, to store its data.
What would be the best way of achieving this without losing any data? And is 
it possible at all without losing any data?

Thank you,
Shriyansh

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/415f8d41-4fa9-4f6d-86b9-41b2059ab67f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread Mark Walkom
Do you want to copy the existing data in /auto/share to /auto/foo, or start
with no data?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 August 2014 08:23, shriyansh jain shriyanshaj...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to store
 data at the location which is /auto/share. I want to point one of the two
 nodes in the cluster to some other location to store the data say /auto/foo.
 What would be the best way of achieving the above task without loosing any
 data.? And is it possible to do that without loosing any data.?

 Thank you,
 Shriyansh



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624aPBZLB0i-y-_JFJFJzVVLiZBz3VDiJUYTt%2BduUZ-Br6Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread Mark Walkom
If you want no data in /auto/foo then just create the directory, give it
the right permissions and then update the config to point to it.
It's the same process you did for /auto/share.


Do you have replicas set on your indexes?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 August 2014 08:32, shriyansh jain shriyanshaj...@gmail.com wrote:

 I would prefer no data in /auto/foo, but I would like to go with whichever 
 way is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, or
 start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to
 store data at the location which is /auto/share. I want to point one of the
 two nodes in the cluster to some other location to store the data say
 /auto/foo.
 What would be the best way of achieving the above task without loosing
 any data.? And is it possible to do that without loosing any data.?

 Thank you,
 Shriyansh





-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624bMQP2HW4UxbV%3DX81fN-M0r7diZ9HVyZFRDcf9Q0P6mrA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain

Yes, I have set *index.number_of_replicas: 1*. If I just point one of the 2 
nodes to some other location, won't it lose the data stored by that node?

Thank you,
Shriyansh

On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give it 
 the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with way, 
 which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, or 
 start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to 
 store data at the location which is /auto/share. I want to point one of 
 the 
 two nodes in the cluster to some other location to store the data say 
 /auto/foo.
 What would be the best way of achieving the above task without loosing 
 any data.? And is it possible to do that without loosing any data.?

 Thank you,
 Shriyansh







-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/eb602b14-4d3e-430c-93d8-935da98af66a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain

Yes, I have set *index.number_of_replicas: 1*. If I just point one of the 2 
nodes to some other location, won't it lose the data stored by that node?

Thank you,
Shriyansh

On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give it 
 the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with way, 
 which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, or 
 start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to 
 store data at the location which is /auto/share. I want to point one of 
 the 
 two nodes in the cluster to some other location to store the data say 
 /auto/foo.
 What would be the best way of achieving the above task without loosing 
 any data.? And is it possible to do that without loosing any data.?

 Thank you,
 Shriyansh







-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/01d32d9d-3041-4fb7-babe-0e73e3908b31%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread Mark Walkom
If you point the instance to a new data location then yes, it will start up
with no data, but it won't lose the data completely, as the data will still be
located in your original /auto/share directory.

However given you have replicas set what will happen is when the node
starts up pointing to the new location it will simply start to copy the
data from the other node so that you fulfil your replica requirements.
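
Concretely, the change described here is a one-line edit to elasticsearch.yml on the node being moved (the path shown is the target from this thread), followed by a restart; the replicas on the other node then re-populate the new location:

```yaml
# elasticsearch.yml on the node being migrated
# (assumes a single data path; comma-separated multiple paths are also possible)
path.data: /auto/foo
```

After the restart, wait for cluster health to return to green before removing anything under the old /auto/share directory.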

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 August 2014 08:58, shriyansh jain shriyanshaj...@gmail.com wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one of the
 2 nodes to some other location, wont it lose the data stored by that node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give it
 the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with
 way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, or
 start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to
 store data at the location which is /auto/share. I want to point one of 
 the
 two nodes in the cluster to some other location to store the data say
 /auto/foo.
 What would be the best way of achieving the above task without loosing
 any data.? And is it possible to do that without loosing any data.?

 Thank you,
 Shriyansh







-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624YSXt_0L4W%3DNfA_3PjxyMz%2BXKZi0SS2noQQ3qdb0pOJWw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: how to get char_filter to work?

2014-08-18 Thread Ivan Brusic
Sorry for not replying sooner; I was on vacation.

I would use the two-field solution, especially since you simply cannot
store a stripped version. The _source field is compressed, so the additional
index size is content-dependent. I've never used highlighting, so I cannot
recommend alternative approaches.

I use jsoup to strip HTML before the data reaches Elasticsearch. Not sure
if it is the best, but I have been using it for years.
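
For the strip-before-indexing route, here is a minimal sketch of what jsoup does there, using Python's standard-library html.parser instead (jsoup itself is a Java library):

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collect text nodes and drop tags -- a rough analogue of Jsoup.parse(html).text()."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(markup):
    # Index this stripped text; keep the original markup in a separate,
    # non-indexed field if you still need it for display.
    stripper = TagStripper()
    stripper.feed(markup)
    return "".join(stripper.parts)

print(strip_html("<html>trying out <b>Elasticsearch</b>, This is an html test</html>"))
# -> trying out Elasticsearch, This is an html test
```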

Cheers,

Ivan


On Wed, Aug 13, 2014 at 8:16 AM, IronMan2014 sabdall...@gmail.com wrote:

 Ivan,

 A follow-up question: as I mentioned earlier, storing HTML and applying a
 char_filter doesn't really work, especially with highlighted fields coming
 back with mangled HTML.
 So I am thinking of stripping the HTML before indexing, so there is no HTML
 in the index or the source, but I will add an extra field like html_content
 which is meant to store the HTML version and not be indexed.
 Do you see any problems with my approach? I see one: a bigger index size.
 What do you recommend for an ideal solution? I am still confused, as I
 thought this would be a common problem.


 On Friday, August 8, 2014 8:16:09 PM UTC-4, IronMan wrote:

 Thanks again. I wasn't expecting it to remove what's between the tags. I
 believe I understand the behavior and maybe its the case where I was greedy
 and expecting ElasticSearch to do it all.
 Here is a scenario that I was looking for: Assume I am looking to get an
 excerpt of text (Extracted text from a document), Elastic Search query will
 give me excerpt with html tags, but the tags are out of context, so I would
 have liked to be to display this excerpt with no html tags, I know I can
 probably strip the tags after the fact, but that's what I was trying to
 avoid.  In other words, in a perfect world, I would have liked 2 versions
 of the document, the original html one and another stripped one. When I
 need to query things like excerpts, I would query the stripped one, and
 when I needed the html, I would query the source. Hopefully I didn't make
 this more confusing.

 On Friday, August 8, 2014 4:58:03 PM UTC-4, Ivan Brusic wrote:

 The tokens that appear in the analyze API are the ones that are put into
 the inverted index. When you search for one of the terms that is not an
 HTML tag, there will be a match. What I don't understand after reading in
 detail your original, is exactly what behavior you are expecting.

 You indexed the phrase
 <html>trying out <b>Elasticsearch</b>, This is an html test</html>

 but you expected a query for the term html not to match. However, the
 word html is clearly in the content. The HTML stripper will not remove
 the content between the tags, just the tags themselves. The analyze API
 should show you the correct terms.

 Lucene has more control over what information you can retrieve, but the
 only way to get the analyzed token stream back from Elasticsearch is to use
 the analyze API on the field. Most people do not want an analyzed token
 stream, just the original field.

 --
 Ivan


 On Fri, Aug 8, 2014 at 12:01 PM, IronMike sabda...@gmail.com wrote:

 Also, here is a link to someone who had the same problem; I am not
 sure if there was a final answer to that one:
 http://grokbase.com/t/gg/elasticsearch/126r4kv8tx/problem-with-standard-html-strip
 I have to admit that I am a bit confused now about this topic. I
 understand analyzers will tokenize the sentence and strip HTML in the case
 of html_strip, and _analyze works fine using the analyzer. What I am
 failing to understand is how I can get the results of these tokens. Isn't
 the whole idea to be able to search for those tokens eventually?

 If not, what's the solution to what I would think is a common scenario:
 having to index HTML documents where the HTML tags don't need to be indexed,
 while keeping the original HTML for presentation purposes? Any ideas
 (besides stripping the HTML tags manually before indexing)?


 On Friday, August 8, 2014 1:02:07 PM UTC-4, IronMike wrote:

 Thanks for explaining. So, is there a way to be able to get non html
 from the index? I thought I read that it was possible to index without the
 html tags while keeping source intact. So, how would I get at the index
 with non html tags if you will?

 On Friday, August 8, 2014 12:52:37 PM UTC-4, Ivan Brusic wrote:

 The field is derived from the source and not generated from the
 tokens.

 If we indexed the sentence The quick brown foxes jumped over the
 lazy dogs with the english analyzer, the tokens would be

  http://localhost:9200/_analyze?text=The%20quick%20brown%20foxes%20jumped%20over%20the%20lazy%20dogs&analyzer=english

 quick brown fox jump over lazi dog

 After applying stopwords and stemming, the tokens do not form a
 sentence that looks like the original.

 --
 Ivan


 On Fri, Aug 8, 2014 at 9:42 AM, IronMike sabda...@gmail.com wrote:

 Ivan,

 The search results I am showing is for the field title not for the
 source. I thought I could query the field not the source and look at it
 

Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread Mark Walkom
Why do you want to do this if you are worried about data loss?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 August 2014 11:50, shriyansh jain shriyanshaj...@gmail.com wrote:

 As you mentioned, the node will not lose the data completely; is there any
 possibility that it will lose some data?

 Thank you,
 Shriyansh

 On Monday, August 18, 2014 4:17:54 PM UTC-7, Mark Walkom wrote:

 If you point the instance to a new data location then yes, it will
 startup with no data, but it won't lose the data completely as it will
 still be located in your original /auto/share directory.

 However given you have replicas set what will happen is when the node
 starts up pointing to the new location it will simply start to copy the
 data from the other node so that you fulfil your replica requirements.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:58, shriyansh jain shriyan...@gmail.com wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one of
 the 2 nodes to some other location, wont it lose the data stored by that
 node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give
 it the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with
 way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, or
 start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to
 store data at the location which is /auto/share. I want to point one of 
 the
 two nodes in the cluster to some other location to store the data say
 /auto/foo.
 What would be the best way of achieving the above task without
 loosing any data.? And is it possible to do that without loosing any 
 data.?

 Thank you,
 Shriyansh









-- 

Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain
Just to make sure that if /auto/share goes down, I still have data in /auto/foo.

Thanks,
Shriyansh

On Monday, August 18, 2014 6:55:59 PM UTC-7, Mark Walkom wrote:

 Why do you want to do this if you are worried about data loss?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 11:50, shriyansh jain shriyan...@gmail.com wrote:

 As you mentioned the node will not lose the data completely, is there any 
 possibility that it will lose some data.?

 Thank you,
 Shriyansh

 On Monday, August 18, 2014 4:17:54 PM UTC-7, Mark Walkom wrote:

 If you point the instance to a new data location then yes, it will 
 startup with no data, but it won't lose the data completely as it will 
 still be located in your original /auto/share directory.

 However given you have replicas set what will happen is when the node 
 starts up pointing to the new location it will simply start to copy the 
 data from the other node so that you fulfil your replica requirements.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:58, shriyansh jain shriyan...@gmail.com wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one of 
 the 2 nodes to some other location, wont it lose the data stored by that 
 node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give 
 it the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with 
 way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, 
 or start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com 
 wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them 
 to store data at the location which is /auto/share. I want to point 
 one of 
 the two nodes in the cluster to some other location to store the data 
 say 
 /auto/foo.
 What would be the best way of achieving the above task without 
 loosing any data.? And is it possible to do that without loosing any 
 data.?

 Thank you,
 Shriyansh








Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain
To make sure that if /auto/share goes down, I still have data in /auto/foo. 
And I am also short of space on /auto/share. Mainly because of these two reasons.

Thanks,
Shriyansh

On Monday, August 18, 2014 6:55:59 PM UTC-7, Mark Walkom wrote:

 Why do you want to do this if you are worried about data loss?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 11:50, shriyansh jain shriyan...@gmail.com wrote:

 As you mentioned the node will not lose the data completely, is there any 
 possibility that it will lose some data.?

 Thank you,
 Shriyansh

 On Monday, August 18, 2014 4:17:54 PM UTC-7, Mark Walkom wrote:

 If you point the instance to a new data location then yes, it will 
 startup with no data, but it won't lose the data completely as it will 
 still be located in your original /auto/share directory.

 However given you have replicas set what will happen is when the node 
 starts up pointing to the new location it will simply start to copy the 
 data from the other node so that you fulfil your replica requirements.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:58, shriyansh jain shriyan...@gmail.com wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one of 
 the 2 nodes to some other location, wont it lose the data stored by that 
 node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give 
 it the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with 
 way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, 
 or start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com 
 wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them 
 to store data at the location which is /auto/share. I want to point 
 one of 
 the two nodes in the cluster to some other location to store the data 
 say 
 /auto/foo.
 What would be the best way of achieving the above task without 
 loosing any data.? And is it possible to do that without loosing any 
 data.?

 Thank you,
 Shriyansh








Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread Mark Walkom
This is why you have replicas: they give you redundancy at a higher level
than the filesystem.
If you are still concerned then you should add another node and increase
your replicas.

Playing around on the FS to create replicas is only extra management
overhead and likely to end up causing more problems than it's worth.
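To make the replica point concrete, here is a toy Python model (an illustration only, not Elasticsearch internals; the placement rule is made up for the sketch): with two nodes and number_of_replicas: 1, every shard ends up with a copy on each node, so losing one node's data path loses no shards.

```python
# Toy shard-placement model (illustration only, not ES internals):
# each shard gets num_replicas + 1 copies, spread across distinct nodes.
def surviving_shards(num_shards, num_replicas, num_nodes, failed_node):
    placements = {
        shard: {(shard + copy) % num_nodes for copy in range(num_replicas + 1)}
        for shard in range(num_shards)
    }
    # a shard survives if at least one copy lives on a healthy node
    return sorted(s for s, nodes in placements.items() if nodes - {failed_node})

# 5 primary shards, 1 replica, 2 nodes, node 0's mount is lost:
print(surviving_shards(5, 1, 2, 0))  # [0, 1, 2, 3, 4] -- nothing is lost
# with no replicas, half the shards would be gone:
print(surviving_shards(5, 0, 2, 0))  # [1, 3]
```

Adding a third node and a second replica extends the same idea: any single mount can be lost without losing a shard.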

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 August 2014 11:59, shriyansh jain shriyanshaj...@gmail.com wrote:

 Just to make sure if /auto/share goes down I have data in /auto/foo.

 Thanks,
 Shriyansh

 On Monday, August 18, 2014 6:55:59 PM UTC-7, Mark Walkom wrote:

 Why do you want to do this if you are worried about data loss?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 11:50, shriyansh jain shriyan...@gmail.com wrote:

 As you mentioned the node will not lose the data completely, is there
 any possibility that it will lose some data.?

 Thank you,
 Shriyansh

 On Monday, August 18, 2014 4:17:54 PM UTC-7, Mark Walkom wrote:

 If you point the instance to a new data location then yes, it will
 startup with no data, but it won't lose the data completely as it will
 still be located in your original /auto/share directory.

 However given you have replicas set what will happen is when the node
 starts up pointing to the new location it will simply start to copy the
 data from the other node so that you fulfil your replica requirements.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:58, shriyansh jain shriyan...@gmail.com wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one of
 the 2 nodes to some other location, wont it lose the data stored by that
 node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give
 it the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with
 way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo,
 or start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com
  wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them
 to store data at the location which is /auto/share. I want to point 
 one of
 the two nodes in the cluster to some other location to store the data 
 say
 /auto/foo.
 What would be the best way of achieving the above task without
 loosing any data.? And is it possible to do that without loosing any 
 data.?

 Thank you,
 Shriyansh






Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread Mark Walkom
Apart from replicas, that's really outside the scope of what ES provides.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 August 2014 12:12, shriyansh jain shriyanshaj...@gmail.com wrote:

 I got your point, sir, but if my entire /auto/share goes down, then I won't
 have any chance to recover the data in /auto/share.
 Is there any other way to recover the data?

 Thanks,
 Shriyansh

 On Monday, August 18, 2014 7:03:34 PM UTC-7, Mark Walkom wrote:

 This is why you have replicas, they give you redundancy at a higher level
 that the filesystem,
 If you are still concerned then you should add another node and increase
 your replicas.

 Playing around on the FS to create replicas is only extra management
 overhead and likely to end up causing more problems than it's worth.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 11:59, shriyansh jain shriyan...@gmail.com wrote:

 Just to make sure if /auto/share goes down I have data in /auto/foo.

 Thanks,
 Shriyansh

 On Monday, August 18, 2014 6:55:59 PM UTC-7, Mark Walkom wrote:

 Why do you want to do this if you are worried about data loss?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 11:50, shriyansh jain shriyan...@gmail.com wrote:

 As you mentioned the node will not lose the data completely, is there
 any possibility that it will lose some data.?

 Thank you,
 Shriyansh

 On Monday, August 18, 2014 4:17:54 PM UTC-7, Mark Walkom wrote:

 If you point the instance to a new data location then yes, it will
 startup with no data, but it won't lose the data completely as it will
 still be located in your original /auto/share directory.

 However given you have replicas set what will happen is when the node
 starts up pointing to the new location it will simply start to copy the
 data from the other node so that you fulfil your replica requirements.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:58, shriyansh jain shriyan...@gmail.com wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one
 of the 2 nodes to some other location, wont it lose the data stored by 
 that
 node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory,
 give it the right permissions and then update the config to point to 
 it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com
 wrote:

 I would prefer with no data in /auto/foo.? But would like to go
 with way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to
 /auto/foo, or start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com
 wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured
 them to store data at the location which is /auto/share. I want to 
 point
 one of the two nodes in the cluster to some other location to store 
 the
 data say /auto/foo.
 What would be the best way of achieving the above task without
 loosing any data.? And is it possible to do that without loosing 
 any data.?

 Thank you,
 Shriyansh




Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain
Thank you for helping me out. I really appreciate it.

Regards,
Shriyansh

On Monday, August 18, 2014 7:23:50 PM UTC-7, Mark Walkom wrote:

 Apart from replica's, that's really outside the scope of what ES provides.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 12:12, shriyansh jain shriyan...@gmail.com wrote:

 I got your point sir, but if my entire /auto/share goes down. Then I wont 
 have any chance to recover the data in /auto/share. 
 Is there any other way to recover the data.?

 Thanks,
 Shriyansh

 On Monday, August 18, 2014 7:03:34 PM UTC-7, Mark Walkom wrote:

 This is why you have replicas, they give you redundancy at a higher 
 level that the filesystem,
 If you are still concerned then you should add another node and increase 
 your replicas.

 Playing around on the FS to create replicas is only extra management 
 overhead and likely to end up causing more problems than it's worth.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 11:59, shriyansh jain shriyan...@gmail.com wrote:

 Just to make sure if /auto/share goes down I have data in /auto/foo.

 Thanks,
 Shriyansh

 On Monday, August 18, 2014 6:55:59 PM UTC-7, Mark Walkom wrote:

 Why do you want to do this if you are worried about data loss?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 11:50, shriyansh jain shriyan...@gmail.com wrote:

 As you mentioned the node will not lose the data completely, is there 
 any possibility that it will lose some data.?

 Thank you,
 Shriyansh

 On Monday, August 18, 2014 4:17:54 PM UTC-7, Mark Walkom wrote:

 If you point the instance to a new data location then yes, it will 
 startup with no data, but it won't lose the data completely as it will 
 still be located in your original /auto/share directory.

 However given you have replicas set what will happen is when the 
 node starts up pointing to the new location it will simply start to 
 copy 
 the data from the other node so that you fulfil your replica 
 requirements.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:58, shriyansh jain shriyan...@gmail.com 
 wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one 
 of the 2 nodes to some other location, wont it lose the data stored by 
 that 
 node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, 
 give it the right permissions and then update the config to point to 
 it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com 
 wrote:

 I would prefer with no data in /auto/foo.? But would like to go 
 with way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to 
 /auto/foo, or start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com 
 wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured 
 them to store data at the location which is /auto/share. I want to 
 point 
 one of the two nodes in the cluster to some other location to 
 store the 
 data say /auto/foo.
 What would be the best way of achieving the above task without 
 loosing any data.? And is it possible to do that without loosing 
 any data.?

 Thank you,
 Shriyansh




Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain
I would like to know one more thing: what would be the steps if I want to copy 
the data from /auto/share to /auto/foo for a particular node?

Thanks,
Shriyansh

On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, or 
 start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to 
 store data at the location which is /auto/share. I want to point one of the 
 two nodes in the cluster to some other location to store the data say 
 /auto/foo.
 What would be the best way of achieving the above task without loosing 
 any data.? And is it possible to do that without loosing any data.?

 Thank you,
 Shriyansh







Re: Using a char_filter in combination with a lowercase filter

2014-08-18 Thread Ivan Brusic
Char filters are applied before the text is tokenized, and therefore they
are applied before the normal filters are used, which is why they are a
separate class of filter. With Lucene, the order is:

char filters -> tokenizer -> token filters

Have you looked into the ICU analyzer?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-icu-plugin.html

I have no idea how well it works with Dutch.
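A rough Python sketch of that pipeline (a simplification for illustration; `split()` stands in for the standard tokenizer) shows why the thread's y => ij mapping misses an uppercase Y: the char filter sees the raw text before the lowercase token filter ever runs.

```python
# Simplified model of the Lucene analysis chain:
# char filters -> tokenizer -> token filters (e.g. lowercase).
def char_mapper(text, mappings):
    for src, dst in mappings:
        text = text.replace(src, dst)
    return text

def analyze(text):
    text = char_mapper(text, [("y", "ij")])  # char filter: raw, case-sensitive
    tokens = text.split()                    # stand-in for the standard tokenizer
    return [token.lower() for token in tokens]  # lowercase runs last

print(analyze("Yerseke"))  # ['yerseke']  -- the uppercase Y was never mapped
print(analyze("yerseke"))  # ['ijerseke'] -- so this query term cannot match
```

This matches the behaviour reported in the question: "Yerseke" is indexed as the token "yerseke", while the all-lowercase query "yerseke" is rewritten to "ijerseke" and finds nothing.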

Cheers,

Ivan


On Mon, Aug 18, 2014 at 2:14 AM, Matthias Hogerheijde 
matthias.hogerhei...@goabout.com wrote:

 Hi,

 We're using Elasticsearch with an analyzer that maps the `y` character to
 `ij` (a *char_filter* named char_mapper), since in Dutch these two are
 somewhat interchangeable. We're also using a *lowercase filter*.

 This is the configuration:

 {
   "analysis": {
     "analyzer": {
       "index": {
         "type": "custom",
         "tokenizer": "standard",
         "filter": [
           "lowercase",
           "synonym_twoway",
           "standard",
           "asciifolding"
         ],
         "char_filter": [
           "char_mapper"
         ]
       },
       "index_prefix": {
         "type": "custom",
         "tokenizer": "standard",
         "filter": [
           "lowercase",
           "synonym_twoway",
           "standard",
           "asciifolding",
           "prefixes"
         ],
         "char_filter": [
           "char_mapper"
         ]
       },
       "search": {
         "alias": [
           "default"
         ],
         "type": "custom",
         "tokenizer": "standard",
         "filter": [
           "lowercase",
           "synonym",
           "synonym_twoway",
           "standard",
           "asciifolding"
         ],
         "char_filter": [
           "char_mapper"
         ]
       },
       "postal_code": {
         "tokenizer": "keyword",
         "filter": [
           "lowercase"
         ]
       }
     },
     "tokenizer": {
       "standard": {
         "stopwords": []
       }
     },
     "filter": {
       "synonym": {
         "type": "synonym",
         "synonyms": [
           "st => sint",
           "jp => jan pieterszoon",
           "mh => maarten harpertszoon"
         ]
       },
       "synonym_twoway": {
         "type": "synonym",
         "synonyms": [
           "den haag, s gravenhage",
           "den bosch, s hertogenbosch"
         ]
       },
       "prefixes": {
         "type": "edgeNGram",
         "side": "front",
         "min_gram": 1,
         "max_gram": 30
       }
     },
     "char_filter": {
       "char_mapper": {
         "type": "mapping",
         "mappings": [
           "y => ij"
         ]
       }
     }
   }
 }

 When indexing cities, we're using this mapping:

 {
   "properties": {
     "city": {
       "type": "multi_field",
       "fields": {
         "city": {
           "type": "string"
         },
         "prefix": {
           "type": "string",
           "boost": 0.5,
           "index_analyzer": "index_prefix"
         }
       }
     },
     "province_code": {
       "type": "string"
     },
     "unique_name": {
       "type": "boolean"
     },
     "point": {
       "type": "geo_point"
     },
     "search_terms": {
       "type": "multi_field",
       "fields": {
         "search_terms": {
           "type": "string"
         },
         "prefix": {
           "boost": 0.5,
           "index_analyzer": "index_prefix",
           "type": "string"
         }
       }
     }
   },
   "search_analyzer": "search",
   "index_analyzer": "index"
 }

 When we index all the (Dutch) cities from our data-source, there are
 cities starting with both `IJ` and `Y` (for example, these city names
 exist: *IJssel*, *IJsselstein*, *Yerseke* and *Ysselsteyn*). It seems
 that these characters are not lowercased before the char_mapping is
 applied.

 Querying the index results in:

 /top/city/_search?q=ijsselstein - works, returns the document for IJsselstein
 /top/city/_search?q=Ijsselstein - works, returns the document for IJsselstein
 /top/city/_search?q=yerseke - *doesn't* work, returns nothing
 /top/city/_search?q=Yerseke - *does* work, returns the document for Yerseke
 /top/city/_search?q=YsselsteYn - *doesn't* work, returns nothing
 /top/city/_search?q=Ysselsteyn - *does* work, returns the document for Ysselsteyn

 Changing the case of any other letter doesn't affect the results.

 I've worked around this issue by adding the mapping Y => ij, i.e.:

 "char_filter": {
   "char_mapper": {
     "type": "mapping",
     "mappings": [
       "y => ij",
       "Y => ij"
     ]
   }
 }

 This solves the problem, but I'd rather see that the lowercase filter is
 applied before the mapping, or that I can make the order explicit. Is
 there any stance on this issue? Or is this intended behaviour?

 Regards,
 Matthias Hogerheijde




Re: A few questions about node types + usage

2014-08-18 Thread Mark Walkom
Master, data and client are really just abstractions of different
combinations of node.data and node.master values.

A node.master=true, node.data=false can handle both cluster management and
queries.
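
For readers following along, the combination Mark describes (master-eligible but holding no data) is, as far as I understand it, just two settings in elasticsearch.yml — a sketch, not taken from anyone's actual config:

```
# Master-eligible "client" node: can coordinate the cluster and serve
# search requests, but stores no shards.
node.master: true
node.data: false
```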

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 18 August 2014 22:49, Alex alex.mon...@gmail.com wrote:

 Hello again Mark,

 Thanks for your response. Your answers really are very helpful.

 As with our previous conversation
 https://groups.google.com/d/topic/elasticsearch/ZouS4NVsTJw/discussion I
 am confused about how to make a client node also be master eligible. This
 is what I posted there, I would really like some help understanding this:

 I've done more investigating and it seems that a Client (AKA Query) node
 cannot also be a Master node. As it says here:
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election

 *Nodes can be excluded from becoming a master by setting node.master to
 false. Note, once a node is a client node (node.client set to true), it
 will not be allowed to become a master (node.master is automatically set to
 false).*

 And from the elasticsearch.yml config file it says:

  # 2. You want this node to only serve as a master: to not store any data and
  # to have free resources. This will be the coordinator of your cluster.
  #
  #node.master: true
  #node.data: false
  #
  # 3. You want this node to be neither master nor data node, but
  # to act as a search load balancer (fetching data from nodes,
  # aggregating results, etc.)
  #
  #node.master: false
  #node.data: false

 So I'm wondering how exactly you set up your client nodes to also be
 master nodes. It seems like a master node can only either be purely a
 master or master + data.

 Perhaps you could show the relevant parts of one of your client node's
 config?

 Many thanks, Alex

 On Saturday, 16 August 2014 01:04:37 UTC+1, Mark Walkom wrote:

 1 - Up to you. We use the http output and then just use a round robin A
 record to our 3 masters.
 2 - They are routed but it makes more sense to specify.
 3 - You're right, but most people only use 1 or 2 masters which is why
 they get recommended to have at least 3.
 4 - That sounds like a lot. We use masters that double as clients and
 they only have 8GB, our use sounds similar and we don't have issues.

 I wouldn't bother with 3 client only nodes to start, use them as master
 and client and then if you find you are hitting memory issues due to
 queries you can re-evaluate things.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 15 August 2014 20:11, Alex alex@gmail.com wrote:

 Bump. Any help? Thanks

 On Wednesday, 13 August 2014 12:10:14 UTC+1, Alex wrote:

 Hello I would like some clarification about node types and their usage.

 We will have 3 client nodes and 6 data nodes. The 6 1TB data nodes can
 also be masters (discovery.zen.minimum_master_nodes set to 4). We will
 use Logstash and Kibana. Kibana will be used 24/7 by between a couple and
 handfuls of people.

 Some questions:

 1. Should incoming Logstash write requests be sent to the cluster
    in general (using the *cluster* setting in the *elasticsearch*
    output) or specifically to the client nodes or to the data nodes (via load
    balancer)? I am unsure what kind of node is best for handling writes.

 2. If client nodes exist in the cluster, are Kibana requests
    automatically routed to them? Do I need to somehow specify to Kibana which
    nodes to contact?

 3. I have heard different information about master nodes and the
    minimum_master_nodes setting. I've heard that you should have an odd number
    of master nodes, but I fail to see why the parity of the number of masters
    matters as long as minimum_master_nodes is set to at least N/2 + 1. Does it
    really need to be odd?

 4. I have been advised that the client nodes will use huge amounts
    of memory (which makes sense due to the nature of the Kibana facet
    queries). 64GB per client node was recommended, but I have no idea if that
    sounds right or not. I don't have the ability to actually test it right now,
    so any more guidance on that would be helpful.
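
 Question 3 can be answered with a bit of arithmetic (my own illustration, not from the thread): with a quorum of N/2 + 1, an even number of master-eligible nodes tolerates no more failures than the odd number just below it, which is why odd counts are usually recommended.

```python
def min_master_nodes(n_masters):
    # Quorum required to elect a master without risking split brain:
    # strictly more than half of the master-eligible nodes.
    return n_masters // 2 + 1

def tolerated_failures(n_masters):
    # Master-eligible nodes that can fail while a quorum still exists.
    return n_masters - min_master_nodes(n_masters)

for n in range(1, 7):
    print(n, min_master_nodes(n), tolerated_failures(n))
# 3 masters tolerate 1 failure; 4 masters need a quorum of 3 and
# still tolerate only 1, so the extra even node adds no resilience.
```

 So an even count is not unsafe if minimum_master_nodes is set correctly; it just buys no extra fault tolerance over the odd count below it.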

 I'd be so grateful to hear from you even if you only know something
 about one of my queries.

 Thank you for your time,
 Alex
