Re: How can I make this search requirement work?

2014-07-16 Thread vineeth mohan
Hello Mooky ,

Elasticsearch is not domain-specific, so it won't extract these
financial terms out of the box.
You will need to write your own analyzer to support this.

Thanks
   Vineeth


On Wed, Jul 16, 2014 at 4:17 PM, mooky  wrote:

> And it works a treat. Thanks.
>
> It leads me to think that it would be very useful to use with a series of
> specialist (special-case) analyzers in conjunction with the standard
> analyzer.
>
> Back to my original example - "0# (99.995%)" - what I really want is
> something that will extract "99.995%".
> The standard analyzer will extract "99.995" (and the rest of the text),
> the whitespace analyzer will extract "(99.995%)".
>
> Does a financial/numeric/accounting analyzer already exist? I.e., something
> that extracts "99.995%" or "$44.5665" or "-45bps"?
>
> -M
>
>
>
>
>
>
> On Tuesday, 15 July 2014 18:58:46 UTC+1, mooky wrote:
>>
>> Thanks. That looks interesting!
>>
>>
>> On Tuesday, 15 July 2014 16:15:23 UTC+1, vineeth mohan wrote:
>>>
>>> Hello Mooky ,
>>>
>>> You can apply multiple analyzers to a field:
>>> https://github.com/yakaz/elasticsearch-analysis-combo/
>>>
>>> So you can add all your analyzers there and apply them.
>>>
>>> Thanks
>>>   Vineeth
>>>
>>>
>>> On Tue, Jul 15, 2014 at 8:10 PM, mooky  wrote:
>>>
 I have a bit of an odd requirement insofar as the analyzer is concerned.
 Wondering if anyone has any tips/suggestions.
 I have an item I am indexing (grade) that has a property (name) whose
 value can be "0# (99.995%)".
 I am doing a prefix search on _all.
 I want users to be able to search using 99 or 99.9 or 99.995 or
 99.995%.
 I also want the user to be able to copy-paste "0# (99.995%)" and it
 should work.

 I am currently using the whitespace analyzer - which works for many of
 my cases except the tricky one above.
 99.995 doesn't work.
 But "(99.995" does, because after whitespace tokenization the token
 begins with "(".
 I could filter out the "(" and ")" characters, but then "0# (99.995%)"
 wouldn't work.
 Does anyone have some different suggestions?

 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/9813b93a-249d-41a9-be21-12c8ec5d6d23%
 40googlegroups.com
 
 .
 For more options, visit https://groups.google.com/d/optout.

>>>



Re: High memory usage on dedicated master nodes

2014-07-16 Thread smonasco
Maybe I'm missing something, but if you give Java a maximum heap size it
will use it, if only to store garbage. The trough after a garbage
collection is usually more indicative of what is actually in use.

This looks like a CMS/ParNew setup. ParNew is fast and low-blocking but
leaves stuff behind; CMS blocks, but is really good at taking out the
garbage.
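For reference, this collector pairing corresponds to JVM flags along these lines (ES 1.x's launch script set roughly these; exact values vary by version, so treat this as a sketch, not the node's actual config):

```sh
# ParNew for the young generation, CMS for the old generation
JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
# Start concurrent collection before the old generation fills up
JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
```

The sawtooth in the graph is the old generation filling and being swept by CMS; the troughs are the live set.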



Re: How can I make this search requirement work?

2014-07-16 Thread smonasco
A little late to the party, but I would have used a custom index analyzer
with lowercase, pattern, and edge n-gram, and a search analyzer of
lowercase and pattern (maybe you have to flip lowercase and pattern).

With the pattern tokenizer you can specify a regex.
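A minimal sketch of such index settings (the regex and all names here are assumptions for illustration, not something from this thread; field wiring is omitted):

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "term_split": {
          "type": "pattern",
          "pattern": "[\\s()]+"
        }
      },
      "filter": {
        "front_edge": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "fin_index": {
          "tokenizer": "term_split",
          "filter": ["lowercase", "front_edge"]
        },
        "fin_search": {
          "tokenizer": "term_split",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

With this pattern, "0# (99.995%)" would tokenize to "0#" and "99.995%", and the index-time edge n-grams let "99", "99.9", and so on match without prefix queries.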



RE: best practice wanted for huge number of index time serial data

2014-07-16 Thread Wang Yong
Thank you, Mark. If I use daily indices, I have to specify multiple indices
based on the time range, which will make my service a little more
complicated.

So I am wondering: even if I put all the data in one huge index, as long as
I limit the time range in my query, it looks like ES will locate the data
as quickly as it would in a much smaller daily index, because ES will not
need to search through the whole index, just locate the data by the time
range specified in the query. Is that true?
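For what it's worth, daily indices don't force the service to enumerate names by hand: a wildcard in the index name plus a range filter covers both concerns. A sketch in ES 1.x query DSL (the index pattern and field name are assumptions):

```json
GET /logs-2014.07.*/_search
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "2014-07-16T00:00:00",
            "lt": "2014-07-16T01:00:00"
          }
        }
      }
    }
  }
}
```

ES resolves the wildcard itself, so the service only ever builds one query, and dropping old data stays a cheap index deletion rather than a delete-by-query.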

 

Alan

 

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] On 
Behalf Of Mark Walkom
Sent: 2014年7月14日 10:03
To: elasticsearch@googlegroups.com
Subject: Re: best practice wanted for huge number of index time serial data

 

This is pretty standard for logstash type data.

 

Use daily indexes, don't use TTL.




Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com  
web: www.campaignmonitor.com  

 

On 14 July 2014 11:40, LiMac  wrote:

Hi folks,

 

I am trying to index a huge amount of time-series data. The total will be
about 5k docs per second, continuing for several months. I also need to
search these data, but only inside a very small time range, maybe one hour.
Is there any best practice for this kind of use case?

 

Thanks!

 

Alan

 




Re: Get Perf Counters for ElasticSearch Nodes using JSON

2014-07-16 Thread Mark Walkom
You will want the cat API to start with, then check out the cluster APIs next.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index.html
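For example (assuming a node reachable on localhost:9200):

```sh
# Tabular per-node overview: heap and RAM usage, load
curl 'localhost:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,load'

# Full JSON node stats (OS, JVM, filesystem) -- easier to parse from C#
curl 'localhost:9200/_nodes/stats/os,jvm,fs?pretty'
```

The nodes-stats endpoint returns the same raw counters Marvel charts under its "Nodes" section.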

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 17 July 2014 10:14, Krishna  wrote:

> Hi
>
> I need to read the Performance counters (OS CPU%,JVM Memory,Disk Free
> Space,Iops)  of the elastic search nodes using JSON in my C#.NET
> application. To do that first I need the JSON query to get the counters
> like  the ones that appear under "Nodes" section in Marvel webpage.
>
> Does anyone know how to get that? I know it should be something like we
> get when we click "Inspect" element in marvel. (something like below) but I
> don't see any "Inspect" element for the "nodes" section.
>
> The following JSON query seems to be for Document section.
>
> GET .marvel-2014.07.16/_search?pretty
> {
>   "facets": {
> "0": {
>   "date_histogram": {
> "key_field": "@timestamp",
> "value_field": "total.search.query_total",
> "interval": "1m"
>   },
>   "global": true,
>   "facet_filter": {
> "fquery": {
>   "query": {
> "filtered": {
>   "query": {
> "query_string": {
>   "query": "_type:indices_stats"
> } 
> ...
>
>
> Appreciate if anyone can provide some pointers on this?
>
>
>
>



Get Perf Counters for ElasticSearch Nodes using JSON

2014-07-16 Thread Krishna
Hi

I need to read the performance counters (OS CPU %, JVM memory, disk free
space, IOPS) of the Elasticsearch nodes using JSON in my C#.NET
application. To do that, I first need the JSON query to get the counters
like the ones that appear under the "Nodes" section in the Marvel web page.

Does anyone know how to get that? I know it should be something like what
we get when we click "Inspect" in Marvel (something like below), but I
don't see any "Inspect" element for the "Nodes" section.

The following JSON query seems to be for the Documents section.

GET .marvel-2014.07.16/_search?pretty
{
  "facets": {
    "0": {
      "date_histogram": {
        "key_field": "@timestamp",
        "value_field": "total.search.query_total",
        "interval": "1m"
      },
      "global": true,
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "query_string": {
                  "query": "_type:indices_stats"
                }
...


I'd appreciate it if anyone could provide some pointers on this.






Re: High memory usage on dedicated master nodes

2014-07-16 Thread Mark Walkom
Are they master only or are you sending queries through them as well?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 17 July 2014 03:14, David Smith  wrote:

> We have cluster with 22 data nodes and 3 dedicated master nodes running ES
> 1.1.0. Data nodes have 16 GB of memory (half given to JVM heap) and the
> dedicated masters have 4 GB (half for heap). Data nodes have consistent
> memory usage (about 50-60%) but we're observing that the master node is
> constantly growing and shrinking its memory use (after GC) every 5
> minutes. See the bigdesk graph.
>
>
> 
>
>
> I'm curious as to why the master node is using up close to 1.5 GB of
> memory. I thought that maintaining cluster state shouldn't take up that
> much memory and master nodes should be able to work with much less memory.
> Anybody know why?
>



Re: term query no hits was Re: No parser for element [term]

2014-07-16 Thread Ivan Brusic
As predicted, your actual mapping does not match your perceived mapping.

Something is not matching up. Perhaps the mapping is for a different index
or type. Best way is to share your mapping and perhaps how you created your
index as indicated at http://www.elasticsearch.org/help

-- 
Ivan


On Wed, Jul 16, 2014 at 2:33 PM, Jack Park  wrote:

> Here are the mappings I dumped by adding that function to my client:
>
>
> {"topics":{"mappings":{"core":{"properties":{"crDt":{"type":"date","format":"dat
>
> eOptionalTime"},"crtr":{"type":"string"},"details":{"type":"string"},"inOf":{"ty
>
> pe":"string"},"isPrv":{"type":"string"},"lEdDt":{"type":"date","format":"dateOpt
>
> ionalTime"},"lIco":{"type":"string"},"label":{"type":"string"},"lox":{"type":"st
>
> ring"},"sIco":{"type":"string"},"sbOf":{"type":"string"},"srtDt":{"type":"long"}
> ,"trCl":{"type":"string"}}
>
> There are no fields for "index"  or "store" where
> "index":"not_analyzed" was used where appropriate.
>
> A snippet of the mappings.json is below.  I would like to know why ES
> ignored the values.
>
> Many thanks
> Jack
>
> {
> "properties": {
> "lox": {
> "index": "not_analyzed",
> "type": "string",
> "store": "yes"
> },
> "inOf": {
> "index": "not_analyzed",
> "type": "string",
> "store": "yes"
> },
>  "_ver": {
> "index": "not_analyzed",
> "type": "string",
> "store": "yes"
> },
> "srtDt": {
> "index": "not_analyzed",
>  "type": "long",
>  "store": "yes"
> },
> "tpC": {
> "index": "not_analyzed",
>  "type": "long",
>  "store": "yes"
> },
> "lists": {
> "properties": {
> "sbOf": {
> "index": "not_analyzed",
> "type": "string",
> "store": "yes"
> },
> "label": {
> "index": "analyzed",
> "type": "string",
> "store": "yes"
> },
> "details": {
> "index": "analyzed",
> "type": "string",
> "store": "yes"
> }
>



Re: How many tcp connections should ES/logstash generate ?

2014-07-16 Thread Mark Walkom
If you are using daily indexes then don't even bother running the delete,
just drop the index when the next day rolls around.

"Resource temporarily unavailable" could indicate you may need to increase
the ulimit for the user, did you set this in /etc/default/elasticsearch?
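For reference, on the Debian/Ubuntu package those limits live in /etc/default/elasticsearch, while the per-user process limit (which the "Resource temporarily unavailable" error points at) comes from /etc/security/limits.conf. The values below are illustrative, not recommendations:

```sh
# /etc/default/elasticsearch
ES_HEAP_SIZE=1g
MAX_OPEN_FILES=65535

# /etc/security/limits.conf -- raise the per-user thread/process cap
# elasticsearch soft nproc 4096
# elasticsearch hard nproc 4096
```

Each ES thread counts against nproc, so a default of 1024 is easy to exhaust on a busy node.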

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 17 July 2014 03:07, Bastien Chong  wrote:

> Thanks for your input. I'm running ES as another user, so I still had root
> access.
>
> I will refactor and create an index per day, and every 30 seconds I'll
> simply delete yesterday's index. I'm hoping this greatly reduces the
> number of threads.
>
>
> On Wednesday, July 16, 2014 12:02:33 PM UTC-4, Jörg Prante wrote:
>
>> First, you should always run ES under another user with least-possible
>> privileges, so you can login, even if ES is running out of process space.
>> (There are more security related issues that everyone should care about, I
>> leave them out here)
>>
>> Second, it is not intended for ES to run so many threads. On the other
>> hand, ES does not refuse to spawn plenty of threads when retrying hard
>> to recover from network-related problems. You can see what the threads
>> are doing by executing a "hot threads" command:
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html
>>
>> Third, every 30 seconds you run a "delete by query" with a range spanning
>> many days. That does not seem to make sense. You should always let such
>> queries complete before continuing; they can take a very long time (I
>> mean hours) and they put a burden on your system. Set up daily indices:
>> this is much more efficient, and deletions by day are then a matter of
>> seconds.
>>
>> Jörg
>>
>>
>>
>>
>> On Wed, Jul 16, 2014 at 4:30 PM, Bastien Chong 
>> wrote:
>>
>>> http://serverfault.com/questions/412114/cannot-switch-ssh-to-specific-user-su-cannot-set-user-id-resource-temporaril
>>>
>>> Looks like I have the same issue, is it normal that ES spawns that much
>>> process, over 1000 ?
>>>
>>>
>>> On Wednesday, July 16, 2014 9:23:45 AM UTC-4, Bastien Chong wrote:

 I'm not sure how to find the answer to that; I use the default settings
 in ES. The cluster is composed of 2 read/write nodes and a read-only
 node. There is 1 Logstash instance that simply outputs 2 types of data
 to ES. Nothing fancy.

 I need to delete documents older than a day; for this particular case I
 can't create a daily index. Is there a better way?

 I'm using an EC2 m3.large instance, ES has 1.5GB of heap.

 It seems like I'm hitting an OS limit, I can't "su - elasticsearch" :

 su: /bin/bash: Resource temporarily unavailable

 Stopping elasticsearch fix this issue, so this is directly linked.

> -bash-4.1$ ulimit -a
> core file size  (blocks, -c) 0
> data seg size   (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size   (blocks, -f) unlimited
> pending signals (-i) 29841
> max locked memory   (kbytes, -l) unlimited
> max memory size (kbytes, -m) unlimited
> open files  (-n) 65536
> pipe size(512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority  (-r) 0
> stack size  (kbytes, -s) 8192
> cpu time   (seconds, -t) unlimited
> max user processes  (-u) 1024
> virtual memory  (kbytes, -v) unlimited
> file locks  (-x) unlimited
>




 On Tuesday, July 15, 2014 6:35:22 PM UTC-4, Mark Walkom wrote:
>
> It'd depend on your config I'd guess, in particular how many
> workers/threads you have and what ES output you are using in LS.
>
> Why are you cleaning an index like this anyway? It seems horribly
> inefficient.
> Basically the error is "OutOfMemoryError", which means you've run out
> of heap for the operation to complete. What are the specs for your node,
> how much heap does ES have?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 16 July 2014 00:43, Bastien Chong  wrote:
>
>> I have a basic setup with a logstash shipper, an indexer and an
>> elasticsearch cluster.
>> Elasticsearch listen on the standart 9200/9300 and logstash indexer
>> 9301/9302.
>>
>> When I do a netstat | wc -l for the ES process: 184 found
>> (sample)
>>
>>> tcp0  0 :::172.17.7.87:9300 :::
>>> 172.17.8.39:59573ESTABLISHED 23224/java
>>> tcp0  0 :::172.17.7.87:9300 :::
>>> 172.17.7.87:47609ESTABLISHED 23224/java
>>> tcp  

Re: term query no hits was Re: No parser for element [term]

2014-07-16 Thread Jack Park
Here are the mappings I dumped by adding that function to my client:

{"topics":{"mappings":{"core":{"properties":{
  "crDt":{"type":"date","format":"dateOptionalTime"},
  "crtr":{"type":"string"},
  "details":{"type":"string"},
  "inOf":{"type":"string"},
  "isPrv":{"type":"string"},
  "lEdDt":{"type":"date","format":"dateOptionalTime"},
  "lIco":{"type":"string"},
  "label":{"type":"string"},
  "lox":{"type":"string"},
  "sIco":{"type":"string"},
  "sbOf":{"type":"string"},
  "srtDt":{"type":"long"},
  "trCl":{"type":"string"}}

There are no "index" or "store" fields anywhere, even though
"index": "not_analyzed" was specified where appropriate.

A snippet of the mappings.json is below. I would like to know why ES
ignored these values.

Many thanks
Jack

{
    "properties": {
        "lox": {
            "index": "not_analyzed",
            "type": "string",
            "store": "yes"
        },
        "inOf": {
            "index": "not_analyzed",
            "type": "string",
            "store": "yes"
        },
        "_ver": {
            "index": "not_analyzed",
            "type": "string",
            "store": "yes"
        },
        "srtDt": {
            "index": "not_analyzed",
            "type": "long",
            "store": "yes"
        },
        "tpC": {
            "index": "not_analyzed",
            "type": "long",
            "store": "yes"
        },
        "lists": {
            "properties": {
                "sbOf": {
                    "index": "not_analyzed",
                    "type": "string",
                    "store": "yes"
                },
                "label": {
                    "index": "analyzed",
                    "type": "string",
                    "store": "yes"
                },
                "details": {
                    "index": "analyzed",
                    "type": "string",
                    "store": "yes"
                }



Re: Python version for curator

2014-07-16 Thread Brian
No joy:

$ *pip install elasticsearch*
Requirement already satisfied (use --upgrade to upgrade): elasticsearch in 
/usr/lib/python2.6/site-packages
Cleaning up...

$ *curator --help*
Traceback (most recent call last):
  File "/usr/bin/curator", line 5, in 
from pkg_resources import load_entry_point
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 2655, in 

working_set.require(__requires__)
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 648, in 
require
needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 546, in 
resolve
raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: elasticsearch>=1.0.0,<2.0.0

$ *uname -a*
Linux elktest 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 
x86_64 x86_64 x86_64 GNU/Linux

Brian



Re: Updating Datatype in Elasticsearch

2014-07-16 Thread Brian
Within my configuration directory's templates/automap.json file is the 
following template. Elasticsearch uses this template whenever it generates 
a new logstash index each day:

{
  "automap" : {
"template" : "logstash-*",
"settings" : {
  "index.mapping.ignore_malformed" : true
},
"mappings" : {
  "_default_" : {
"numeric_detection" : true,
"_all" : { "enabled" : false },
"properties" : {
  "message" : { "type" : "string" },
  "host" : { "type" : "string" },
  "UUID" : {  "type" : "string", "index" : "not_analyzed" },
  "logdate" : {  "type" : "string", "index" : "no" }
}
  }
}
  }
}

Note:

1. How to ignore malformed data (for example, a numeric field that contains 
"no-data" every once in a while).

2. How to automatically detect numeric fields. Logstash makes every JSON 
value a string. Elasticsearch automatically detects dates, but must be 
explicitly configured to automatically detect numeric fields.

3. Listing fields that must be considered to be strings even if they 
contain numeric values, or must not be analyzed, or must not be indexed at 
all.

4. Disabling of the _all field: as long as your logstash configuration 
leaves the message field pretty much intact, disabling the _all field will 
reduce disk space and increase performance while still keeping all search 
functionality. But then don't forget to also update your Elasticsearch 
configuration to specify message as the default field.

Hope this helps!

Brian



Re: term query no hits was Re: No parser for element [term]

2014-07-16 Thread Ivan Brusic
Do not rely on what your mapping template says and use the get mapping API
to find out what the mapping really is:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-get-mapping.html

Since many settings cannot be changed after an index is created, older
settings may still be in effect.
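Using the index and type names from this thread, that check would look something like:

```sh
# What the cluster actually has, regardless of what mappings.json says
curl 'localhost:9200/topics/_mapping/core?pretty'

# How a value is analyzed against a specific field (a not_analyzed field
# should return the value as a single unchanged token)
curl 'localhost:9200/topics/_analyze?field=inOf&text=NodeBuilderType&pretty'
```

If the analyze call returns lowercased sub-tokens instead of one token, the field is being analyzed and a term query on the original value will not match.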

-- 
Ivan


On Wed, Jul 16, 2014 at 1:42 PM, Jack Park  wrote:

> What I know at the moment is this: If I change "term" to "match", it
> gets the right answers.
> In some sense, that suggests that even though my mappings.json says
> that the field is not-analyzed, it's behaving as if it has been
> analyzed.
>
> On Wed, Jul 16, 2014 at 11:00 AM, Ivan Brusic  wrote:
> > I would verify that the field is in fact non_analyzed and that your data
> is
> > indexed in the way you think it is. Use the analyze API to analyze the
> term.
> > Make sure you use the last example, which is based on the field.
> >
> >
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html
> >
> > A term query will do an exact match. If your field is in fact correct,
> post
> > an example mapping and document.
> >
> > Cheers,
> >
> > Ivan
> >
> >
> > On Wed, Jul 16, 2014 at 10:43 AM, Jack Park 
> > wrote:
> >>
> >> Thanks Ivan.
> >>
> >> That term query on the label field was, in fact on an analyzed field.
> >> What is of concern at the moment is that this query:
> >> {"query":{"term":{"inOf":"NodeBuilderType"}}}
> >> is  on an unanalyzed field, defined thus:
> >>
> >> "inOf": {
> >> "index": "not_analyzed",
> >> "type": "string",
> >> "store": "yes"
> >> },
> >>
> >> If all the ducks are lined up, it's not clear what the problem is for
> >> that query.
> >>
> >> On Wed, Jul 16, 2014 at 10:27 AM, Ivan Brusic  wrote:
> >> > By default, string fields are analyzed using the standard analyzer,
> >> > which
> >> > will tokenize and lowercase the input (I believe stop words are now
> NOT
> >> > removed). A term query does not analyze the query, so it only works on
> >> > non
> >> > analyzed fields (or fields that use a keyword tokenizer). A term query
> >> > for
> >> > "kimchy" works because it already is lowercased and only has one
> token.
> >> >
> >> > Try using a match query or set the field to be non analyzed. The
> choice
> >> > depends on your other use cases (do you require partial matching?).
> >> >
> >> > Cheers,
> >> >
> >> > Ivan
> >> >
> >> >
> >> > On Wed, Jul 16, 2014 at 10:21 AM, Jack Park  >
> >> > wrote:
> >> >>
> >> >> Thank you very much. I was in the process of drafting a message that
> I
> >> >> found that and made the query to look like these:
> >> >>
> >> >> {"query":{"term":{"label":"\"First instance node\""}}}
> >> >> {"query":{"term":{"inOf":"NodeBuilderType"}}}
> >> >>
> >> >> Neither returns any hits.
> >> >> In the same system, I did a text search with this query:
> >> >> {"query":{"multi_match":{"query":"topic
> >> >> map","fields":["details","label"]}}}
> >> >>
> >> >> and that worked perfectly.
> >> >> So, I have two open issues:
> >> >> 1- what's wrong with term query?
> >> >> 2- there were 145 hits on the text search; need to configure the
> query
> >> >> to do paging through those hits.
> >> >>
> >> >> Many thanks for this help
> >> >> Jack
> >> >>
> >> >> On Wed, Jul 16, 2014 at 9:48 AM, David Pilato 
> wrote:
> >> >> > You need to put it in a query.
> >> >> >
> >> >> > Have a look at
> >> >> >
> >> >> >
> >> >> >
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/search.html
> >> >> >
> >> >> > --
> >> >> > David ;-)
> >> >> > Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
> >> >> >
> >> >> > Le 16 juil. 2014 à 18:04, Jack Park  a
> >> >> > écrit :
> >> >> >
> >> >> > This exact query is not found in the list, so here goes. Just
> >> >> > upgraded
> >> >> > to
> >> >> > 1.2.2.
> >> >> >
> >> >> > The query documentation
> >> >> >
> >> >> >
> >> >> >
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
> >> >> > gives this example:
> >> >> >
> >> >> > {
> >> >> >"term" : { "user" : "kimchy" }
> >> >> > }
> >> >> >
> >> >> > So, my querydsl written for nodejs is this:
> >> >> >
> >> >> > {"term":{"inOf":"NodeBuilderType"}}
> >> >> >
> >> >> > I happen to know that documents satisfying that query exist, for
> >> >> > instance:
> >> >> >
> >> >> >
> >> >> >
> {"lox":"NodeBuilderSecondTopic","crtr":"SystemUser","sIco":"","lIco":"","crDt":"2014-07-16T08:45:21","srtDt":1405525521801,"lEdDt":"2014-07-16T08:45:21","isPrv":"false","label":["First
> >> >> > instance node"],"details":["Seems
> >> >> >
> >> >> >
> >> >> >
> likely"],"inOf":"NodeBuilderType","trCl",["NodeBuilderType","ASuperClass"],"sbOf":["ASuperClass"]}
> >> >> >
> >> >> > What I get back is an enormous stack trace, a portion of which from
> >> >> > the error log below.
> >> >> >
> >> >> > Am I missing something?
> >> >> >
> >> >> > Many thanks in advance.
> >> >> > Jack
> >> >> >
> >> >> > 

Lost template when simulating a disaster scenario

2014-07-16 Thread Andy Walker
Hi all!
First, I'd like to thank the creators of ES, the contributors, and the 
large community that surrounds it. I've gotten all sorts of great 
information from this list, the IRC channel, and all across the web. But, 
onto my scenario/question...

Long story (question below):
Yesterday, my task was to simulate a disaster scenario with my 
Elasticsearch cluster, just to see how easy it would be to pull through the 
mess. At the time, I was working with a single cluster and three total 
nodes. Two nodes were active when I started, and one was completely fresh: 
it didn't have the ES service running on it yet and had never joined this 
(or any) cluster. All of the indexes I care about have shards with 1 
replica.

I started by ungracefully shutting down the first of the two nodes and 
letting the second node take all of the activity. Everything was good so 
far, and obviously no data loss... onto step two. I brought the first node 
back up. I then wanted to simulate the scenario where the second node 
couldn't handle the extra work of rebalancing shards as well as handling 
most or all indexing/query traffic, so I ungracefully shut down 
elasticsearch on the second node as well. At this point, obviously, the 
cluster health was red, and a good portion of my data was missing. Feeling 
confident that I could still recover from this situation, I shut down 
elasticsearch on the first node, effectively shutting off the cluster 
completely across the board. Then I started ES on my third node (the fresh 
one), followed by starting ES on the first and second nodes as well. Much 
to my delight, all of the rebalancing took place, and all of my indexes and 
shards seemed to survive the storm just fine. I didn't get to learn any 
lessons about how to recover from that sort of disaster, because apparently 
it wasn't disastrous enough!

HOWEVER... fast-forward to today. I have some daily indexes whose mappings 
are set by a template that was created via the API (not via a config file), 
and today I noticed that the field mappings were wrong. I checked for the 
template via the API, and to my dismay, it was nowhere to be found.

Question:
Where/how are templates created via the REST API stored? Are they persisted 
to disk in any way, or are they just a bit of information shared amongst 
all of the nodes in a cluster, such that if all nodes were shut down at the 
same time, the template would be lost? And what about a split-brain 
situation, which I introduced in my scenario to an extent: if a new master 
comes in without knowledge of any custom templates, and then another node 
that previously had those templates defined joins the cluster, does the 
fresh master win, thereby blowing away the template?

I'd be more than happy to set up some more test scenarios, and try to 
reproduce this, if it would help... but if it's a known issue, or if it 
just works that way by design, then I would rather not spend the time.
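As far as I know, templates created via the REST API live in the cluster state (not in config files), so keeping an out-of-band copy on disk is cheap insurance against exactly this scenario. A minimal sketch, assuming you have already fetched the `GET /_template` response as a dict with whatever HTTP client you like (the directory and template names below are made up for the demo):

```python
import json
import os

def backup_templates(templates, dest_dir):
    """Write each template (as returned by GET /_template) to its own JSON file,
    so it can later be re-registered with PUT /_template/<name>."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for name, body in templates.items():
        path = os.path.join(dest_dir, name + ".json")
        with open(path, "w") as f:
            json.dump(body, f, indent=2)
        paths.append(path)
    return paths

# Example with a dummy template dict (shape follows the GET /_template response):
templates = {"daily-logs": {"template": "logs-*", "mappings": {}}}
print(backup_templates(templates, "/tmp/es-template-backup"))
```

Restoring is then just replaying each file as a `PUT /_template/<name>` request.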

Thoughts?

Thanks!
Andy Walker

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4ff9593b-80a4-4eee-953d-08d287a7b683%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: term query no hits was Re: No parser for element [term]

2014-07-16 Thread Jack Park
What I know at the moment is this: if I change "term" to "match", it
gets the right answers.
In some sense, that suggests that even though my mappings.json says
the field is not_analyzed, it is behaving as if it had been analyzed.
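That symptom (term fails, match succeeds) is exactly what an analyzed field produces: the standard analyzer lowercases and tokenizes at index time, while a term query compares the query text verbatim against the indexed tokens. A rough simulation of the mismatch, using a deliberately crude stand-in for the standard analyzer (not Elasticsearch's real implementation):

```python
import re

def standard_analyze(text):
    # crude approximation: lowercase, split on non-alphanumerics, drop empties
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

indexed_tokens = standard_analyze("NodeBuilderType")
print(indexed_tokens)                       # ['nodebuildertype']

# term query: exact, un-analyzed comparison against the indexed tokens
print("NodeBuilderType" in indexed_tokens)  # False -> no hits

# match query: the query text is analyzed the same way first, so it matches
print(standard_analyze("NodeBuilderType")[0] in indexed_tokens)  # True
```

If the mapping really were `not_analyzed`, the single indexed token would be the verbatim "NodeBuilderType" and the term query would hit; the analyze API against the live index is the way to confirm which case you are in.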

On Wed, Jul 16, 2014 at 11:00 AM, Ivan Brusic  wrote:
> I would verify that the field is in fact non_analyzed and that your data is
> indexed in the way you think it is. Use the analyze API to analyze the term.
> Make sure you use the last example, which is based on the field.
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html
>
> A term query will do an exact match. If your field is in fact correct, post
> an example mapping and document.
>
> Cheers,
>
> Ivan
>
>
> On Wed, Jul 16, 2014 at 10:43 AM, Jack Park 
> wrote:
>>
>> Thanks Ivan.
>>
>> That term query on the label field was, in fact on an analyzed field.
>> What is of concern at the moment is that this query:
>> {"query":{"term":{"inOf":"NodeBuilderType"}}}
>> is  on an unanalyzed field, defined thus:
>>
>> "inOf": {
>> "index": "not_analyzed",
>> "type": "string",
>> "store": "yes"
>> },
>>
>> If all the ducks are lined up, it's not clear what the problem is for
>> that query.
>>
>> On Wed, Jul 16, 2014 at 10:27 AM, Ivan Brusic  wrote:
>> > By default, string fields are analyzed using the standard analyzer,
>> > which
>> > will tokenize and lowercase the input (I believe stop words are now NOT
>> > removed). A term query does not analyze the query, so it only works on
>> > non
>> > analyzed fields (or fields that use a keyword tokenizer). A term query
>> > for
>> > "kimchy" works because it already is lowercased and only has one token.
>> >
>> > Try using a match query or set the field to be non analyzed. The choice
>> > depends on your other use cases (do you require partial matching?).
>> >
>> > Cheers,
>> >
>> > Ivan
>> >
>> >
>> > On Wed, Jul 16, 2014 at 10:21 AM, Jack Park 
>> > wrote:
>> >>
>> >> Thank you very much. I was in the process of drafting a message that I
>> >> found that and made the query to look like these:
>> >>
>> >> {"query":{"term":{"label":"\"First instance node\""}}}
>> >> {"query":{"term":{"inOf":"NodeBuilderType"}}}
>> >>
>> >> Neither returns any hits.
>> >> In the same system, I did a text search with this query:
>> >> {"query":{"multi_match":{"query":"topic
>> >> map","fields":["details","label"]}}}
>> >>
>> >> and that worked perfectly.
>> >> So, I have two open issues:
>> >> 1- what's wrong with term query?
>> >> 2- there were 145 hits on the text search; need to configure the query
>> >> to do paging through those hits.
>> >>
>> >> Many thanks for this help
>> >> Jack
>> >>
>> >> On Wed, Jul 16, 2014 at 9:48 AM, David Pilato  wrote:
>> >> > You need to put it in a query.
>> >> >
>> >> > Have a look at
>> >> >
>> >> >
>> >> > http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/search.html
>> >> >
>> >> > --
>> >> > David ;-)
>> >> > Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>> >> >
>> >> > Le 16 juil. 2014 à 18:04, Jack Park  a
>> >> > écrit :
>> >> >
>> >> > This exact query is not found in the list, so here goes. Just
>> >> > upgraded
>> >> > to
>> >> > 1.2.2.
>> >> >
>> >> > The query documentation
>> >> >
>> >> >
>> >> > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
>> >> > gives this example:
>> >> >
>> >> > {
>> >> >"term" : { "user" : "kimchy" }
>> >> > }
>> >> >
>> >> > So, my querydsl written for nodejs is this:
>> >> >
>> >> > {"term":{"inOf":"NodeBuilderType"}}
>> >> >
>> >> > I happen to know that documents satisfying that query exist, for
>> >> > instance:
>> >> >
>> >> >
>> >> > {"lox":"NodeBuilderSecondTopic","crtr":"SystemUser","sIco":"","lIco":"","crDt":"2014-07-16T08:45:21","srtDt":1405525521801,"lEdDt":"2014-07-16T08:45:21","isPrv":"false","label":["First
>> >> > instance node"],"details":["Seems
>> >> >
>> >> >
>> >> > likely"],"inOf":"NodeBuilderType","trCl",["NodeBuilderType","ASuperClass"],"sbOf":["ASuperClass"]}
>> >> >
>> >> > What I get back is an enormous stack trace, a portion of which from
>> >> > the error log below.
>> >> >
>> >> > Am I missing something?
>> >> >
>> >> > Many thanks in advance.
>> >> > Jack
>> >> >
>> >> > [2014-07-16 08:53:20.332] [ERROR] TopicMap - DP.__listNodesByQuery
>> >> > {"term":{"inOf":"NodeBuilderType"}} | Error:
>> >> > {"error":"SearchPhaseExecutionException[Failed to execute phase
>> >> > [query], all shards failed; shardFailures
>> >> > {[cawnH8a8S32Bpl96txGwyw][topics][2]:
>> >> > SearchParseException[[topics][2]: from[-1],size[-1]: Parse Failure
>> >> > [Failed to parse source
>> >> > [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
>> >> > nested: SearchParseException[[topics][2]: from[-1],size[-1]: Parse
>> >> > Failure [No parser for element [term]]];
>> >> > }{[cawnH8a8S32Bpl96txGwyw][topics][3]:
>> >> > SearchParseException[[topics][3]: from[-1

Can't get Chef deployed ElasticSearch service to start

2014-07-16 Thread Ryan V
I posted this over on Github, but thought I might get more eyeballs here. 
 I'm deploying an ES cluster to EC2 via OpsWorks using the method described 
here: 
http://blogs.aws.amazon.com/application-management/post/Tx3MEVKS0A4G7R5/Deploying-Elasticsearch-with-OpsWorks

Out of the box, it all works perfectly, but my client requires ES 1.0+ so 
I've updated the ES version in the cookbook from 0.90.12 to 1.1.1 by using 
the commit 
here: https://github.com/elasticsearch/cookbook-elasticsearch/pull/213  

If I do the exact same thing, with that one change, I get 502 Bad Gateway 
errors from nginx.  If I connect via SSH to one of the instances, I notice 
elasticsearch is not even running and starting it manually doesn't seem to 
work.

[ec2-user@search1 run]$ sudo service elasticsearch status -v
elasticsearch not running
[ec2-user@search1 run]$ sudo service elasticsearch start
PID file found in /usr/local/var/run/search1_localdomain.pid, elasticsearch 
already running?
Starting elasticsearch...
[ec2-user@search1 run]$ sudo service elasticsearch status -v
elasticsearch not running

Should there be log files somewhere I can find?  Any ideas how to diagnose 
this?
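One likely suspect, given the "PID file found ... elasticsearch already running?" message, is a stale PID file left behind by an unclean shutdown: the init script sees the file, assumes the service is up, and refuses to start cleanly. A hedged sketch of the check (the demo path below is made up; on this setup the file in question would be /usr/local/var/run/search1_localdomain.pid):

```python
import os

def pidfile_status(path):
    """Return 'running', 'stale', or 'absent'; a stale pidfile can be removed
    before retrying `service elasticsearch start`."""
    if not os.path.exists(path):
        return "absent"
    pid = int(open(path).read().strip())
    try:
        os.kill(pid, 0)   # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return "stale"    # no such process: safe to delete the pidfile
    except PermissionError:
        return "running"  # process exists but is owned by another user
    return "running"

# Demo against our own (certainly live) PID:
demo = "/tmp/demo_es.pid"
with open(demo, "w") as f:
    f.write(str(os.getpid()))
print(pidfile_status(demo))  # running
```

Logs, if any were written, would normally be under the configured path.logs (often /usr/local/var/log/elasticsearch or /var/log/elasticsearch depending on the cookbook).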

Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7d07c010-abd4-4ce2-a070-3a29537ba8b0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Marvel does not display any contents with ES1.2.1.

2014-07-16 Thread ajay b
Hi Boaz

After playing around with the config I managed to get it working, but with
the additional parameters set it does not work; I'm not sure which parameter
breaks it at this time. It appears that once dummy node names are assigned,
it can take all the parameters.

Also, with the failed config, the health check shows
"active_primary_shards" : 0, versus "active_primary_shards" : 1 when it works.
When does the active primary shard get created?

See below the two configs.

Thanks
Ajay

*Working config :*

cluster.name: elastic_test1
indices.memory.index_buffer_size: 50%
path.conf: /iscsi_disk1/etc/elasticsearch
path.data: /iscsi_disk1/data/elasticsearch
path.work: /iscsi_disk1/work/elasticsearch
path.logs: /iscsi_disk1/logs/elasticsearch
path.plugins: /usr/local/elasticsearch/plugins
bootstrap.mlockall: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ "elk01.dbplat.altus.bblabs.rim.net:9300", "elk02.dbplat.altus.bblabs.rim.net:9300", "elk03.dbplat.altus.bblabs.rim.net:9300" ]

*Failed Config:*

action.auto_create_index: true
action.disable_delete_all_indices: true
bootstrap.mlockall: true
cloud.node.auto_attributes: true
cluster.name: elastic_test1
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ "elk01.dbplat.altus.bblabs.rim.net:9300", "elk02.dbplat.altus.bblabs.rim.net:9300", "elk03.dbplat.altus.bblabs.rim.net:9300" ]
gateway.expected_nodes: 1
http.port: 9200
index.mapper.dynamic: true
indices.memory.index_buffer_size: 50%
marvel.agent.enabled: enabled
marvel.agent.exporter.es.hosts: [ "elk01.dbplat.altus.bblabs.rim.net:9200" ]
marvel.agent.shard_stats.enabled: true
node.max_local_storage_nodes: 1
node.name: elk01.dbplat.altus.bblabs.rim.net
path.conf: /iscsi_disk1/etc/elasticsearch
path.data: /iscsi_disk1/data/elasticsearch
path.logs: /iscsi_disk1/logs/elasticsearch
path.plugins: /usr/local/elasticsearch/bin/plugin
path.work: /iscsi_disk1/work/elasticsearch



On Tue, Jul 15, 2014 at 5:19 PM, Boaz Leskes  wrote:

> Hi Ajay,
>
> If look at the following line:
>
> [2014-07-14 19:44:23,188][INFO ][plugins  ] [elk01] loaded
> [], sites []
>
>
> You can see that the node doesn't find the marvel plugin. How did you
> install ES? Did you install marvel with the same user? It may be that the
> user running ES has no access to the files? Do you see the marvel files
> at /usr/local/elasticsearch/bin/plugin ?
>
> PS I'm not sure which version of internet explorer you are using, but
> marvel is compatible with IE9 and up.
>
>
>
> On Mon, Jul 14, 2014 at 10:46 PM, ajay b  wrote:
>
>> Attached are logs with one config error on hostname fixed (elk001 ->
>> elk01)  and logs in DEBUG. Still no marvel display.
>> Thanks
>> Ajay
>>
>>
>> On Mon, Jul 14, 2014 at 2:12 PM, ajay b  wrote:
>>
>>> Hi Boaz
>>>
>>> Attached are the logs, and screenshot. new config (vm recreated) with
>>> fresh install config with marvel agent enabled . Using a single node setup
>>> to find the issue with marvel.
>>>
>>> Thanks
>>> Ajay
>>>
>>> Config:
>>>
>>> cluster.name: elastic_test1
>>> node.name: elk01
>>> node.max_local_storage_nodes: 1
>>> index.mapper.dynamic: true
>>> action.auto_create_index: true
>>> action.disable_delete_all_indices: true
>>> path.conf: /iscsi_disk1/etc/elasticsearch
>>> path.data: /iscsi_disk1/data/elasticsearch
>>> path.work: /iscsi_disk1/work/elasticsearch
>>> path.logs: /iscsi_disk1/logs/elasticsearch
>>> path.plugins: /usr/local/elasticsearch/bin/plugin
>>> bootstrap.mlockall: true
>>> http.port: 9200
>>> gateway.expected_nodes: 1
>>> discovery.zen.minimum_master_nodes: 1
>>> discovery.zen.ping.multicast.enabled: false
>>> cloud.node.auto_attributes: true
>>> discovery.zen.ping.unicast.hosts: [ "elk001:9300" ]
>>> indices.memory.index_buffer_size: 50%
>>> marvel.agent.enabled: enabled
>>>
>>>
>>> ES status :
>>> root@elk01:/iscsi_disk1/etc/elasticsearch# curl -XGET '
>>> http://localhost:9200/_cluster/health?pretty=true'
>>> {
>>>   "cluster_name" : "elastic_test1",
>>>   "status" : "yellow",
>>>   "timed_out" : false,
>>>   "number_of_nodes" : 1,
>>>   "number_of_data_nodes" : 1,
>>>   "active_primary_shards" : 5,
>>>   "active_shards" : 5,
>>>   "relocating_shards" : 0,
>>>   "initializing_shards" : 0,
>>>   "unassigned_shards" : 5
>>> }
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Jul 14, 2014 at 6:38 AM, Boaz Leskes  wrote:
>>>
 Can you share a screen shot of what you see and also the logs of the
 node? I'm looking for the adress Es bound it self to and any error from
 marvel

 On Mon, Jul 14, 2014 at 2:26 AM, ajay.bh...@gmail.com <
 ajay.bh...@gmail.com> wrote:

> Tried multiple times with setting it to true or false both but result
> is same.
> Thanks
> Ajay
>
>  Sent from my BlackBerry 10 smartphone on the Rogers network.
>   *From: *Boaz Leskes
>  *Sent: *Sunday, July 13, 2014 02:09
>  *To: *elasticsearc

Re: multi_field and aggregations

2014-07-16 Thread Vinicius Carvalho
Hm, sorry; I noticed that the mapping was not set on that index. It works
fine now.

On Wednesday, July 16, 2014 2:20:45 PM UTC-4, Vinicius Carvalho wrote:
>
> Hi there, it's been a while since I last used es. My last project was 
> based on 0.9.x branch. I'm new to the new fields and aggregations.
>
> I have a field mapped as a multi_field:
>
> "genre": {
>   "type": "string",
>   "store": false,
>   "index": "analyzed",
>   "analyzer": "short",
>   "fields" : {
>   "raw" : {"type" : "string", "index" : "not_analyzed"}
>   }
>
> The whole idea is that genre uses a custom analyzer that keeps stop words 
> (we need some of them on our genre), and I've decided to keep a raw so I 
> could execute a faceted search on counting genres
>
> But when I try to run:
>
> {
>   "size": 0,
>   "query": {
> "match_all": {}
>   },
>   "aggs": {
> "genre_count": {
>   "terms": {
> "size": 0,
> "field": "genre.raw"
>   }
> }
>   }
> }
>
> It fails with no results. What I wanted is to get something like :
>
> Action & Adventure : 100
>
> Adventure: 200
>
> Instead, if I remove the search, the terms are split in such a way:
>
> Action : 100
>
> Adventure : 300
>
> What's the proper way to run the aggs query?
>
> Regards
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c5c6677e-285d-45dd-a33a-42c1776bd8ce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Filtering different indexes with missing field

2014-07-16 Thread Tiago Ferreira
I got the answer, I just have to use the missing filter:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-missing-filter.html
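For completeness, since only one of the two indexes has the field, the term filter can be OR-ed with a missing filter so that documents lacking `inactive` (the course index) still pass. A sketch of the request body, built in Python only so its shape is checkable; the field name and value come from the mapping above:

```python
import json

# Match docs that are explicitly active OR have no "inactive" field at all
query = {
    "filter": {
        "or": [
            {"term": {"inactive": False}},       # user docs explicitly active
            {"missing": {"field": "inactive"}},  # course docs: field absent
        ]
    }
}
print(json.dumps(query, indent=2))
```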

Tiago Ferreira Lima

GitHub: fltiago
Skype: fltiago
LinkedIn: linkedin.com/in/fltiago




On Wed, Jul 16, 2014 at 11:16 AM, Tiago Lima  wrote:

> Hi,
>
> I'm trying to filter two different indexes with the same filter, but
> one of the indexes doesn't have the field that I'm trying to filter:
>
> "mappings":{
>  "user":{
>   "properties":{
>"inactive":{
> "type":"boolean",
> "index":"not_analyzed"
> }
>}
>   }
>  },
>  "course":{
>   "properties":{
>   }
>  }
> }
>
> Filtering:
> "filter":{
>   "and":[
>  {  },
>  {
> "term":{
>"inactive":false
> }
>  }
>   ]
>},
>
> But only user documents are retrieved; how can I apply this filter just to
> the user index?
>
> I really appreciate any help.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/s5eBefaebP0/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/9c5be39f-b1ab-4cf8-8b11-cca365cd228d%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAFRafxPNRt5nUwvz8NRJoi7%2B-vzi5v2kM8C7Gs9CTWXegbot4g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


multi_field and aggregations

2014-07-16 Thread Vinicius Carvalho
Hi there, it's been a while since I last used ES. My last project was based 
on the 0.9.x branch. I'm new to the new fields and aggregations.

I have a field mapped as a multi_field:

"genre": {
  "type": "string",
  "store": false,
  "index": "analyzed",
  "analyzer": "short",
  "fields" : {
  "raw" : {"type" : "string", "index" : "not_analyzed"}
  }

The whole idea is that genre uses a custom analyzer that keeps stop words 
(we need some of them in our genre values), and I've decided to keep a raw 
sub-field so I could run a faceted count of genres.

But when I try to run:

{
  "size": 0,
  "query": {
"match_all": {}
  },
  "aggs": {
"genre_count": {
  "terms": {
"size": 0,
"field": "genre.raw"
  }
}
  }
}

It fails with no results. What I wanted is to get something like :

Action & Adventure : 100

Adventure: 200

Instead, if I remove the search, the terms are split in such a way:

Action : 100

Adventure : 300

What's the proper way to run the aggs query?

Regards
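The split counts are what aggregating on the analyzed field produces: "Action & Adventure" is tokenized into "action" and "adventure", while the `.raw` sub-field keeps each genre as a single token. A small simulation of the two behaviours (token counting only, not Elasticsearch itself; the analyzer here is a crude stand-in):

```python
import re
from collections import Counter

genres = ["Action & Adventure"] * 100 + ["Adventure"] * 200

def analyzed(value):
    # crude stand-in for an analyzer: lowercase, split on non-alphanumerics
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]

per_token = Counter(t for g in genres for t in analyzed(g))  # like "genre"
per_value = Counter(genres)                                  # like "genre.raw"

print(per_token)  # Counter({'adventure': 300, 'action': 100})
print(per_value)  # Counter({'Adventure': 200, 'Action & Adventure': 100})
```

The terms aggregation on `genre.raw` gives the whole-value counts, provided the multi_field mapping was actually applied before the documents were indexed.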

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/368240bd-34ac-4a79-b085-17564e512954%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: term query no hits was Re: No parser for element [term]

2014-07-16 Thread Ivan Brusic
I would verify that the field is in fact non_analyzed and that your data is
indexed in the way you think it is. Use the analyze API to analyze the
term. Make sure you use the last example, which is based on the field.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html

A term query will do an exact match. If your field is in fact correct, post
an example mapping and document.

Cheers,

Ivan


On Wed, Jul 16, 2014 at 10:43 AM, Jack Park 
wrote:

> Thanks Ivan.
>
> That term query on the label field was, in fact on an analyzed field.
> What is of concern at the moment is that this query:
> {"query":{"term":{"inOf":"NodeBuilderType"}}}
> is  on an unanalyzed field, defined thus:
>
> "inOf": {
> "index": "not_analyzed",
> "type": "string",
> "store": "yes"
> },
>
> If all the ducks are lined up, it's not clear what the problem is for
> that query.
>
> On Wed, Jul 16, 2014 at 10:27 AM, Ivan Brusic  wrote:
> > By default, string fields are analyzed using the standard analyzer, which
> > will tokenize and lowercase the input (I believe stop words are now NOT
> > removed). A term query does not analyze the query, so it only works on
> non
> > analyzed fields (or fields that use a keyword tokenizer). A term query
> for
> > "kimchy" works because it already is lowercased and only has one token.
> >
> > Try using a match query or set the field to be non analyzed. The choice
> > depends on your other use cases (do you require partial matching?).
> >
> > Cheers,
> >
> > Ivan
> >
> >
> > On Wed, Jul 16, 2014 at 10:21 AM, Jack Park 
> > wrote:
> >>
> >> Thank you very much. I was in the process of drafting a message that I
> >> found that and made the query to look like these:
> >>
> >> {"query":{"term":{"label":"\"First instance node\""}}}
> >> {"query":{"term":{"inOf":"NodeBuilderType"}}}
> >>
> >> Neither returns any hits.
> >> In the same system, I did a text search with this query:
> >> {"query":{"multi_match":{"query":"topic
> >> map","fields":["details","label"]}}}
> >>
> >> and that worked perfectly.
> >> So, I have two open issues:
> >> 1- what's wrong with term query?
> >> 2- there were 145 hits on the text search; need to configure the query
> >> to do paging through those hits.
> >>
> >> Many thanks for this help
> >> Jack
> >>
> >> On Wed, Jul 16, 2014 at 9:48 AM, David Pilato  wrote:
> >> > You need to put it in a query.
> >> >
> >> > Have a look at
> >> >
> >> >
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/search.html
> >> >
> >> > --
> >> > David ;-)
> >> > Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
> >> >
> >> > Le 16 juil. 2014 à 18:04, Jack Park  a
> écrit :
> >> >
> >> > This exact query is not found in the list, so here goes. Just upgraded
> >> > to
> >> > 1.2.2.
> >> >
> >> > The query documentation
> >> >
> >> >
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
> >> > gives this example:
> >> >
> >> > {
> >> >"term" : { "user" : "kimchy" }
> >> > }
> >> >
> >> > So, my querydsl written for nodejs is this:
> >> >
> >> > {"term":{"inOf":"NodeBuilderType"}}
> >> >
> >> > I happen to know that documents satisfying that query exist, for
> >> > instance:
> >> >
> >> >
> {"lox":"NodeBuilderSecondTopic","crtr":"SystemUser","sIco":"","lIco":"","crDt":"2014-07-16T08:45:21","srtDt":1405525521801,"lEdDt":"2014-07-16T08:45:21","isPrv":"false","label":["First
> >> > instance node"],"details":["Seems
> >> >
> >> >
> likely"],"inOf":"NodeBuilderType","trCl",["NodeBuilderType","ASuperClass"],"sbOf":["ASuperClass"]}
> >> >
> >> > What I get back is an enormous stack trace, a portion of which from
> >> > the error log below.
> >> >
> >> > Am I missing something?
> >> >
> >> > Many thanks in advance.
> >> > Jack
> >> >
> >> > [2014-07-16 08:53:20.332] [ERROR] TopicMap - DP.__listNodesByQuery
> >> > {"term":{"inOf":"NodeBuilderType"}} | Error:
> >> > {"error":"SearchPhaseExecutionException[Failed to execute phase
> >> > [query], all shards failed; shardFailures
> >> > {[cawnH8a8S32Bpl96txGwyw][topics][2]:
> >> > SearchParseException[[topics][2]: from[-1],size[-1]: Parse Failure
> >> > [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> >> > nested: SearchParseException[[topics][2]: from[-1],size[-1]: Parse
> >> > Failure [No parser for element [term]]];
> >> > }{[cawnH8a8S32Bpl96txGwyw][topics][3]:
> >> > SearchParseException[[topics][3]: from[-1],size[-1]: Parse Failure
> >> > [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> >> > nested: SearchParseException[[topics][3]: from[-1],size[-1]: Parse
> >> > Failure [No parser for element [term]]];
> >> > }{[cawnH8a8S32Bpl96txGwyw][topics][0]:
> >> > SearchParseException[[topics][0]: from[-1],size[-1]: Parse Failure
> >> > [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> >> > nested: SearchParseException[[topics][0]: from[-1],size

Re: Direct buffer memory problem on master Discovery

2014-07-16 Thread Pedro Jerónimo
Ivan, I don't see any error on startup and, besides that, I see that it is 
set in the output of GET _nodes/process?pretty:

process: {
  refresh_interval_in_millis: 1000,
  id: 17132,
  max_file_descriptors: 4096,
*  mlockall: true*
}

I didn't make any other changes related to memory. According to the ES docs 
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html)
I was under the impression that enabling mlockall would solve memory 
issues. Apparently I was wrong :)

Any hints on this or the direct memory?
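On the direct-memory side: mlockall only locks the JVM heap in RAM; it does not govern direct (off-heap) buffers, which the JVM caps separately, by default at roughly the configured max heap. As a hedged suggestion rather than a confirmed fix for this cluster, the ceiling can be set explicitly through the standard HotSpot flag in the JVM options (the 4g value below is an arbitrary example):

```
# hypothetical example: set an explicit direct-memory ceiling for the ES JVM
ES_JAVA_OPTS="$ES_JAVA_OPTS -XX:MaxDirectMemorySize=4g"
```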


On Wednesday, July 16, 2014 6:55:41 PM UTC+2, Ivan Brusic wrote:
>
> Most users do not set the direct memory setting. mlockall is set, but does 
> the server allow it? You would see an error on startup if it didn't. Did 
> you change the vm swapiness on the server?
>
> -- 
> Ivan
>
>
> On Wed, Jul 16, 2014 at 2:40 AM, Pedro Jerónimo  > wrote:
>
>>
>> *Java: *java version "1.7.0_55" 
>> *ElasticSearch: *1.2.1
>>
>>
>> On Wednesday, July 16, 2014 10:55:10 AM UTC+2, Jörg Prante wrote:
>>
>>> What ES, and what Java version is this?
>>>
>>> Jörg
>>>
>>>
>>> On Tue, Jul 15, 2014 at 2:33 PM, Pedro Jerónimo  
>>> wrote:
>>>
 I have a cluster of 2 ES machines with a lot of indexing, not so much 
 searching. I'm using 2 EC2 machines with 30gb of RAM and I'm running ES on 
 each with 12gb heap (ES_HEAP_SIZE) and one of them (let's call it 
 logs1) is running logstash as well, with 2gb heap. The master node is 
 logs1 
 and the other instance is logs2. I start the cluster and every things 
 looks 
 fine, but after a while (1-3 days) I get the following error on logs1:

 [2014-07-15 12:26:39,867][WARN ][transport.netty ] [Keen Marlow] 
 exception caught on transport layer [[id: 0x16801a48, /XX.XX.XXX.XX:36314 
 => /XX.XXX.XX.XX:9300]], closing connection 
 java.lang.OutOfMemoryError: Direct buffer memory
 ...Stack Trace...

 And then the cluster is no longer connected and if I try to restart 
 logs2, I get the same error above for logs1 and this one for logs2:

 [2014-07-15 12:27:39,282][INFO ][discovery.ec2 ] [Betty Ross Banner] 
 failed to send join request to master [[Keen Marlow][9a7FIRpBSrKQcdcV_
 sjSTw][ip-XX-XX-XXX-XX][inet[/XX.XX.XXX.XX:9300]]{aws_availability_zone=us-west-2a,
  
 master=true}], reason 
 [org.elasticsearch.transport.RemoteTransportException: 
 [Keen Marlow][inet[/XX.XX.XXX.XX:9300]][discovery/zen/join]; 
 org.elasticsearch.transport.NodeDisconnectedException: [Betty Ross 
 Banner][inet[/XX.XXX.XX.XX:9300]][discovery/zen/join/validate] 
 disconnected]

 Is there any memory configuration I should tune up a bit? I'm kind of 
 new to ElasticSearch so I'd love some help! :).

 Thanks!

 Pedro
  
 -- 
 You received this message because you are subscribed to the Google 
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send 
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/7d155e16-c8ff-40b0-ac0a-e3719e8296c3%
 40googlegroups.com 
 
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/3dc6803d-618c-4c2d-84cc-1fa9e3a7d6a3%40googlegroups.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/29ea7a97-50af-4f75-beb6-f6cbb18d266d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: term query no hits was Re: No parser for element [term]

2014-07-16 Thread Jack Park
Thanks Ivan.

That term query on the label field was, in fact, on an analyzed field.
What is of concern at the moment is that this query:
{"query":{"term":{"inOf":"NodeBuilderType"}}}
is on an unanalyzed field, defined thus:

"inOf": {
"index": "not_analyzed",
"type": "string",
"store": "yes"
},

If all the ducks are lined up, it's not clear what the problem is for
that query.
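On the paging question raised earlier in this thread: the search API pages with `from` and `size` in the request body. A small helper for turning a page number into those parameters (the function name and defaults here are my own):

```python
def page_params(page, per_page=10):
    """Translate a zero-based page number into from/size for the _search body."""
    return {"from": page * per_page, "size": per_page}

# 145 hits at 10 per page -> 15 pages; page 14 covers hits 140..144
print(page_params(14))  # {'from': 140, 'size': 10}
```

The returned dict is merged into the query body alongside "query"; deep paging with large `from` values gets expensive, so keep page sizes modest.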

On Wed, Jul 16, 2014 at 10:27 AM, Ivan Brusic  wrote:
> By default, string fields are analyzed using the standard analyzer, which
> will tokenize and lowercase the input (I believe stop words are now NOT
> removed). A term query does not analyze the query, so it only works on non
> analyzed fields (or fields that use a keyword tokenizer). A term query for
> "kimchy" works because it already is lowercased and only has one token.
>
> Try using a match query or set the field to be non analyzed. The choice
> depends on your other use cases (do you require partial matching?).
>
> Cheers,
>
> Ivan
>
>
> On Wed, Jul 16, 2014 at 10:21 AM, Jack Park 
> wrote:
>>
>> Thank you very much. I was in the process of drafting a message that I
>> found that and made the query to look like these:
>>
>> {"query":{"term":{"label":"\"First instance node\""}}}
>> {"query":{"term":{"inOf":"NodeBuilderType"}}}
>>
>> Neither returns any hits.
>> In the same system, I did a text search with this query:
>> {"query":{"multi_match":{"query":"topic
>> map","fields":["details","label"]}}}
>>
>> and that worked perfectly.
>> So, I have two open issues:
>> 1- what's wrong with term query?
>> 2- there were 145 hits on the text search; need to configure the query
>> to do paging through those hits.
>>
>> Many thanks for this help
>> Jack
>>
>> On Wed, Jul 16, 2014 at 9:48 AM, David Pilato  wrote:
>> > You need to put it in a query.
>> >
>> > Have a look at
>> >
>> > http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/search.html
>> >
>> > --
>> > David ;-)
>> > Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>> >
>> > Le 16 juil. 2014 à 18:04, Jack Park  a écrit :
>> >
>> > This exact query is not found in the list, so here goes. Just upgraded
>> > to
>> > 1.2.2.
>> >
>> > The query documentation
>> >
>> > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
>> > gives this example:
>> >
>> > {
>> >"term" : { "user" : "kimchy" }
>> > }
>> >
>> > So, my querydsl written for nodejs is this:
>> >
>> > {"term":{"inOf":"NodeBuilderType"}}
>> >
>> > I happen to know that documents satisfying that query exist, for
>> > instance:
>> >
>> > {"lox":"NodeBuilderSecondTopic","crtr":"SystemUser","sIco":"","lIco":"","crDt":"2014-07-16T08:45:21","srtDt":1405525521801,"lEdDt":"2014-07-16T08:45:21","isPrv":"false","label":["First
>> > instance node"],"details":["Seems
>> >
>> > likely"],"inOf":"NodeBuilderType","trCl",["NodeBuilderType","ASuperClass"],"sbOf":["ASuperClass"]}
>> >
>> > What I get back is an enormous stack trace, a portion of which from
>> > the error log below.
>> >
>> > Am I missing something?
>> >
>> > Many thanks in advance.
>> > Jack
>> >
>> > [2014-07-16 08:53:20.332] [ERROR] TopicMap - DP.__listNodesByQuery
>> > {"term":{"inOf":"NodeBuilderType"}} | Error:
>> > {"error":"SearchPhaseExecutionException[Failed to execute phase
>> > [query], all shards failed; shardFailures
>> > {[cawnH8a8S32Bpl96txGwyw][topics][2]:
>> > SearchParseException[[topics][2]: from[-1],size[-1]: Parse Failure
>> > [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
>> > nested: SearchParseException[[topics][2]: from[-1],size[-1]: Parse
>> > Failure [No parser for element [term]]];
>> > }{[cawnH8a8S32Bpl96txGwyw][topics][3]:
>> > SearchParseException[[topics][3]: from[-1],size[-1]: Parse Failure
>> > [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
>> > nested: SearchParseException[[topics][3]: from[-1],size[-1]: Parse
>> > Failure [No parser for element [term]]];
>> > }{[cawnH8a8S32Bpl96txGwyw][topics][0]:
>> > SearchParseException[[topics][0]: from[-1],size[-1]: Parse Failure
>> > [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
>> > nested: SearchParseException[[topics][0]: from[-1],size[-1]: Parse
>> > Failure [No parser for element [term]]];
>> > }{[cawnH8a8S32Bpl96txGwyw][topics][1]:
>> > SearchParseException[[topics][1]: from[-1],size[-1]: Parse Failure
>> > [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
>> > nested: SearchParseException[[topics][1]: from[-1],size[-1]: Parse
>> > Failure [No parser for element [term]]];
>> > }{[cawnH8a8S32Bpl96txGwyw][topics][4]:
>> > SearchParseException[[topics][4]: from[-1],size[-1]: Parse Failure
>> > [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
>> > nested: SearchParseException[[topics][4]: from[-1],size[-1]: Parse
>> > Failure [No parser for element [term]]]; }]","status":400}
>> >
>> > --
>> > You received this mess

Re: Aggregations with filter

2014-07-16 Thread Subacini B
Thank you Adrien, you are right, it was a mapping issue. I set the analyzer
to keyword and it started working as expected.
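
For anyone hitting the same thing, the fix looks roughly like this (a sketch
against a 1.x node; the index and type names are made up, the field names match
the query quoted below). Either the keyword analyzer or index: not_analyzed
keeps the standard analyzer from splitting and lowercasing the values, which is
what made the terms filter miss:

```sh
# Recreate the index with the string fields left untokenized:
curl -XPUT 'localhost:9200/myindex' -d '{
  "mappings": {
    "mytype": {
      "properties": {
        "COLA": { "type": "string", "analyzer": "keyword" },
        "COLB": { "type": "string", "analyzer": "keyword" },
        "COLC": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'
```

Since deleting and recreating the index loses the mappings, this has to be set
up again before re-indexing the documents.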


On Wed, Jul 16, 2014 at 1:44 AM, Adrien Grand <
adrien.gr...@elasticsearch.com> wrote:

> Hi,
>
> A very likely cause for this issue is that by deleting and recreating your
> index, you lost your mappings. Quite likely, you want to set your COLA,
> COLB and COLC fields index=not_analyzed:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string
>
>
> On Tue, Jul 15, 2014 at 8:57 PM, esuser  wrote:
>
>> Hi All,
>>
>>
>> I am new to Elasticsearch. I am trying the query below in order to get
>> results based on a where condition with group by and avg.
>>
>> The query always returns zero hits/docs. If I remove the terms, it gives
>> results.
>>
>> This was working fine some time back. After I deleted the index and created
>> a new one, it stopped working. Any pointers would help.
>>
>> If I use fields which are not strings as filter terms, it gives me proper
>> results.
>>
>> {
>>   "aggs": {
>>     "filtered": {
>>       "filter": {
>>         "bool": {
>>           "must": [
>>             { "term": { "COLA": "NA" } },
>>             { "term": { "COLB": "CUSTOMER" } },
>>             { "range": { "SCORE": { "from": 0, "to": 90 } } }
>>           ]
>>         }
>>       },
>>       "aggs": {
>>         "cust": {
>>           "terms": { "field": "COLC" },
>>           "aggs": {
>>             "avg_score": { "avg": { "field": "SCORE" } }
>>           }
>>         }
>>       }
>>     }
>>   }
>> }
>>
>> Thanks.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/7b1cb3b5-7276-4ed8-a668-3cc7c285fbf0%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Adrien Grand
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/udEUL5bJAzo/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j7%2BvsQR%3D--C5w1b1_qE9vKTESR_LxpHy4ZdsmUOyRJ-mQ%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



Re: Efficiency of search vs get

2014-07-16 Thread thale jacobs
From the example:
client.prepareSearch(indexName).setRouting(routingStr).setQuery(
QueryBuilders.termQuery("_routing", routingStr)).execute().actionGet();



For clarification, can someone verify that the routing needs to be 
specified both via setRouting(routingStr) and via the term query 
(QueryBuilders.termQuery("_routing", routingStr))? I am 
having a difficult time finding documentation on the Java client API as it 
pertains to routing. Thanks for the help.






On Friday, September 23, 2011 7:15:43 PM UTC-4, kimchy wrote:
>
> Replace setFilter with setQuery(QueryBuilders.termQuery("_routing", 
> routingStr)), as the filter is mainly used to filter results of the query 
> you execute (mainly used with faceting).
>
> On Fri, Sep 23, 2011 at 11:18 PM, Per Steffensen  > wrote:
>
>>  Shay Banon wrote: 
>>
>> Get is as fast as you can go to retrieve a single document, search 
>> against a single field (term query) that uses routing to direct the search 
>> request to a single shard will be almost as fast, but not the same. I don't 
>> have actual numbers to say how much slower it will be. 
>>
>>  Regarding a combined index, there is no option to do that in 
>> elasticsearch. You can do a boolean query, with several must clauses 
>> including term query against a, b, and c. This will be slower (since now 
>> you are not searching on a single field, but 3).
>>
>>  On the other hand, the _routing field is automatically indexed (not 
>> analyzed). So, based on the same below, you can simply do a term query 
>> against _routing field with the routing value.
>>  
>> Thanks. Will the following code do the trick?
>>
>> client.prepareSearch(indexName).setRouting(routingStr).setFilter(new 
>> TermFilterBuilder("_routing", routingStr)).execute().actionGet();
>>
>>  
>>  Of course, you might get several documents with the search request, but 
>> I think you factored that in (a_b_c_1, and a_b_c_2).
>>
>> On Thu, Sep 15, 2011 at 10:06 PM, Per Steffensen > > wrote:
>>
>>> curl -X PUT "localhost:9200/mytest/abc/_mapping" -d '{
>>> "abc" : {
>>>  "_routing" : {
>>>  "required" : true
>>>  },
>>>  "properties" : {
>>>  "idx" : {"type" : "string", "index" : "not_analyzed"},
>>>  "a" : {"type" : "string"},
>>>  "b" : {"type" : "string"},
>>>  "c" : {"type" : "integer"},
>>>  "txt" : {"type" : "string", "null_value" : "na"}
>>>  }
>>> }
>>> }'
>>>
>>> Lots of abc documents are indexed into the mytest index - among others, this one:
>>> curl -XPUT 
>>> "localhost:9200/mytest/abc/1234_5678_90123?routing=1234_5678_90123" -d '{
>>> "sms" :
>>> {
>>>  "a" : "1234",
>>>  "b" : "5678",
>>>  "c" : 90123,
>>>  "txt" : "Hello World"
>>> }
>>> }'
>>>
>>> Expect this "get" will be very efficient:
>>> curl -XGET "
>>> http://localhost:9200/mytest/abc/1234_5678_90123?routing=1234_5678_90123
>>> "
>>>
>>> I have cheated a little in the code above, when I indicate that I can 
>>> make an id consisting of the values of a, b and c. It is only almost true - 
>>> sometimes (but very very seldom) there will be documents with the same 
>>> values for a, b and c. Therefore I cannot make id's like this (will have to 
>>> make a_b_c_X id's or just GUID id's instead), and therefore I cannot "find" 
>>> the document(s) using the "get" above.
>>>
>>> Question: If I know that there will never be more than a few documents 
>>> with concrete values for a, b and c, can I create a "search" finding those 
>>> documents, a search that is just (or almost) as efficient (with respect to 
>>> searchtime and resources used) as the "get" above? Note that I am using 
>>> routing so I should at least be able to hit the right shard in such a 
>>> search.
>>>
>>> In a RDMS I would make an combined index of a, b and c and use the query 
>>> "select * from abc where a="1234" and b="5678" and c=90123" (the "search") 
>>> instead of "select * from abc where id="1234_5678_90123"" (the "get"), and 
>>> that would be just as efficient (if the RDMS uses the combined index, or 
>>> else I will force it by hinting).
>>>
>>> Thanks!
>>>
>>> Regards, Per Steffensen
>>>
>>>  
>>   
>>  
>



Re: term query no hits was Re: No parser for element [term]

2014-07-16 Thread Ivan Brusic
By default, string fields are analyzed using the standard analyzer, which
will tokenize and lowercase the input (I believe stop words are now NOT
removed). A term query does not analyze its input, so it only works on
non-analyzed fields (or fields that use a keyword tokenizer). A term query for
"kimchy" works because it is already lowercased and has only one token.

Try using a match query or set the field to be non-analyzed. The choice
depends on your other use cases (do you require partial matching?).

Cheers,

Ivan
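
This is easy to see with the _analyze API (a sketch against a running 1.x
node; the index name matches the poster's):

```sh
# How the standard analyzer tokenizes the stored label:
curl -XGET 'localhost:9200/_analyze?analyzer=standard&text=First+instance+node'
# It produces the tokens "first", "instance", "node"; the untouched string
# "First instance node" is never in the index, so a term query for it misses.

# A match query analyzes its input the same way, so the tokens line up:
curl -XPOST 'localhost:9200/topics/_search' -d '{
  "query": { "match": { "inOf": "NodeBuilderType" } }
}'
```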


On Wed, Jul 16, 2014 at 10:21 AM, Jack Park 
wrote:

> Thank you very much. I was in the process of drafting a message that I
> found that and made the query to look like these:
>
> {"query":{"term":{"label":"\"First instance node\""}}}
> {"query":{"term":{"inOf":"NodeBuilderType"}}}
>
> Neither returns any hits.
> In the same system, I did a text search with this query:
> {"query":{"multi_match":{"query":"topic
> map","fields":["details","label"]}}}
>
> and that worked perfectly.
> So, I have two open issues:
> 1- what's wrong with term query?
> 2- there were 145 hits on the text search; need to configure the query
> to do paging through those hits.
>
> Many thanks for this help
> Jack
>
> On Wed, Jul 16, 2014 at 9:48 AM, David Pilato  wrote:
> > You need to put it in a query.
> >
> > Have a look at
> >
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/search.html
> >
> > --
> > David ;-)
> > Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
> >
> > Le 16 juil. 2014 à 18:04, Jack Park  a écrit :
> >
> > This exact query is not found in the list, so here goes. Just upgraded to
> > 1.2.2.
> >
> > The query documentation
> >
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
> > gives this example:
> >
> > {
> >"term" : { "user" : "kimchy" }
> > }
> >
> > So, my querydsl written for nodejs is this:
> >
> > {"term":{"inOf":"NodeBuilderType"}}
> >
> > I happen to know that documents satisfying that query exist, for
> instance:
> >
> {"lox":"NodeBuilderSecondTopic","crtr":"SystemUser","sIco":"","lIco":"","crDt":"2014-07-16T08:45:21","srtDt":1405525521801,"lEdDt":"2014-07-16T08:45:21","isPrv":"false","label":["First
> > instance node"],"details":["Seems
> >
> likely"],"inOf":"NodeBuilderType","trCl",["NodeBuilderType","ASuperClass"],"sbOf":["ASuperClass"]}
> >
> > What I get back is an enormous stack trace, a portion of which from
> > the error log below.
> >
> > Am I missing something?
> >
> > Many thanks in advance.
> > Jack
> >
> > [2014-07-16 08:53:20.332] [ERROR] TopicMap - DP.__listNodesByQuery
> > {"term":{"inOf":"NodeBuilderType"}} | Error:
> > {"error":"SearchPhaseExecutionException[Failed to execute phase
> > [query], all shards failed; shardFailures
> > {[cawnH8a8S32Bpl96txGwyw][topics][2]:
> > SearchParseException[[topics][2]: from[-1],size[-1]: Parse Failure
> > [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> > nested: SearchParseException[[topics][2]: from[-1],size[-1]: Parse
> > Failure [No parser for element [term]]];
> > }{[cawnH8a8S32Bpl96txGwyw][topics][3]:
> > SearchParseException[[topics][3]: from[-1],size[-1]: Parse Failure
> > [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> > nested: SearchParseException[[topics][3]: from[-1],size[-1]: Parse
> > Failure [No parser for element [term]]];
> > }{[cawnH8a8S32Bpl96txGwyw][topics][0]:
> > SearchParseException[[topics][0]: from[-1],size[-1]: Parse Failure
> > [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> > nested: SearchParseException[[topics][0]: from[-1],size[-1]: Parse
> > Failure [No parser for element [term]]];
> > }{[cawnH8a8S32Bpl96txGwyw][topics][1]:
> > SearchParseException[[topics][1]: from[-1],size[-1]: Parse Failure
> > [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> > nested: SearchParseException[[topics][1]: from[-1],size[-1]: Parse
> > Failure [No parser for element [term]]];
> > }{[cawnH8a8S32Bpl96txGwyw][topics][4]:
> > SearchParseException[[topics][4]: from[-1],size[-1]: Parse Failure
> > [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> > nested: SearchParseException[[topics][4]: from[-1],size[-1]: Parse
> > Failure [No parser for element [term]]]; }]","status":400}
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "elasticsearch" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to elasticsearch+unsubscr...@googlegroups.com.
> > To view this discussion on the web visit
> >
> https://groups.google.com/d/msgid/elasticsearch/CAH6s0fw%3D2T3VpUa3-Ok4je_W6s6HW-dcZcEJ6qztk-v0PxjuzA%40mail.gmail.com
> .
> > For more options, visit https://groups.google.com/d/optout.
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "elasticsearch" group.
> > To unsubscribe from this group and stop receiving emails

Re: Delete oldest X documents from index

2014-07-16 Thread David Smith
You could add a datetime field to your document, sort on it in ascending 
order, and get the first X document IDs. Then delete the 
documents by ID.
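
Sketched as requests (index, type, field name and size are illustrative):

```sh
# 1) Fetch the IDs of the 100 oldest documents, sorted by the datetime field:
curl -XPOST 'localhost:9200/myindex/_search' -d '{
  "sort": [ { "created_at": { "order": "asc" } } ],
  "size": 100,
  "fields": []
}'
# 2) Delete each returned hit by its _id:
curl -XDELETE 'localhost:9200/myindex/mytype/<some_id>'
```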

While that's the solution to the question you asked, it might be better to 
re-think your problem so that you don't have to delete the oldest X 
documents. Have you considered whether your problem fits into the 
time-series index model? If it does, then you can just delete the oldest 
day (or whatever time period). Take a look at this 
video: http://www.elasticsearch.org/videos/big-data-search-and-analytics/ 
(around the 21:15 mark).




term query no hits was Re: No parser for element [term]

2014-07-16 Thread Jack Park
Thank you very much. I was in the process of drafting a message that I
found that and made the query to look like these:

{"query":{"term":{"label":"\"First instance node\""}}}
{"query":{"term":{"inOf":"NodeBuilderType"}}}

Neither returns any hits.
In the same system, I did a text search with this query:
{"query":{"multi_match":{"query":"topic map","fields":["details","label"]}}}

and that worked perfectly.
So, I have two open issues:
1- what's wrong with term query?
2- there were 145 hits on the text search; need to configure the query
to do paging through those hits.

Many thanks for this help
Jack
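
On question 2, paging is done with from/size on the search body (values
illustrative):

```sh
curl -XPOST 'localhost:9200/topics/_search' -d '{
  "query": { "multi_match": { "query": "topic map",
                              "fields": ["details", "label"] } },
  "from": 0,
  "size": 10
}'
# Fetch the next pages by advancing "from": 10, 20, 30, ...
```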

On Wed, Jul 16, 2014 at 9:48 AM, David Pilato  wrote:
> You need to put it in a query.
>
> Have a look at
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/search.html
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> Le 16 juil. 2014 à 18:04, Jack Park  a écrit :
>
> This exact query is not found in the list, so here goes. Just upgraded to
> 1.2.2.
>
> The query documentation
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
> gives this example:
>
> {
>"term" : { "user" : "kimchy" }
> }
>
> So, my querydsl written for nodejs is this:
>
> {"term":{"inOf":"NodeBuilderType"}}
>
> I happen to know that documents satisfying that query exist, for instance:
> {"lox":"NodeBuilderSecondTopic","crtr":"SystemUser","sIco":"","lIco":"","crDt":"2014-07-16T08:45:21","srtDt":1405525521801,"lEdDt":"2014-07-16T08:45:21","isPrv":"false","label":["First
> instance node"],"details":["Seems
> likely"],"inOf":"NodeBuilderType","trCl",["NodeBuilderType","ASuperClass"],"sbOf":["ASuperClass"]}
>
> What I get back is an enormous stack trace, a portion of which from
> the error log below.
>
> Am I missing something?
>
> Many thanks in advance.
> Jack
>
> [2014-07-16 08:53:20.332] [ERROR] TopicMap - DP.__listNodesByQuery
> {"term":{"inOf":"NodeBuilderType"}} | Error:
> {"error":"SearchPhaseExecutionException[Failed to execute phase
> [query], all shards failed; shardFailures
> {[cawnH8a8S32Bpl96txGwyw][topics][2]:
> SearchParseException[[topics][2]: from[-1],size[-1]: Parse Failure
> [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> nested: SearchParseException[[topics][2]: from[-1],size[-1]: Parse
> Failure [No parser for element [term]]];
> }{[cawnH8a8S32Bpl96txGwyw][topics][3]:
> SearchParseException[[topics][3]: from[-1],size[-1]: Parse Failure
> [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> nested: SearchParseException[[topics][3]: from[-1],size[-1]: Parse
> Failure [No parser for element [term]]];
> }{[cawnH8a8S32Bpl96txGwyw][topics][0]:
> SearchParseException[[topics][0]: from[-1],size[-1]: Parse Failure
> [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> nested: SearchParseException[[topics][0]: from[-1],size[-1]: Parse
> Failure [No parser for element [term]]];
> }{[cawnH8a8S32Bpl96txGwyw][topics][1]:
> SearchParseException[[topics][1]: from[-1],size[-1]: Parse Failure
> [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> nested: SearchParseException[[topics][1]: from[-1],size[-1]: Parse
> Failure [No parser for element [term]]];
> }{[cawnH8a8S32Bpl96txGwyw][topics][4]:
> SearchParseException[[topics][4]: from[-1],size[-1]: Parse Failure
> [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> nested: SearchParseException[[topics][4]: from[-1],size[-1]: Parse
> Failure [No parser for element [term]]]; }]","status":400}
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAH6s0fw%3D2T3VpUa3-Ok4je_W6s6HW-dcZcEJ6qztk-v0PxjuzA%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/977768D7-AFC8-40B6-8521-23C174341B38%40pilato.fr.
> For more options, visit https://groups.google.com/d/optout.



High memory usage on dedicated master nodes

2014-07-16 Thread David Smith


We have a cluster with 22 data nodes and 3 dedicated master nodes running ES 
1.1.0. Data nodes have 16 GB of memory (half given to the JVM heap) and the 
dedicated masters have 4 GB (half for heap). Data nodes have consistent 
memory usage (about 50-60%), but we're observing that the master node's 
memory use is constantly growing and shrinking (after GC) every 5 
minutes. See the bigdesk graph.




I'm curious as to why the master node is using up close to 1.5 GB of 
memory. I thought that maintaining cluster state shouldn't take up that 
much memory and master nodes should be able to work with much less memory. 
Anybody know why?



Re: How many tcp connections should ES/logstash generate ?

2014-07-16 Thread Bastien Chong
Thanks for your input. I'm running ES as another user; I still had root 
access.

I will refactor to create an index per day, and every 30 seconds I'll simply 
delete yesterday's index. I'm hoping this greatly reduces the number of threads.
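
With daily indices, the cleanup collapses to a single cheap call (the index
name pattern here is illustrative):

```sh
# Drop all of yesterday's documents at once by deleting that day's index:
curl -XDELETE 'localhost:9200/myindex-2014.07.15'
```

Deleting a whole index is a near-instant metadata operation, unlike a
delete-by-query over the same documents.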

On Wednesday, July 16, 2014 12:02:33 PM UTC-4, Jörg Prante wrote:
>
> First, you should always run ES under another user with least-possible 
> privileges, so you can login, even if ES is running out of process space. 
> (There are more security related issues that everyone should care about, I 
> leave them out here)
>
> Second, it is not intended that ES runs so many processes. On the other 
> hand, ES does not refuse to spawn plenty of threads when retrying hard 
> to recover from network-related problems. Maybe you can see what the threads 
> are doing by executing a "hot threads" command 
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html
>
> Third, you run every 30 secs a command to "delete by query" with a range 
> of many days. That does not seem to make sense. You should always take care 
> to complete such queries before continuing; they can take a very long time (I 
> mean hours). They put a burden on your system. Set up daily indices, this 
> is much more efficient, deletions by day are a matter of seconds then.
>
> Jörg
>
>
>
>
> On Wed, Jul 16, 2014 at 4:30 PM, Bastien Chong  > wrote:
>
>>
>> http://serverfault.com/questions/412114/cannot-switch-ssh-to-specific-user-su-cannot-set-user-id-resource-temporaril
>>
>> Looks like I have the same issue. Is it normal that ES spawns that many 
>> processes, over 1000?
>>
>>
>> On Wednesday, July 16, 2014 9:23:45 AM UTC-4, Bastien Chong wrote:
>>>
>>> I'm not sure how to find the answer to that; I use the default settings in ES. 
>>> The cluster is composed of 2 read/write node, and a read-only node.
>>> There is 1 Logstash instance that simply output 2 type of data to ES. 
>>> Nothing fancy.
>>>
>>> I need to delete documents older than a day, for this particular thing, 
>>> I can't create a daily index. Is there a better way ?
>>>
>>> I'm using an EC2 m3.large instance, ES has 1.5GB of heap.
>>>
>>> It seems like I'm hitting an OS limit, I can't "su - elasticsearch" : 
>>>
>>> su: /bin/bash: Resource temporarily unavailable
>>>
>>> Stopping elasticsearch fix this issue, so this is directly linked. 
>>>
 -bash-4.1$ ulimit -a
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size   (blocks, -f) unlimited
 pending signals (-i) 29841
 max locked memory   (kbytes, -l) unlimited
 max memory size (kbytes, -m) unlimited
 open files  (-n) 65536
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 8192
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 1024
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited

>>>
>>>
>>>
>>>
>>> On Tuesday, July 15, 2014 6:35:22 PM UTC-4, Mark Walkom wrote:

 It'd depend on your config I'd guess, in particular how many 
 workers/threads you have and what ES output you are using in LS.

 Why are you cleaning an index like this anyway? It seems horribly 
 inefficient.
 Basically the error is "OutOfMemoryError", which means you've run out 
 of heap for the operation to complete. What are the specs for your node, 
 how much heap does ES have?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 16 July 2014 00:43, Bastien Chong  wrote:

> I have a basic setup with a logstash shipper, an indexer and an 
> elasticsearch cluster.
> Elasticsearch listen on the standart 9200/9300 and logstash indexer 
> 9301/9302.
>
> When I do a netstat | wc -l for the ES process: 184 found
> (sample)
>
>> tcp0  0 :::172.17.7.87:9300 :::
>> 172.17.8.39:59573ESTABLISHED 23224/java
>> tcp0  0 :::172.17.7.87:9300 :::
>> 172.17.7.87:47609ESTABLISHED 23224/java
>> tcp0  0 :::172.17.7.87:53493:::
>> 172.17.7.87:9302 ESTABLISHED 23224/java
>> tcp0  0 :::172.17.7.87:9300 :::
>> 172.17.8.39:59564ESTABLISHED 23224/java
>> tcp0  0 :::172.17.7.87:9300 :::
>> 172.17.7.87:47657ESTABLISHED 23224/java
>>
>
> Same thing for the logstash indexer : 160 found
> (sample)
>
>> tcp0  0 :::172.17.7.87:50132:::
>> 172.17.8.39:9300 ESTABLISHED 1516/java
>> tcp0 

Re: Nested Query & Filter Query

2014-07-16 Thread Dark Globe
YES! 

Thank you so much David. I've been pulling my hair out trying different 
structures when it was that little gotcha all along grrr!

I will take some comfort, however, in the fact that I had actually got the 
structure right this time.

Thank you so much David for all your help - thanks to you, I will be able 
to demonstrate ElasticSearch as a viable solution tomorrow.



Re: Direct buffer memory problem on master Discovery

2014-07-16 Thread Ivan Brusic
Most users do not set the direct memory setting. mlockall is set, but does
the server allow it? You would see an error on startup if it didn't. Did
you change the vm swappiness on the server?

-- 
Ivan
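
For reference, those two settings can be checked roughly like this (a sketch;
1.x setting names, and exact paths vary by install):

```sh
# Kernel swappiness (high values make the OS swap the JVM heap out eagerly):
sysctl vm.swappiness
# In config/elasticsearch.yml (1.x name):  bootstrap.mlockall: true
# Verify ES actually obtained the memory lock:
curl 'localhost:9200/_nodes/process?pretty'
```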


On Wed, Jul 16, 2014 at 2:40 AM, Pedro Jerónimo 
wrote:

>
> *Java: *java version "1.7.0_55"
> *ElasticSearch: *1.2.1
>
>
> On Wednesday, July 16, 2014 10:55:10 AM UTC+2, Jörg Prante wrote:
>
>> What ES, and what Java version is this?
>>
>> Jörg
>>
>>
>> On Tue, Jul 15, 2014 at 2:33 PM, Pedro Jerónimo 
>> wrote:
>>
>>> I have a cluster of 2 ES machines with a lot of indexing, not so much
>>> searching. I'm using 2 EC2 machines with 30gb of RAM and I'm running ES on
>>> each with 12gb heap (ES_HEAP_SIZE) and one of them (let's call it
>>> logs1) is running logstash as well, with 2gb heap. The master node is logs1
>>> and the other instance is logs2. I start the cluster and everything looks
>>> fine, but after a while (1-3 days) I get the following error on logs1:
>>>
>>> [2014-07-15 12:26:39,867][WARN ][transport.netty ] [Keen Marlow]
>>> exception caught on transport layer [[id: 0x16801a48, /XX.XX.XXX.XX:36314
>>> => /XX.XXX.XX.XX:9300]], closing connection
>>> java.lang.OutOfMemoryError: Direct buffer memory
>>> ...Stack Trace...
>>>
>>> And then the cluster is no longer connected and if I try to restart
>>> logs2, I get the same error above for logs1 and this one for logs2:
>>>
>>> [2014-07-15 12:27:39,282][INFO ][discovery.ec2 ] [Betty Ross Banner]
>>> failed to send join request to master [[Keen Marlow][9a7FIRpBSrKQcdcV_
>>> sjSTw][ip-XX-XX-XXX-XX][inet[/XX.XX.XXX.XX:9300]]{aws_availability_zone=us-west-2a,
>>> master=true}], reason [org.elasticsearch.transport.RemoteTransportException:
>>> [Keen Marlow][inet[/XX.XX.XXX.XX:9300]][discovery/zen/join];
>>> org.elasticsearch.transport.NodeDisconnectedException: [Betty Ross
>>> Banner][inet[/XX.XXX.XX.XX:9300]][discovery/zen/join/validate]
>>> disconnected]
>>>
>>> Is there any memory configuration I should tune up a bit? I'm kind of
>>> new to ElasticSearch so I'd love some help! :).
>>>
>>> Thanks!
>>>
>>> Pedro
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>>
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/7d155e16-c8ff-40b0-ac0a-e3719e8296c3%
>>> 40googlegroups.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/3dc6803d-618c-4c2d-84cc-1fa9e3a7d6a3%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



Re: Nested Query & Filter Query

2014-07-16 Thread David Pilato
"Floyd" needs to be lowercased if you use a term filter or query,
as it has been indexed in lowercase.


--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
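
Concretely, the fix is just lowercasing the value inside the term filter
(index and field names here are illustrative, standing in for the poster's
gist, which is not shown in this thread):

```sh
curl -XPOST 'localhost:9200/myindex/_search' -d '{
  "query": {
    "filtered": {
      "filter": { "term": { "title": "floyd" } }
    }
  }
}'
```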

> Le 16 juil. 2014 à 18:09, Dark Globe  a écrit :
> 
> Thanks David, I did read that section a couple of days back and again just 
> now but I still don't understand why the final search in my gist does not 
> return one result.
> 
> If I remove { "term": { "title" : "Floyd" }} and run the script, the final 
> search returns one result, Pink Floyd.
> 
> But I think it should also return that same result with the term filter { 
> "term": { "title" : "Floyd" }} added. But it doesn't.
> 
> I still do not understand why this is or what I am doing wrong :/
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/20ead9ad-bead-497f-b0f1-6786e9fc8ab0%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



Updating Datatype in Elasticsearch

2014-07-16 Thread Barry Williams
Hello All,
I'm a n00b, and I'm having trouble changing a field's datatype in 
elasticsearch - so that kibana can use it.

I read in a CSV with logstash. Here is a sample of that CSV:

DateTime,Session,Event,Data/Duration
2014-05-12T21:51:44,1399945863,Pressure,7.00



Here is my logstash config:

input {
  file {
path => 
"/elk/Samples/CPAP_07_14_2014/CSV/SleepSheep_07_14_2014_no_header.csv"
start_position => beginning
  }
}


filter {
  csv {
columns => ["DateTime","Session","Event","Data/Duration"]
  }
}


output {
  elasticsearch {
host => localhost
  }
  stdout { codec => rubydebug }
}



Whenever the data reaches elasticsearch, the mapping shows the 
"Data/Duration" field as a string, not a float, preventing kibana from 
using it for graphing.  I tried to use PUT on elasticsearch to overwrite 
the mapping, but it won't let me.


Where should I configure this datatype? In the CSV filter, in the output, 
in elasticsearch?
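One possible place is the logstash filter chain itself: a mutate filter can convert the parsed string to a float before it is sent to elasticsearch. A sketch based on the config above (note that an existing index's mapping cannot be changed in place, so start a new index after making the change):

```
filter {
  csv {
    columns => ["DateTime","Session","Event","Data/Duration"]
  }
  mutate {
    # cast the CSV string to a float so elasticsearch maps it as a number
    convert => [ "Data/Duration", "float" ]
  }
}
```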

Thanks,
Barry



Re: No parser for element [term]

2014-07-16 Thread David Pilato
You need to put it in a query.

Have a look at 
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/search.html
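In other words, the term element has to live under a top-level query element in the search body, e.g.:

```json
{
  "query": {
    "term": { "inOf": "NodeBuilderType" }
  }
}
```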

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> Le 16 juil. 2014 à 18:04, Jack Park  a écrit :
> 
> This exact query is not found in the list, so here goes. Just upgraded to 
> 1.2.2.
> 
> The query documentation
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
> gives this example:
> 
> {
>"term" : { "user" : "kimchy" }
> }
> 
> So, my querydsl written for nodejs is this:
> 
> {"term":{"inOf":"NodeBuilderType"}}
> 
> I happen to know that documents satisfying that query exist, for instance:
> {"lox":"NodeBuilderSecondTopic","crtr":"SystemUser","sIco":"","lIco":"","crDt":"2014-07-16T08:45:21","srtDt":1405525521801,"lEdDt":"2014-07-16T08:45:21","isPrv":"false","label":["First
> instance node"],"details":["Seems
> likely"],"inOf":"NodeBuilderType","trCl",["NodeBuilderType","ASuperClass"],"sbOf":["ASuperClass"]}
> 
> What I get back is an enormous stack trace, a portion of which from
> the error log below.
> 
> Am I missing something?
> 
> Many thanks in advance.
> Jack
> 
> [2014-07-16 08:53:20.332] [ERROR] TopicMap - DP.__listNodesByQuery
> {"term":{"inOf":"NodeBuilderType"}} | Error:
> {"error":"SearchPhaseExecutionException[Failed to execute phase
> [query], all shards failed; shardFailures
> {[cawnH8a8S32Bpl96txGwyw][topics][2]:
> SearchParseException[[topics][2]: from[-1],size[-1]: Parse Failure
> [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> nested: SearchParseException[[topics][2]: from[-1],size[-1]: Parse
> Failure [No parser for element [term]]];
> }{[cawnH8a8S32Bpl96txGwyw][topics][3]:
> SearchParseException[[topics][3]: from[-1],size[-1]: Parse Failure
> [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> nested: SearchParseException[[topics][3]: from[-1],size[-1]: Parse
> Failure [No parser for element [term]]];
> }{[cawnH8a8S32Bpl96txGwyw][topics][0]:
> SearchParseException[[topics][0]: from[-1],size[-1]: Parse Failure
> [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> nested: SearchParseException[[topics][0]: from[-1],size[-1]: Parse
> Failure [No parser for element [term]]];
> }{[cawnH8a8S32Bpl96txGwyw][topics][1]:
> SearchParseException[[topics][1]: from[-1],size[-1]: Parse Failure
> [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> nested: SearchParseException[[topics][1]: from[-1],size[-1]: Parse
> Failure [No parser for element [term]]];
> }{[cawnH8a8S32Bpl96txGwyw][topics][4]:
> SearchParseException[[topics][4]: from[-1],size[-1]: Parse Failure
> [Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
> nested: SearchParseException[[topics][4]: from[-1],size[-1]: Parse
> Failure [No parser for element [term]]]; }]","status":400}
> 



Delete oldest X documents from index

2014-07-16 Thread Ophir Michaeli
Hi all,

I want to delete the oldest X docs from my elasticsearch index.
How do I do that?

Thanks, Ophir
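There is no single-call "delete oldest X" operation; one common sketch (assuming each document carries a timestamp field, here called @timestamp) is to search for the oldest X documents sorted ascending, then delete the returned ids via the bulk API:

```json
{
  "size": 100,
  "sort": [ { "@timestamp": { "order": "asc" } } ],
  "_source": false
}
```

Feed the returned _id values into bulk delete actions. If the data is time-based, time-bucketed indices make expiring old data much cheaper, since a whole index can be dropped at once.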



Re: Nested Query & Filter Query

2014-07-16 Thread Dark Globe
Thanks David, I did read that section a couple of days back and again just 
now but I still don't understand why the final search in my gist does not 
return one result.

If I remove { "term": { "title" : "Floyd" }} and run the script, the final 
search returns one result, Pink Floyd.

But I think it should also return that same result with the term filter { 
"term": 
{ "title" : "Floyd" }} added. But it doesn't.

I still do not understand why this is or what I am doing wrong :/



No parser for element [term]

2014-07-16 Thread Jack Park
This exact query is not found in the list, so here goes. Just upgraded to 1.2.2.

The query documentation
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
 gives this example:

{
"term" : { "user" : "kimchy" }
}

So, my querydsl written for nodejs is this:

{"term":{"inOf":"NodeBuilderType"}}

I happen to know that documents satisfying that query exist, for instance:
{"lox":"NodeBuilderSecondTopic","crtr":"SystemUser","sIco":"","lIco":"","crDt":"2014-07-16T08:45:21","srtDt":1405525521801,"lEdDt":"2014-07-16T08:45:21","isPrv":"false","label":["First
instance node"],"details":["Seems
likely"],"inOf":"NodeBuilderType","trCl",["NodeBuilderType","ASuperClass"],"sbOf":["ASuperClass"]}

What I get back is an enormous stack trace, a portion of which, from
the error log, is below.

Am I missing something?

Many thanks in advance.
Jack

[2014-07-16 08:53:20.332] [ERROR] TopicMap - DP.__listNodesByQuery
{"term":{"inOf":"NodeBuilderType"}} | Error:
{"error":"SearchPhaseExecutionException[Failed to execute phase
[query], all shards failed; shardFailures
{[cawnH8a8S32Bpl96txGwyw][topics][2]:
SearchParseException[[topics][2]: from[-1],size[-1]: Parse Failure
[Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
nested: SearchParseException[[topics][2]: from[-1],size[-1]: Parse
Failure [No parser for element [term]]];
}{[cawnH8a8S32Bpl96txGwyw][topics][3]:
SearchParseException[[topics][3]: from[-1],size[-1]: Parse Failure
[Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
nested: SearchParseException[[topics][3]: from[-1],size[-1]: Parse
Failure [No parser for element [term]]];
}{[cawnH8a8S32Bpl96txGwyw][topics][0]:
SearchParseException[[topics][0]: from[-1],size[-1]: Parse Failure
[Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
nested: SearchParseException[[topics][0]: from[-1],size[-1]: Parse
Failure [No parser for element [term]]];
}{[cawnH8a8S32Bpl96txGwyw][topics][1]:
SearchParseException[[topics][1]: from[-1],size[-1]: Parse Failure
[Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
nested: SearchParseException[[topics][1]: from[-1],size[-1]: Parse
Failure [No parser for element [term]]];
}{[cawnH8a8S32Bpl96txGwyw][topics][4]:
SearchParseException[[topics][4]: from[-1],size[-1]: Parse Failure
[Failed to parse source [{\"term\":{\"inOf\":\"NodeBuilderType\"}}]]];
nested: SearchParseException[[topics][4]: from[-1],size[-1]: Parse
Failure [No parser for element [term]]]; }]","status":400}



Re: How many tcp connections should ES/logstash generate ?

2014-07-16 Thread joergpra...@gmail.com
First, you should always run ES as a separate user with the least possible
privileges, so you can still log in even if ES is exhausting the process space.
(There are more security-related issues that everyone should care about; I
leave them out here.)

Second, it is not intended that ES run so many processes. On the other
hand, ES does not refuse to spawn plenty of threads when retrying hard
to recover from network-related problems. You may be able to see what the threads
are doing by executing a "hot threads" command

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html

Third, every 30 seconds you run a "delete by query" command with a range
spanning many days. That does not seem to make sense. You should always let
such queries complete before continuing; they can take a very long time (I
mean hours) and put a burden on your system. Set up daily indices instead; this
is much more efficient, and deleting a day's data is then a matter of seconds.
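Both suggestions can be sketched as plain HTTP calls (the index name is illustrative):

```
# see what the busy threads are actually doing
curl -s 'localhost:9200/_nodes/hot_threads'

# with daily indices, expiring a day of data is one cheap call
curl -XDELETE 'localhost:9200/logstash-2014.07.15'
```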

Jörg




On Wed, Jul 16, 2014 at 4:30 PM, Bastien Chong  wrote:

>
> http://serverfault.com/questions/412114/cannot-switch-ssh-to-specific-user-su-cannot-set-user-id-resource-temporaril
>
> Looks like I have the same issue, is it normal that ES spawns that much
> process, over 1000 ?
>
>
> On Wednesday, July 16, 2014 9:23:45 AM UTC-4, Bastien Chong wrote:
>>
>> I'm not sure how to find answer that, I use the default settings in ES.
>> The cluster is composed of 2 read/write node, and a read-only node.
>> There is 1 Logstash instance that simply output 2 type of data to ES.
>> Nothing fancy.
>>
>> I need to delete documents older than a day, for this particular thing, I
>> can't create a daily index. Is there a better way ?
>>
>> I'm using an EC2 m3.large instance, ES has 1.5GB of heap.
>>
>> It seems like I'm hitting an OS limit, I can't "su - elasticsearch" :
>>
>> su: /bin/bash: Resource temporarily unavailable
>>
>> Stopping elasticsearch fix this issue, so this is directly linked.
>>
>>> -bash-4.1$ ulimit -a
>>> core file size  (blocks, -c) 0
>>> data seg size   (kbytes, -d) unlimited
>>> scheduling priority (-e) 0
>>> file size   (blocks, -f) unlimited
>>> pending signals (-i) 29841
>>> max locked memory   (kbytes, -l) unlimited
>>> max memory size (kbytes, -m) unlimited
>>> open files  (-n) 65536
>>> pipe size(512 bytes, -p) 8
>>> POSIX message queues (bytes, -q) 819200
>>> real-time priority  (-r) 0
>>> stack size  (kbytes, -s) 8192
>>> cpu time   (seconds, -t) unlimited
>>> max user processes  (-u) 1024
>>> virtual memory  (kbytes, -v) unlimited
>>> file locks  (-x) unlimited
>>>
>>
>>
>>
>>
>> On Tuesday, July 15, 2014 6:35:22 PM UTC-4, Mark Walkom wrote:
>>>
>>> It'd depend on your config I'd guess, in particular how many
>>> workers/threads you have and what ES output you are using in LS.
>>>
>>> Why are you cleaning an index like this anyway? It seems horribly
>>> inefficient.
>>> Basically the error is "OutOfMemoryError", which means you've run out
>>> of heap for the operation to complete. What are the specs for your node,
>>> how much heap does ES have?
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 16 July 2014 00:43, Bastien Chong  wrote:
>>>
 I have a basic setup with a logstash shipper, an indexer and an
 elasticsearch cluster.
 Elasticsearch listen on the standart 9200/9300 and logstash indexer
 9301/9302.

 When I do a netstat | wc -l for the ES process: 184 found
 (sample)

> tcp0  0 :::172.17.7.87:9300 :::
> 172.17.8.39:59573ESTABLISHED 23224/java
> tcp0  0 :::172.17.7.87:9300 :::
> 172.17.7.87:47609ESTABLISHED 23224/java
> tcp0  0 :::172.17.7.87:53493:::
> 172.17.7.87:9302 ESTABLISHED 23224/java
> tcp0  0 :::172.17.7.87:9300 :::
> 172.17.8.39:59564ESTABLISHED 23224/java
> tcp0  0 :::172.17.7.87:9300 :::
> 172.17.7.87:47657ESTABLISHED 23224/java
>

 Same thing for the logstash indexer : 160 found
 (sample)

> tcp0  0 :::172.17.7.87:50132:::
> 172.17.8.39:9300 ESTABLISHED 1516/java
> tcp0  0 :::172.17.7.87:9301 :::
> 172.17.7.87:60153ESTABLISHED 1516/java
> tcp0  0 :::172.17.7.87:9301 :::
> 172.17.7.87:60145ESTABLISHED 1516/java
> tcp0  0 :::172.17.7.87:50129:::
> 172.17.8.39:9300 ESTABLISHED 1516/java
> tcp0  0 :::172.17.7.87:9302 :::
> 172.17.7.87:53501ESTABLISHED 1516/java
>

 Also, not sure if related, wh

Re: Nested Query & Filter Query

2014-07-16 Thread David Pilato
Match only exists as a query.
Term exists as a filter and as a query.

Think of queries as full-text search. Filters are like SQL: they "only" 
limit the dataset you are going to query on.
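The distinction can be sketched with an ES 1.x filtered query (an illustrative sketch, not the poster's exact gist; the `artist` field is hypothetical): the match query analyzes its input, so "Floyd" is lowercased for you, while the term filter compares its value verbatim against the indexed tokens.

```json
{
  "query": {
    "filtered": {
      "query":  { "match": { "title": "Floyd" } },
      "filter": { "term":  { "artist": "pink" } }
    }
  }
}
```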

You should read 
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/search.html

HTH


--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> Le 16 juil. 2014 à 16:07, Dark Globe  a écrit :
> 
> Just found that replacing "match" for "term" in my second filter condition 
> runs without error...but doesn't actually work so no results are returned 
> when 1 should be :/
> 
> I'm sure I'm getting closer and I'm sure I shouldn't be finding this such a 
> struggle!
> 
> Any pointers are welcome and appreciated.



Re: No efect refresh_interval

2014-07-16 Thread Michael McCandless
Which ES version are you using?  You should use the latest (soon to be
1.3): there have been a number of bulk-indexing improvements recently.

Are you using the bulk API with multiple/async client threads?  Are you
saturating either CPU or IO in your cluster (so that the test is really a
full cluster capacity test)?

Also, the relationship between refresh_interval and indexing performance is
tricky: it turns out -1 is often a poor choice, because it means your bulk
indexing threads are sometimes tied up flushing segments, whereas with
refreshing enabled a separate thread does that.  So a refresh of
5s is maybe a good choice.
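For reference, the setting can be changed on a live index, using the index name from the settings shown below:

```json
PUT /smt_20140501_10_20g_norefresh/_settings
{
  "index": { "refresh_interval": "5s" }
}
```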

Mike McCandless

http://blog.mikemccandless.com


On Wed, Jul 16, 2014 at 6:51 AM, Marek Dabrowski 
wrote:

> Hello
>
> My configuration is:
> 6 nodes Elasticsearch cluster
> OS: Centos 6.5
> JVM: 1.7.0_25
>
> Cluster is working fine. I can index data, query, etc. Now I'm running a
> test on a package of ~50mln docs (~13GB). I would like to get better
> indexing performance, so I changed the refresh_interval
> parameter. I tested 1s, -1 and 600s. The time for
> indexing the data is the same. I checked the configuration (_settings) for the
> index and the value of refresh_interval is correct (has the proper value), eg:
>
> {
>   "smt_20140501_10_20g_norefresh" : {
> "settings" : {
>   "index" : {
> "uuid" : "q3imiZGQTDasQUuMWS8oiw",
> "number_of_replicas" : "1",
> "number_of_shards" : "6",
> "refresh_interval" : "600s",
> "version" : {
>   "created" : "1020199"
> }
>   }
> }
>   }
> }
>
>
>
> Create index, setting refresh_interval and load is done on that same
> cluster node. Before test index is deleted and created again before start
> new test with new value of refresh_interval. All cluster nodes logs
> information that parameter has been changed, eg:
> [2014-07-16 11:24:09,813][INFO ][index.shard.service  ] [h6]
> [smt_20140501_10_20g_norefresh][1] updating refresh_interval from [1s]
> to [-1]
> or
> [2014-07-16 11:32:32,928][INFO ][index.shard.service  ] [h6]
> [smt_20140501_10_20g_norefresh][1] updating refresh_interval from [1s]
> to [10m]
>
> After starting the test, new data are available immediately and the indexing
> time is the same in all 3 cases. I don't know where the failure is. Does somebody
> know what is going on?
>
> Regards
> Marek
>



Re: Indexing and Searching Multimedia Files

2014-07-16 Thread Alexandre Rafalovitch
Did you look at Attachment type plugin?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-attachment-type.html
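With that plugin installed, a field is mapped with type attachment and fed base64-encoded content (a mapping sketch; the type and field names are illustrative):

```json
{
  "document": {
    "properties": {
      "file": { "type": "attachment" }
    }
  }
}
```

The plugin uses Apache Tika under the hood, so it extracts searchable text from PDF and Office documents, but for audio/video it can only pick up metadata, not the actual media content.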

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov

On Mon, Jul 14, 2014 at 5:30 PM, George Viju  wrote:
> Hi,
> In elasticsearch the following datatypes of data can be indexed string,
> integer/long, float/double, boolean and null  which is referred in the
> following link
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html.
> Is there any method/approach can be used to index or Search the Multimedia
> files like Audio, Video, Pdf , image etc. in elasticsearch?
>
>



Re: How many tcp connections should ES/logstash generate ?

2014-07-16 Thread Bastien Chong
http://serverfault.com/questions/412114/cannot-switch-ssh-to-specific-user-su-cannot-set-user-id-resource-temporaril

Looks like I have the same issue. Is it normal that ES spawns that many 
processes, over 1000?

On Wednesday, July 16, 2014 9:23:45 AM UTC-4, Bastien Chong wrote:
>
> I'm not sure how to find answer that, I use the default settings in ES. 
> The cluster is composed of 2 read/write node, and a read-only node.
> There is 1 Logstash instance that simply output 2 type of data to ES. 
> Nothing fancy.
>
> I need to delete documents older than a day, for this particular thing, I 
> can't create a daily index. Is there a better way ?
>
> I'm using an EC2 m3.large instance, ES has 1.5GB of heap.
>
> It seems like I'm hitting an OS limit, I can't "su - elasticsearch" : 
>
> su: /bin/bash: Resource temporarily unavailable
>
> Stopping elasticsearch fixes this issue, so this is directly linked. 
>
>> -bash-4.1$ ulimit -a
>> core file size  (blocks, -c) 0
>> data seg size   (kbytes, -d) unlimited
>> scheduling priority (-e) 0
>> file size   (blocks, -f) unlimited
>> pending signals (-i) 29841
>> max locked memory   (kbytes, -l) unlimited
>> max memory size (kbytes, -m) unlimited
>> open files  (-n) 65536
>> pipe size(512 bytes, -p) 8
>> POSIX message queues (bytes, -q) 819200
>> real-time priority  (-r) 0
>> stack size  (kbytes, -s) 8192
>> cpu time   (seconds, -t) unlimited
>> max user processes  (-u) 1024
>> virtual memory  (kbytes, -v) unlimited
>> file locks  (-x) unlimited
>>
>
>
>
>
> On Tuesday, July 15, 2014 6:35:22 PM UTC-4, Mark Walkom wrote:
>>
>> It'd depend on your config I'd guess, in particular how many 
>> workers/threads you have and what ES output you are using in LS.
>>
>> Why are you cleaning an index like this anyway? It seems horribly 
>> inefficient.
>> Basically the error is "OutOfMemoryError", which means you've run out of 
>> heap for the operation to complete. What are the specs for your node, how 
>> much heap does ES have?
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 16 July 2014 00:43, Bastien Chong  wrote:
>>
>>> I have a basic setup with a logstash shipper, an indexer and an 
>>> elasticsearch cluster.
>>> Elasticsearch listen on the standart 9200/9300 and logstash indexer 
>>> 9301/9302.
>>>
>>> When I do a netstat | wc -l for the ES process: 184 found
>>> (sample)
>>>
 tcp0  0 :::172.17.7.87:9300 :::
 172.17.8.39:59573ESTABLISHED 23224/java
 tcp0  0 :::172.17.7.87:9300 :::
 172.17.7.87:47609ESTABLISHED 23224/java
 tcp0  0 :::172.17.7.87:53493:::172.17.7.87:9302
  
 ESTABLISHED 23224/java
 tcp0  0 :::172.17.7.87:9300 :::
 172.17.8.39:59564ESTABLISHED 23224/java
 tcp0  0 :::172.17.7.87:9300 :::
 172.17.7.87:47657ESTABLISHED 23224/java

>>>
>>> Same thing for the logstash indexer : 160 found
>>> (sample)
>>>
 tcp0  0 :::172.17.7.87:50132:::172.17.8.39:9300
  
 ESTABLISHED 1516/java
 tcp0  0 :::172.17.7.87:9301 :::
 172.17.7.87:60153ESTABLISHED 1516/java
 tcp0  0 :::172.17.7.87:9301 :::
 172.17.7.87:60145ESTABLISHED 1516/java
 tcp0  0 :::172.17.7.87:50129:::172.17.8.39:9300
  
 ESTABLISHED 1516/java
 tcp0  0 :::172.17.7.87:9302 :::
 172.17.7.87:53501ESTABLISHED 1516/java

>>>
>>> Also, not sure if related, when I try to delete some documents by query 
>>> ( curl -XDELETE 'http://localhost:9200/check/_query?pretty=1' -d 
>>> '{"query":{"range":{"@timestamp":{"from":"2014-07-10T00:00:00","to":"2014-07-14T05:00:00"'
>>>  
>>> )
>>>
>>> "RemoteTransportException[[Stonewall][inet[/172.17.8.39:9300]][deleteByQuery/shard]];
>>>  
>>> nested: OutOfMemoryError[unable to create new native thread]; "
>>>
>>>  I have a script that run this kind of query every 30 seconds to clean 
>>> up this particular index.
>>>
>>>

Re: Cassandra + Elasticsearch or Just Elasticsearch for Primary data store.

2014-07-16 Thread pranav amin
Thanks. 

We have 300 TB of data, average size of document stored is 512KB. Just want 
to make sure that using ES as primary data store I'm not missing anything.
From your response it looks to me like durability isn't a concern with ES.


Thanks
Pranav.

On Wednesday, July 16, 2014 6:53:52 AM UTC-4, mooky wrote:
>
> What do you mean by "durability"?
>
> Its highly likely that elastic has the same storage guarantees that 
> cassandra does.
> That said, some people like to have the flexibility of having the golden 
> source elsewhere and the ability to blow away the index & re-index at a 
> whim.
> There are a number of elastic users, however, where this is not viable - 
> where reindexing their volume of data would take a week or 2.
>
> How much data are you looking at storing/indexing? Mb? Gb? Tb? Pb?
>
> -M
>
>
>
> On Tuesday, 15 July 2014 15:27:15 UTC+1, pranav amin wrote:
>>
>> Thanks Tim.
>>
>> Does that mean i can't get durability if i store my data in ES as a 
>> primary data store? 
>>
>> Thanks
>> Pranav.
>>
>> On Monday, July 14, 2014 11:57:23 PM UTC-4, Tim Uckun wrote:
>>>
>>>
 I'm just confused if Cassandra can really make a difference here, since 
 looks to me ES can suffice here.



>>> If you are not going to be using Cassandra for indexing then there is no 
>>> reason to have it. If you want durability in case something goes wrong with 
>>> ES you can just store your data in a log file before pumping it into ES. 
>>>  If for whatever reason something happens to your ES cluster you can 
>>> reconstruct it using the log files.
>>>
>>>  
>>>
>>



Re: Getting Totals from a Terms Aggregation

2014-07-16 Thread Itamar Syn-Hershko
Aggregations operate on the results of a search query, so you can
definitely use that total also when you have sub-aggregations. As for
filter aggregations, you can add a sink aggregation that catches all
unused docs and subtract its count from the total count.
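As a sketch (the field name is illustrative): run the terms aggregation inside a normal search, then derive the "other" slice client-side.

```json
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": { "field": "category", "size": 10 }
    }
  }
}
```

The "other" slice is hits.total minus the sum of the returned buckets' doc_count values; with a filter aggregation in between, subtract from that filter's doc_count instead.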

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Wed, Jul 16, 2014 at 5:07 PM, Michael Sander 
wrote:

> Hm, I don't think that works as a general solution. If you're using
> sub-aggregations or a filter aggregation, the total for the entire search
> will be too high.
>
>
> On Wed, Jul 16, 2014 at 10:01 AM, Itamar Syn-Hershko 
> wrote:
>
>> You get it from the search request that wraps the terms aggregation,
>> under total hits in the root of the response
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko 
>> Freelance Developer & Consultant
>> Author of RavenDB in Action 
>>
>>
>> On Wed, Jul 16, 2014 at 4:56 PM, Michael Sander > > wrote:
>>
>>> Hi,
>>>
>>> I've started using aggregations heavily to make nifty little pie charts
>>> like the one below. To generate the pie chart below, I ran a terms
>>> aggregation on a particular field which has entries such as Sterne, Fish,
>>> etc. However, the terms aggregation only returns a doc_count for each
>>> individual term, but I'd also like to know the total doc_count so that I
>>> can create another pie slice for "other." Is there any way to do this in a
>>> single aggregation call?
>>>
>>> Thanks,
>>>
>>>
>>> 
>>>
>>
>



Filtering different indexes with missing field

2014-07-16 Thread Tiago Lima
Hi,

I'm trying to filter two different indexes with the same filter, but one 
of the indexes doesn't have the field that I'm filtering on:

"mappings": {
  "user": {
    "properties": {
      "inactive": {
        "type": "boolean",
        "index": "not_analyzed"
      }
    }
  },
  "course": {
    "properties": {
    }
  }
}

Filtering:

"filter": {
  "and": [
    { },
    {
      "term": {
        "inactive": false
      }
    }
  ]
},

But only users are retrieved. How can I apply this filter just to the user 
index?

I really appreciate any help.
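One approach that might fit, as a hedged sketch (the index name `user` and field `inactive` are taken from the question, and this assumes the `indices` filter available in ES 1.x): wrap the term filter in an `indices` filter so it only applies to the `user` index, while documents from all other indices match unconditionally via `no_match_filter`:

```json
{
  "query": {
    "filtered": {
      "filter": {
        "indices": {
          "indices": ["user"],
          "filter": { "term": { "inactive": false } },
          "no_match_filter": "all"
        }
      }
    }
  }
}
```

If `user` and `course` are actually types in a single index rather than separate indices, the analogous trick would combine the term filter with a filter on `_type` instead.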



Re: Aggregations with object type properties

2014-07-16 Thread Amine Benhalloum
Hi Michael, and thank you for your answer.

I would rather not use nested objects because it will hurt the performance 
of my application.
Do you know of another way?




Re: short-circuit in bool query?

2014-07-16 Thread David Smith
I'm also interested in this. If there's a way to do this, it would be 
pretty cool.



Re: Getting Totals from a Terms Aggregation

2014-07-16 Thread Michael Sander
Hm, I don't think that works as a general solution. If you're using
sub-aggregations or a filter aggregation, the total for the entire search
will be too high.


On Wed, Jul 16, 2014 at 10:01 AM, Itamar Syn-Hershko 
wrote:

> You get it from the search request that wraps the terms aggregation, under
> total hits in the root of the response
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Author of RavenDB in Action 
>
>
> On Wed, Jul 16, 2014 at 4:56 PM, Michael Sander 
> wrote:
>
>> Hi,
>>
>> I've started using aggregations heavily to make nifty little pie charts
>> like the one below. To generate the pie chart below, I ran a terms
>> aggregation on a particular field which has entries such as Sterne, Fish,
>> etc. However, the terms aggregation only returns a doc_count for each
>> individual term, but I'd also like to know the total doc_count so that I
>> can create another pie slice for "other." Is there any way to do this in a
>> single aggregation call?
>>
>> Thanks,
>>
>>
>> 
>>



Re: Doc values for field data

2014-07-16 Thread David Smith
Thank you, Adrien. That answers my questions.

On Wednesday, July 16, 2014 5:24:36 AM UTC-4, Adrien Grand wrote:
>
> On Tue, Jul 15, 2014 at 3:25 PM, David Smith  > wrote:
>
>> Thanks, Adrien. That brings me closer. 
>>
>> So when the documentations say doc values do not support filtering, it's 
>> talking about fielddata filtering for what's loaded into memory (anod not 
>> filtering as part of a query... say term filter).
>>
>
> Exactly.
>  
>
>> For further clarification - can a field that is not analyzed and only 
>> kept as doc values be used for querying/filtering (say a term filter on a 
>> numeric field or match query on a string field)? Or do all 
>> querying/filtering required the field to be in the uninverted index?
>>
>
> Doc values play no role when filtering (except for some filters that 
> support a `fielddata` mode, such as the range filter[1]). So if your field 
> has `index: no` you cannot use it in filters, and if it has `index: 
> not_analyzed` then you can, no matter whether doc values are enabled or not.
>
> [1] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-range-filter.html#_execution
>  
>
>> What I'm trying to understand how we can optimize querying/filtering in a 
>> large index (5 billion documents / 1 TB)? It's very hard to run a simple 
>> term filter because a bitset filter will need to be calculated that 
>> includes every single document. Wouldn't that utilize a lot of memory? Is 
>> there a way to speed that up?
>>
>
> If your filters are unlikely to be reused, then you should not cache them 
> by setting _cache to false. Caching filters only make filtering faster when 
> the likelyhood of reusing filters is high.
>
> -- 
> Adrien Grand
>  
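For reference, the distinction Adrien draws can be sketched in a mapping like this (field names are hypothetical, and the syntax follows the ES 1.x mapping docs as I understand them): `status` stays usable in term filters because it is `not_analyzed`, regardless of whether doc values are enabled, while `payload` with `index: no` cannot be used in filters at all:

```json
{
  "mappings": {
    "event": {
      "properties": {
        "status":  { "type": "string", "index": "not_analyzed", "doc_values": true },
        "payload": { "type": "string", "index": "no" }
      }
    }
  }
}
```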



Re: Nested Query & Filter Query

2014-07-16 Thread Dark Globe
Just found that replacing "match" with "term" in my second filter condition 
runs without error... but it doesn't actually work, so no results are returned 
when 1 should be :/

I'm sure I'm getting closer and I'm sure I shouldn't be finding this such a 
struggle!

Any pointers are welcome and appreciated.
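A shape that may help here, as a hedged sketch (the field names `members`, `members.name`, `name`, and `formed` are guesses based on the thread, not taken from the gist): a `filtered` query whose query part is the nested query and whose filter part is a `bool` holding the extra conditions. Note that `term` filters are not analyzed, so against an analyzed field like the band name the value usually has to be lowercased:

```json
{
  "query": {
    "filtered": {
      "query": {
        "nested": {
          "path": "members",
          "query": { "match": { "members.name": "Roger" } }
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "term": { "formed": 1965 } },
            { "term": { "name": "pink" } }
          ]
        }
      }
    }
  }
}
```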



Re: Getting Totals from a Terms Aggregation

2014-07-16 Thread Itamar Syn-Hershko
You get it from the search request that wraps the terms aggregation, under
total hits in the root of the response

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Wed, Jul 16, 2014 at 4:56 PM, Michael Sander 
wrote:

> Hi,
>
> I've started using aggregations heavily to make nifty little pie charts
> like the one below. To generate the pie chart below, I ran a terms
> aggregation on a particular field which has entries such as Sterne, Fish,
> etc. However, the terms aggregation only returns a doc_count for each
> individual term, but I'd also like to know the total doc_count so that I
> can create another pie slice for "other." Is there any way to do this in a
> single aggregation call?
>
> Thanks,
>
>
> 
>



Re: Aggregations with object type properties

2014-07-16 Thread Michael Sander
There are several ways of doing this. Probably the easiest is to add nested 
objects to your mapping. Read more here: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html
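A minimal sketch of what that could look like (field names invented for illustration, not taken from the gist): mark the object field as `nested` in the mapping, then scope the aggregation with a `nested` aggregation so each inner object's values stay together:

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "animals": {
          "type": "nested",
          "properties": {
            "family":  { "type": "string", "index": "not_analyzed" },
            "species": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}
```

```json
{
  "aggs": {
    "animals": {
      "nested": { "path": "animals" },
      "aggs": {
        "by_family": {
          "terms": { "field": "animals.family" },
          "aggs": {
            "by_species": { "terms": { "field": "animals.species" } }
          }
        }
      }
    }
  }
}
```

Nested objects do require re-indexing and add per-object overhead, so this is a trade-off against the performance concern raised below.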

On Wednesday, July 16, 2014 9:50:54 AM UTC-4, Amine Benhalloum wrote:
>
> Hi everyone,
>
> I am trying to figure something out :
>
> Here's an example of a document that contains object properties, and then 
> trying to do simple terms aggregations.
> https://gist.github.com/BAmine/80e1be219d2ac272561a
>
> The response I get :
> {
>"took": 2,
>"timed_out": false,
>"_shards": {
>   "total": 5,
>   "successful": 5,
>   "failed": 0
>},
>"hits": {
>   "total": 1,
>   "max_score": 0,
>   "hits": []
>},
>"aggregations": {
>   "test": {
>  "buckets": [
> {
>"key": "canine",
>"doc_count": 1,
>"test2": {
>   "buckets": [
>  {
> "key": "cat",
> "doc_count": 1
>  },
>  {
> "key": "dog",
> "doc_count": 1
>  },
>  {
> "key": "tiger",
> "doc_count": 1
>  },
>  {
> "key": "wolf",
> "doc_count": 1
>  }
>   ]
>}
> },
> {
>"key": "feline",
>"doc_count": 1,
>"test2": {
>   "buckets": [
>  {
> "key": "cat",
> "doc_count": 1
>  },
>  {
> "key": "dog",
> "doc_count": 1
>  },
>  {
> "key": "tiger",
> "doc_count": 1
>  },
>  {
> "key": "wolf",
> "doc_count": 1
>  }
>   ]
>}
> }
>  ]
>   }
>}
> }
>
> The question is : How can I avoid getting, in my sub-aggregations, buckets 
> whose keys do not belong to the parent aggregation's keys ( example : cat 
> and tiger are not in the property whose label is feline ) ?
> Is there a way to do this without using nested properties ?
>
> Thank you !
>



Getting Totals from a Terms Aggregation

2014-07-16 Thread Michael Sander
Hi,

I've started using aggregations heavily to make nifty little pie charts 
like the one below. To generate the pie chart below, I ran a terms 
aggregation on a particular field which has entries such as Sterne, Fish, 
etc. However, the terms aggregation only returns a doc_count for each 
individual term, but I'd also like to know the total doc_count so that I 
can create another pie slice for "other." Is there any way to do this in a 
single aggregation call?

Thanks,





Re: Indexing and Searching Multimedia Files

2014-07-16 Thread Magnus Bäck
On Monday, July 14, 2014 at 12:30 CEST,
 George Viju  wrote:

> In elasticsearch the following datatypes of data can be indexed string,
> integer/long, float/double, boolean and null  which is referred in the
> following link
> [1]http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html.
> Is there any method/approach can be used to index or Search the
> Multimedia files like Audio, Video, Pdf , image etc. in elasticsearch?

What does audio file indexing mean? Is it the metadata you want to
index? And do you want to store the actual file data in Elasticsearch
or just the metadata?

-- 
Magnus Bäck | Software Engineer, Development Tools
magnus.b...@sonymobile.com | Sony Mobile Communications



Aggregations with object type properties

2014-07-16 Thread Amine Benhalloum
Hi everyone,

I am trying to figure something out :

Here's an example of a document that contains object properties, and then 
trying to do simple terms aggregations.
https://gist.github.com/BAmine/80e1be219d2ac272561a

The response I get :
{
   "took": 2,
   "timed_out": false,
   "_shards": {
  "total": 5,
  "successful": 5,
  "failed": 0
   },
   "hits": {
  "total": 1,
  "max_score": 0,
  "hits": []
   },
   "aggregations": {
  "test": {
 "buckets": [
{
   "key": "canine",
   "doc_count": 1,
   "test2": {
  "buckets": [
 {
"key": "cat",
"doc_count": 1
 },
 {
"key": "dog",
"doc_count": 1
 },
 {
"key": "tiger",
"doc_count": 1
 },
 {
"key": "wolf",
"doc_count": 1
 }
  ]
   }
},
{
   "key": "feline",
   "doc_count": 1,
   "test2": {
  "buckets": [
 {
"key": "cat",
"doc_count": 1
 },
 {
"key": "dog",
"doc_count": 1
 },
 {
"key": "tiger",
"doc_count": 1
 },
 {
"key": "wolf",
"doc_count": 1
 }
  ]
   }
}
 ]
  }
   }
}

The question is: how can I avoid getting, in my sub-aggregations, buckets 
whose keys do not belong to the parent aggregation's keys (example: cat 
and tiger are not in the property whose label is feline)?
Is there a way to do this without using nested properties?

Thank you !



Re: How many tcp connections should ES/logstash generate ?

2014-07-16 Thread Bastien Chong
I'm not sure how to answer that; I use the default settings in ES. The 
cluster is composed of 2 read/write nodes and a read-only node.
There is 1 Logstash instance that simply outputs 2 types of data to ES. 
Nothing fancy.

I need to delete documents older than a day, for this particular thing, I 
can't create a daily index. Is there a better way ?

I'm using an EC2 m3.large instance, ES has 1.5GB of heap.

It seems like I'm hitting an OS limit, I can't "su - elasticsearch" : 

su: /bin/bash: Resource temporarily unavailable

Stopping elasticsearch fixes this issue, so the two are directly linked. 

> -bash-4.1$ ulimit -a
> core file size  (blocks, -c) 0
> data seg size   (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size   (blocks, -f) unlimited
> pending signals (-i) 29841
> max locked memory   (kbytes, -l) unlimited
> max memory size (kbytes, -m) unlimited
> open files  (-n) 65536
> pipe size(512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority  (-r) 0
> stack size  (kbytes, -s) 8192
> cpu time   (seconds, -t) unlimited
> max user processes  (-u) 1024
> virtual memory  (kbytes, -v) unlimited
> file locks  (-x) unlimited
>
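If a daily index really is impossible, one alternative in ES 1.x might be to enable `_ttl` in the mapping so documents expire on their own instead of via a repeated delete-by-query (hedged: `_ttl` runs a background purge that has its own overhead, and the type name `check` below is assumed from the index name in the delete query above):

```json
{
  "mappings": {
    "check": {
      "_ttl": { "enabled": true, "default": "1d" }
    }
  }
}
```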




On Tuesday, July 15, 2014 6:35:22 PM UTC-4, Mark Walkom wrote:
>
> It'd depend on your config I'd guess, in particular how many 
> workers/threads you have and what ES output you are using in LS.
>
> Why are you cleaning an index like this anyway? It seems horribly 
> inefficient.
> Basically the error is "OutOfMemoryError", which means you've run out of 
> heap for the operation to complete. What are the specs for your node, how 
> much heap does ES have?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 16 July 2014 00:43, Bastien Chong > 
> wrote:
>
>> I have a basic setup with a logstash shipper, an indexer and an 
>> elasticsearch cluster.
>> Elasticsearch listen on the standart 9200/9300 and logstash indexer 
>> 9301/9302.
>>
>> When I do a netstat | wc -l for the ES process: 184 found
>> (sample)
>>
>>> tcp0  0 :::172.17.7.87:9300 :::172.17.8.39:59573
>>> ESTABLISHED 23224/java
>>> tcp0  0 :::172.17.7.87:9300 :::172.17.7.87:47609
>>> ESTABLISHED 23224/java
>>> tcp0  0 :::172.17.7.87:53493:::172.17.7.87:9302 
>>> ESTABLISHED 23224/java
>>> tcp0  0 :::172.17.7.87:9300 :::172.17.8.39:59564
>>> ESTABLISHED 23224/java
>>> tcp0  0 :::172.17.7.87:9300 :::172.17.7.87:47657
>>> ESTABLISHED 23224/java
>>>
>>
>> Same thing for the logstash indexer : 160 found
>> (sample)
>>
>>> tcp0  0 :::172.17.7.87:50132:::172.17.8.39:9300 
>>> ESTABLISHED 1516/java
>>> tcp0  0 :::172.17.7.87:9301 :::172.17.7.87:60153
>>> ESTABLISHED 1516/java
>>> tcp0  0 :::172.17.7.87:9301 :::172.17.7.87:60145
>>> ESTABLISHED 1516/java
>>> tcp0  0 :::172.17.7.87:50129:::172.17.8.39:9300 
>>> ESTABLISHED 1516/java
>>> tcp0  0 :::172.17.7.87:9302 :::172.17.7.87:53501
>>> ESTABLISHED 1516/java
>>>
>>
>> Also, not sure if related, when I try to delete some documents by query ( 
>> curl -XDELETE 'http://localhost:9200/check/_query?pretty=1' -d 
>> '{"query":{"range":{"@timestamp":{"from":"2014-07-10T00:00:00","to":"2014-07-14T05:00:00"'
>>  
>> )
>>
>> "RemoteTransportException[[Stonewall][inet[/172.17.8.39:9300]][deleteByQuery/shard]];
>>  
>> nested: OutOfMemoryError[unable to create new native thread]; "
>>
>>  I have a script that run this kind of query every 30 seconds to clean up 
>> this particular index.
>>
>>


Re: Nested Query & Filter Query

2014-07-16 Thread Dark Globe
Hi again,

Sorry, I've been looking through the 
documentation: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-filter.html
 
to try to work out how to supply multiple filters.

Once again, though, I am struggling to work out how to build the array. I've 
tried several different approaches but all result in an error, including my 
latest effort, which you can see in my original gist, now updated to include 
your successful filter and my subsequent failure.

Clearly there is something fundamental about how these JSON objects are 
structured that I simply don't understand. What am I missing?

Thanks again for any help.



Re: Nested Query & Filter Query

2014-07-16 Thread Dark Globe
Wow!, Thanks so much David.



Re: Any experience with ES and Data Compressing Filesystems?

2014-07-16 Thread joergpra...@gmail.com
Oops, that's not true: Elasticsearch uses Lucene codec compression, and this is
also LZ4 (LZF is only kept for backwards compatibility).

Here are some numbers:

http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene

Jörg


On Wed, Jul 16, 2014 at 2:28 PM, joergpra...@gmail.com <
joergpra...@gmail.com> wrote:

> You will not gain much advantage because ES already compresses data on
> disk with LZF, ZFS is using LZ4, which compression output is quite similar.
> In the file system statistics you will notice the compression ratio, and
> this will be no good value. So instead of having ZFS trying to compress
> where not much can be gained, you should switch it off.
>
> Jörg
>
>
> On Wed, Jul 16, 2014 at 12:56 PM, horst knete 
> wrote:
>
>> Hey Guys,
>>
>> to save a lot of hard disk space, we are going to use an compression file
>> system, which allows us transparent compression for the es-indices. (It
>> seems like es-indices are very good compressable, got up to 65%
>> compression-rate in some tests).
>>
>> Currently the indices are laying at a ext4-Linux Filesystem which
>> unfortunately dont have the transparent compression ability.
>>
>> Anyone of you got experience with compression file systems like BTRFS or
>> ZFS/OpenZFS and can tell us if this led to big performance losses?
>>
>> Thanks for responding
>>
>
>



Re: Any experience with ES and Data Compressing Filesystems?

2014-07-16 Thread joergpra...@gmail.com
You will not gain much advantage because ES already compresses data on disk
with LZF; ZFS uses LZ4, whose compression output is quite similar. In
the file system statistics you will notice the compression ratio, and it
will not be a good value. So instead of having ZFS try to compress where
not much can be gained, you should switch it off.

Jörg


On Wed, Jul 16, 2014 at 12:56 PM, horst knete  wrote:

> Hey Guys,
>
> to save a lot of hard disk space, we are going to use an compression file
> system, which allows us transparent compression for the es-indices. (It
> seems like es-indices are very good compressable, got up to 65%
> compression-rate in some tests).
>
> Currently the indices are laying at a ext4-Linux Filesystem which
> unfortunately dont have the transparent compression ability.
>
> Anyone of you got experience with compression file systems like BTRFS or
> ZFS/OpenZFS and can tell us if this led to big performance losses?
>
> Thanks for responding
>
>



using only ES or combined combined with Mongodb or Cassandra

2014-07-16 Thread HansPeterSloot
Hi,

While reading and watching a lot of material about Elasticsearch, I 
frequently noticed that ES is often combined with other NoSQL products 
like MongoDB or Cassandra.
That surprised me.

Can someone shed some light on the disadvantages of only using ES?
And what are the advantages of using a combination?

Regards Hans



Re: How logs stored in Logstash/Elastisearch

2014-07-16 Thread Sandip Bankewar
Hello Mark,

Thanks for your response.

1. You say one log entry in Logstash is a document; what do you mean by that?

2. I mean: if I have removed the raw data file for backup purposes, can I 
copy it back in again after a few days?

3. The data is stored in a flat file, right?

4. I have the directory containing the stored data in this format: 
*logstash-year-month-date ->> 0 1 2 3 4 _state*

*I don't understand which file stores the raw or flat-file data.*

*Could you please help me with this?*

*Regards,*
*Sandip Bankewar*

On Wednesday, 16 July 2014 17:07:26 UTC+5:30, Mark Walkom wrote:
>
> 1. It's indexed within Elasticsearch as a json document, one log entry in 
> the Logstash is a document
> 2. The default is /var/lib/elasticsearch/data
> 3. No
> 4. You can backup using the snapshot API. What do you mean by remove and 
> replace though?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 16 July 2014 21:18, Sandip Bankewar > 
> wrote:
>
>> Hello All,
>>
>> Can anyone help me on this.
>>
>> 1. How data stored in logstash/elasticsearch?
>>
>> 2. Where is the raw data file(path)?
>>
>>  3. Is it encrypted?
>>
>> 4. Can we take backup of those data and can remove and replace easily?
>>
>> Regards,
>> Sandip Bankewar
>>
>
>



Re: Nested Query & Filter Query

2014-07-16 Thread David Pilato
Here is an example based on your data: 
https://gist.github.com/dadoonet/64458f7423863d93c49e

HTH

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 16 juillet 2014 à 12:58:40, Dark Globe (g...@henleydev.co.uk) a écrit:

Hi Elastic Searchers,

I am completely new to Elasticsearch but have been progressing well this week, 
I'm just hoping one of you may be able to help nudge me over the final hurdle.

I have a complete gist here that populates an index with 2 rock bands and their 
members and runs a successful nested search query to find bands containing 
members with the name Roger (2 results).

https://gist.github.com/saiglen/ad8b89836547c7e0ee60

However, what I also need is to be able to filter the results by other 
attributes, such as the year the band formed, or perhaps part of the band's 
name, 'Pink' for example. 1 or more additional filters.

I have been through the documentation and tried at least a dozen different JSON 
structures to try to get that final search to filter by year or band name in 
addition to the nested query, but without success :/

Everything I have tried so far has either returned no results at all or has 
just produced an error and failed entirely.

Can someone please help advise.

Cheers for any help.



Re: How logs stored in Logstash/Elastisearch

2014-07-16 Thread Mark Walkom
1. It's indexed within Elasticsearch as a json document, one log entry in
the Logstash is a document
2. The default is /var/lib/elasticsearch/data
3. No
4. You can backup using the snapshot API. What do you mean by remove and
replace though?
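For point 4, the snapshot workflow is roughly: register a filesystem repository with `PUT _snapshot/my_backup` (body below; the repository name and `location` path are examples, and the path must be accessible to every node), then `PUT _snapshot/my_backup/snapshot_1` to take a snapshot and `POST _snapshot/my_backup/snapshot_1/_restore` to bring it back:

```json
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/elasticsearch"
  }
}
```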

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 16 July 2014 21:18, Sandip Bankewar  wrote:

> Hello All,
>
> Can anyone help me on this.
>
> 1. How data stored in logstash/elasticsearch?
>
> 2. Where is the raw data file(path)?
>
> 3. Is it encrypted?
>
> 4. Can we take backup of those data and can remove and replace easily?
>
> Regards,
> Sandip Bankewar
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/27143765-c12c-4238-b34f-76d9c38eca83%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624YQYC9C2ZUqLigmJC8148pj_XBep%2B3GwZDWZN7qLy27qg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


seeded random scoring differs between nodes

2014-07-16 Thread Dunaeth
Hi,

When I execute a seeded random scoring query multiple times on a two-node 
cluster, the scores differ depending on which node executes the query. Is 
this expected behavior (I'd say it's a logical behavior), or should both 
nodes score documents the same way given the same seed?
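
(For reference, the query shape in question is the standard seeded
random_score inside a function_score, something like the sketch below;
the seed value is arbitrary:)

{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "random_score": { "seed": 42 }
    }
  }
}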



How logs are stored in Logstash/Elasticsearch

2014-07-16 Thread Sandip Bankewar
Hello All,

Can anyone help me with this?

1. How is data stored in Logstash/Elasticsearch?

2. Where is the raw data file (path)?

3. Is it encrypted?

4. Can we take a backup of the data, and remove and replace it easily?

Regards,
Sandip Bankewar



Nested Query & Filter Query

2014-07-16 Thread Dark Globe
Hi Elastic Searchers,

I am completely new to Elasticsearch but have been progressing well this 
week, I'm just hoping one of you may be able to help nudge me over the 
final hurdle.

I have a complete gist here that populates an index with 2 rock bands and 
their members and runs a successful nested search query to find bands 
containing members with the name Roger (2 results).

https://gist.github.com/saiglen/ad8b89836547c7e0ee60

However, what I also need is to be able to filter the results by other 
attributes, such as the year the band formed, or perhaps part of the band's 
name ('Pink', for example); that is, one or more additional filters.

I have been through the documentation and tried at least a dozen different 
JSON structures to try to get that final search to filter by year or band 
name in addition to the nested query, but without success :/

Everything I have tried so far has either returned no results at all or has 
just produced an error and failed entirely.

Can someone please advise?

Cheers for any help.
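
(For reference, one commonly suggested shape on 1.x for "nested query plus
extra top-level filters" is a filtered query wrapping the nested query; a
sketch follows, where the field names "formed" and "name" are guesses at
the mapping:)

{
  "query": {
    "filtered": {
      "query": {
        "nested": {
          "path": "members",
          "query": { "match": { "members.name": "roger" } }
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "term": { "formed": 1965 } },
            { "term": { "name": "pink" } }
          ]
        }
      }
    }
  }
}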



Re: Any experience with ES and Data Compressing Filesystems?

2014-07-16 Thread Mark Walkom
There are a few previous threads on this topic in the archives, though I
don't immediately recall seeing any performance metrics, unfortunately.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 16 July 2014 20:56, horst knete  wrote:

> Hey Guys,
>
> to save a lot of hard disk space, we are going to use an compression file
> system, which allows us transparent compression for the es-indices. (It
> seems like es-indices are very good compressable, got up to 65%
> compression-rate in some tests).
>
> Currently the indices are laying at a ext4-Linux Filesystem which
> unfortunately dont have the transparent compression ability.
>
> Anyone of you got experience with compression file systems like BTRFS or
> ZFS/OpenZFS and can tell us if this led to big performance losses?
>
> Thanks for responding
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/6c6c806a-f638-4139-a080-3da7670f0eca%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



Any experience with ES and Data Compressing Filesystems?

2014-07-16 Thread horst knete
Hey Guys,

To save a lot of hard disk space, we are going to use a compressing file 
system, which gives us transparent compression for the ES indices. (ES 
indices seem to compress very well; we got up to a 65% compression rate 
in some tests.)

Currently the indices live on an ext4 Linux filesystem, which unfortunately 
doesn't offer transparent compression.

Does anyone have experience with compressing file systems like Btrfs or 
ZFS/OpenZFS, and can you tell us whether they led to big performance losses?

Thanks for responding



Re: Cassandra + Elasticsearch or Just Elasticsearch for Primary data store.

2014-07-16 Thread mooky
What do you mean by "durability"?

It's highly likely that Elasticsearch has the same storage guarantees that 
Cassandra does.
That said, some people like the flexibility of keeping the golden source 
elsewhere, with the ability to blow away the index and re-index on a whim.
For a number of Elasticsearch users, however, this is not viable: 
re-indexing their volume of data would take a week or two.

How much data are you looking at storing/indexing? MB? GB? TB? PB?

-M



On Tuesday, 15 July 2014 15:27:15 UTC+1, pranav amin wrote:
>
> Thanks Tim.
>
> Does that mean i can't get durability if i store my data in ES as a 
> primary data store? 
>
> Thanks
> Pranav.
>
> On Monday, July 14, 2014 11:57:23 PM UTC-4, Tim Uckun wrote:
>>
>>
>>> I'm just confused if Cassandra can really make a difference here, since 
>>> looks to me ES can suffice here.
>>>
>>>
>>>
>> If you are not going to be using Cassandra for indexing then there is no 
>> reason to have it. If you want durability in case something goes wrong with 
>> ES you can just store your data in a log file before pumping it into ES. 
>>  If for whatever reason something happens to your ES cluster you can 
>> reconstruct it using the log files.
>>
>>  
>>
>



No effect from refresh_interval

2014-07-16 Thread Marek Dabrowski
Hello

My configuration is:
6 nodes Elasticsearch cluster
OS: Centos 6.5
JVM: 1.7.0_25

The cluster is working fine: I can index data, query, etc. Now I'm running 
a test on a data set of about ~50 million docs (~13GB). I would like to get 
better indexing performance. To that end I changed the refresh_interval 
parameter, testing the values 1s, -1, and 600s. The indexing time is the 
same in each case. I checked the index configuration (_settings), and 
refresh_interval has the proper value, e.g.:

{
  "smt_20140501_10_20g_norefresh" : {
"settings" : {
  "index" : {
"uuid" : "q3imiZGQTDasQUuMWS8oiw",
"number_of_replicas" : "1",
"number_of_shards" : "6",
"refresh_interval" : "600s",
"version" : {
  "created" : "1020199"
}
  }
}
  }
}



Index creation, the refresh_interval setting, and the load are all done on 
the same cluster node. The index is deleted and created again before each 
test with a new refresh_interval value. All cluster nodes log that the 
parameter has been changed, e.g.:
[2014-07-16 11:24:09,813][INFO ][index.shard.service  ] [h6] 
[smt_20140501_10_20g_norefresh][1] updating refresh_interval from [1s] 
to [-1]
or
[2014-07-16 11:32:32,928][INFO ][index.shard.service  ] [h6] 
[smt_20140501_10_20g_norefresh][1] updating refresh_interval from [1s] 
to [10m]

After a test starts, new data is available immediately, and the indexing 
time is the same in all 3 cases. I don't know where the failure is. Does 
anybody know what is going on?
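
(For reference, the dynamic update of the setting looks like this, using
the index name from above:)

curl -XPUT 'localhost:9200/smt_20140501_10_20g_norefresh/_settings' -d '{
  "index": { "refresh_interval": "600s" }
}'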

Regards
Marek



Re: How can I make this search requirement work?

2014-07-16 Thread mooky
And it works a treat. Thanks.

It leads me to think that it would be very useful to use a series of 
specialist (special-case) analyzers in conjunction with the standard 
analyzer.

Back to my original example - "0# (99.995%)" - what I really want is 
something that will extract "99.995%".
The standard analyzer will extract "99.995" (and the rest of the text), the 
whitespace analyzer will extract "(99.995%)".

Does a financial/numeric/accounting analyzer already exist? I.e., something 
that extracts "99.995%", "$44.5665", or "-45bps"?
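
(One possible starting point, if nothing ready-made exists: a pattern
tokenizer in token-matching mode (group 0), so the pattern describes the
tokens rather than the separators. The regex below is only illustrative,
not a tested financial grammar; on "0# (99.995%)" it should produce the
tokens "0#" and "99.995%":)

{
  "settings": {
    "analysis": {
      "tokenizer": {
        "financial_tokens": {
          "type": "pattern",
          "pattern": "[$%#\\w.,-]+",
          "group": 0
        }
      },
      "analyzer": {
        "financial": {
          "type": "custom",
          "tokenizer": "financial_tokens"
        }
      }
    }
  }
}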

-M






On Tuesday, 15 July 2014 18:58:46 UTC+1, mooky wrote:
>
> Thanks. That looks interesting!
>
>
> On Tuesday, 15 July 2014 16:15:23 UTC+1, vineeth mohan wrote:
>>
>> Hello Mooky , 
>>
>> You can apply multiple analyzers to a field -
>> https://github.com/yakaz/elasticsearch-analysis-combo/
>>
>> So you can add all your analyzer here and apply it.
>>
>> Thanks
>>   Vineeth
>>
>>
>> On Tue, Jul 15, 2014 at 8:10 PM, mooky  wrote:
>>
>>> I have a bit of an odd requirement in so far as analyzer is concerned. 
>>> Wondering if anyone has any tips/suggestions. 
>>> I have an item I am indexing (grade) that has a property (name) whose 
>>> value can be "0# (99.995%)". 
>>> I am doing a prefix search on _all.
>>> I want users to be able to search using 99 or 99.9 or 99.995 or 99.995%. 
>>> I also want the user to be able to copy-paste "0# (99.995%)" and it 
>>> should work.
>>>
>>> I am currently using the whitespace analyzer - which works for many of 
>>> my cases except the tricky one above.
>>> 99.995 doesnt work.
>>> But "(99.995" does. Because obviously after whitespace tokenization, the 
>>> token begins with (.
>>> I could filter out the "(" and ")" characters. But then "0# (99.995%)" 
>>> wont work.
>>> Does anyone have some different suggestions?
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/9813b93a-249d-41a9-be21-12c8ec5d6d23%40googlegroups.com
>>>  
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>



Re: puppet-elasticsearch options

2014-07-16 Thread Andrej Rosenheinrich
Hi Richard,

Getting back to this after a while. Thanks for pointing out that the class 
itself does nothing more than install, and that only instances merge 
configs. I was completely unaware of this (maybe you could add a line to 
the documentation?), but looking through the module I could figure it out.
What I am wondering is: what was the reason for this design decision? When 
I want to install Elasticsearch once on a machine, I have to configure both 
the class (things like manage_repo cannot be configured in the instance, 
right?) and an instance. This installs two services, one for the class, 
which ignores the config, and one for the instance, which might be 
confusing for some users.
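
(Putting Richard's pointers together, a single configured instance with a
heap setting might look roughly like this; the instance name and values
are arbitrary, and the exact init_defaults key depends on the packaging,
commonly ES_HEAP_SIZE:)

class { 'elasticsearch':
  version     => '0.90.7',
  manage_repo => false,
}

elasticsearch::instance { 'es-01':
  config        => {
    'cluster.name' => 'andrejtest',
    'http.port'    => '9210',
  },
  init_defaults => {
    'ES_HEAP_SIZE' => '2g',
  },
}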

Again, thanks for your help that finally got me on the right track!
Andrej

Am Dienstag, 1. Juli 2014 14:37:55 UTC+2 schrieb Richard Pijnenburg:
>
> Hi Andrej,
>
> Sorry for the late response. Didn't get an update email about it.
>
> As long as you don't setup an instance with the 'elasticsearch::instance' 
> define it will only install the package but do nothing afterwards.
> I recently fixed that the default files from the packages are being 
> removed now.
> The memory can be set via the init_defaults hash by setting the ES_HEAP 
> option.
>
> The issue with 0.90.x versions is that it automatically starts up after 
> package installation.
> Since i don't stop it, it keeps running. Its advised to run a newer 
> version of ES since 0.90.x will be EOL'd at some point.
>
>
> On Thursday, June 26, 2014 2:24:47 PM UTC+1, Andrej Rosenheinrich wrote:
>>
>> Hi Richard,
>>
>> thanks for your answer, it for sure helped! Still, I am puzzling with a 
>> few effects and questions:
>>
>> 1.) I am a bit confused by your class/instance idea. I can do something 
>> pretty simple like class { 'elasticsearch' :  version => '0.90.7' } and it 
>> will install elasticsearch in the correct version using the default 
>> settings you defined. Repeating this (I tested every step on a fresh debian 
>> instance in a VM, no different puppet installation steps in between) with a 
>> config added in class like 
>>
>> class { 'elasticsearch' :
>> version => '0.90.7',
>> config => {
>>   'cluster'=> {
>> 'name' => 'andrejtest'
>>   },
>>   'http.port' => '9210'
>> }
>> }
>>   
>> I still get elasticsearch installed, but it completely ignores everything 
>> in the config. (I should be able to curl localhost:9210, but its up and 
>> running on the old default port, using the old cluster name). You explained 
>> overwriting for instances and classes a bit, so I tried the following thing 
>> (again, blank image, no previous installation) :
>>
>>   class { 'elasticsearch' :
>> version => '0.90.7',
>> config => {
>>   'cluster'=> {
>> 'name' => 'andrejtest'
>>   },
>>   'http.port' => '9210'
>> }
>>   }
>>
>>   elasticsearch::instance { 'es-01':
>>   }
>>
>> What happened is that I have two elasticsearch instances running, one 
>> with the default value and another one (es-01) that uses the provided 
>> configuration. Even freakier, I install java7 in my script before the 
>> snippet posted , the first (default based) elasticsearch version uses the 
>> standard openjdk-6 java, the second instance (es-01) uses java7. 
>> So, where is my mistake or what am I doing wrong? What would be the way 
>> to install and start only one service using provided configuration? And 
>> does elasticsearch::instance require an instance name? I would really miss 
>> the funny comic node names ;)
>>
>> 2. As you pointed out I can define all values from elasticsearch.yml in 
>> the config hash. But what about memory settings (I usually modify the 
>> init.d script for that), can I configure Xms and Xmx settings in the puppet 
>> module somehow?
>>
>> Logging configuration would be a nice-to-have (no must-have), just in 
>> case you were wondering ;)
>>
>> I hope my questions don't sound too confusing, if you could give me a 
>> hint on what I am doing wrong I would really appreciate it.
>>
>> Thanks in advance!
>> Andrej
>>
>>
>> Am Freitag, 20. Juni 2014 09:44:49 UTC+2 schrieb Richard Pijnenburg:
>>>
>>> Hi Andrej,
>>>
>>> Thank you for using the puppet module :-)
>>>
>>> The 'port' and 'discovery minimum' settings are both configuration 
>>> settings for the elasticsearch.yml file.
>>> You can set those in the 'config' option variable, for example:
>>>
>>> elasticsearch::instance { 'instancename':
>>>   config => { 'http.port' => '9210', 
>>> 'discovery.zen.minimum_master_nodes' => 3 }
>>> }
>>>
>>>
>>> For the logging part, management of the logging.yml file is very limited 
>>> at the moment but i hope to get some feedback on extending that.
>>> The thresholds for the slowlogs can be set in the same config option 
>>> variable.
>>> See 
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-slowlog.html#index-s

Re: Garbage collection pauses causing cluster to get unresponsive

2014-07-16 Thread joergpra...@gmail.com
Adding to these recommendations, I would suggest running the iostat tool to
monitor for any suspicious "%iowait" states while the
EsRejectedExecutionExceptions arise.

Jörg


On Wed, Jul 16, 2014 at 11:53 AM, Michael McCandless  wrote:

> Where is the index stored on your EC2 instances?  Is it just EBS-attached
> storage (magnetic or SSDs?  provisioned IOPS or the default)?
>
> Maybe try putting the index on the SSD instance storage instead?  I
> realize this is not a long term solution (limited storage, and it's cleared
> on reboot), but it would be a simple test to see if the IO limitations of
> EBS is the bottleneck here.
>
> Can you capture the hot threads output when you're at 200% CPU after
> indexing for a while?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Jul 16, 2014 at 3:03 AM, Srinath C  wrote:
>
>> Hi Joe/Michael,
>>I tried all your suggestions and found a remarkable difference in the
>> way elasticsearch is able to handle the bulk indexing.
>>Right now, I'm able to ingest at the rate of 25K per second with the
>> same setup. But occasionally there are still some
>> EsRejectedExecutionException being raised. The CPUUtilization on the
>> elasticsearch nodes is so low (around 200% on an 8 core system) that it
>> seems that something else is wrong. I have also tried to increase
>> queue_size but it just delays the EsRejectedExecutionException.
>>
>> Any more suggestions on how to handle this?
>>
>> *Current setup*: 4 c3.2xlarge instances of ES 1.2.2.
>> *Current Configurations*:
>> index.codec.bloom.load: false
>> index.compound_format: false
>> index.compound_on_flush: false
>> index.merge.policy.max_merge_at_once: 4
>> index.merge.policy.max_merge_at_once_explicit: 4
>> index.merge.policy.max_merged_segment: 1gb
>> index.merge.policy.segments_per_tier: 4
>> index.merge.policy.type: tiered
>> index.merge.scheduler.max_thread_count: 4
>> index.merge.scheduler.type: concurrent
>> index.refresh_interval: 10s
>> index.translog.flush_threshold_ops: 5
>> index.translog.interval: 10s
>> index.warmer.enabled: false
>> indices.memory.index_buffer_size: 50%
>> indices.store.throttle.type: none
>>
>>
>>
>>
>>
>> On Tue, Jul 15, 2014 at 6:24 PM, Srinath C  wrote:
>>
>>> Thanks Joe, Michael and all. Really appreciate you help.
>>> I'll try out as per your suggestions and run the tests. Will post back
>>> on my progress.
>>>
>>>
>>>
>>> On Tue, Jul 15, 2014 at 3:17 PM, Michael McCandless <
>>> m...@elasticsearch.com> wrote:
>>>
 First off, upgrade ES to the latest (1.2.2) release; there have been a
 number of bulk indexing improvements since 1.1.

 Second, disable merge IO throttling.

 Third, use the default settings, but increase index.refresh_interval to
 perhaps 5s, and set index.translog.flush_threshold_ops to maybe 5: this
 decreases the frequency of Lucene level commits (= filesystem fsyncs).

 If possible, use SSDs: they are much faster for merging.

 Mike McCandless

 http://blog.mikemccandless.com


 On Mon, Jul 14, 2014 at 11:03 PM, Srinath C 
 wrote:

>
> Each document is around 300 bytes on average so that bring up the data
> rate to around 17Mb per sec.
> This is running on ES version 1.1.1. I have been trying out different
> values for these configurations. queue_size was increased when I got
> EsRejectedException due to queue going full (default size of 50).
> segments_per_tier was picked up from some articles on scaling. What would
> be a reasonable value based on my data rate?
>
> If 60K seems to be too high are there any benchmarks available for
> ElasticSearch?
>
> Thanks all for your replies.
>
>
>
> On Monday, 14 July 2014 15:25:13 UTC+5:30, Jörg Prante wrote:
>
>> index.merge.policy.segments_per_tier: 100 and threadpool.bulk.queue_
>> size: 500 are extreme settings that should be avoided as they
>> allocate much resources. What you see by UnavailbleShardException /
>> NoNodes is congestion because of such extreme values.
>>
>> What ES version is this? Why don't you use the default settings?
>>
>> Jörg
>>
>>
>> On Mon, Jul 14, 2014 at 4:46 AM, Srinath C  wrote:
>>
>>> Hi,
>>>I'm having a tough time to keep ElasticSearch running healthily
>>> for even 20-30 mins in my setup. At an indexing rate of 28-36K per 
>>> second,
>>> the CPU utilization soon drops to 100% and never recovers. All client
>>> requests fail with UnavailbleShardException or "No Nodes" exception. The
>>> logs show warnings from "monitor.jvm" saying that GC did not free up 
>>> much
>>> of memory.
>>>
>>>  The ultimate requirement is to import data into the ES cluster at
>>> around 60K per second on a setup explained below. The only operation 
>>> being
>>> performed is bulk import of documents. Soon the ES nodes become
>>> unr

Re: percolator throughput decreases as time passes

2014-07-16 Thread Martijn v Groningen
Does the number of registered percolator queries also increase?


On 15 July 2014 12:02, Seungjin Lee  wrote:

>
> ​
> hi all,
>
> we use elasticsearch with storm, continuously making percolation request.
>
> as you see above, percolator throughput decreases as time passes.
>
> but we are not seeing any other problematic statistics, except that CPU
> usage also decreases as throughput decreases.
>
> can you guess any reason for this? we are using es v1.1.0
>
> sincerely,
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAL3_U41%3DMhee0xNDvU5s8NyKyOZEW_gXoQ2FOB39pr5s59ocwg%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Met vriendelijke groet,

Martijn van Groningen



Re: short-circuit in bool query?

2014-07-16 Thread 陳智清
+more description

What concerns us is search performance. In our case, a user may send a 
bool query containing 5 or more match_phrase queries. If each match_phrase 
executes independently over the whole index, search will be extremely slow. 
We would like a way to short-circuit this. Thank you.
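
(One commonly suggested mitigation on 1.x is to move the cheap, exact
clauses into a filter, which does not score and can be cached, so that the
expensive phrase queries effectively run over the narrowed set. A sketch,
with invented field names; how much work is actually skipped depends on
Lucene's execution order:)

{
  "query": {
    "filtered": {
      "query": { "match_phrase": { "body": "quick brown fox" } },
      "filter": { "term": { "user": "john" } }
    }
  }
}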



Re: Garbage collection pauses causing cluster to get unresponsive

2014-07-16 Thread Michael McCandless
Where is the index stored on your EC2 instances?  Is it just EBS-attached
storage (magnetic or SSDs?  provisioned IOPS or the default)?

Maybe try putting the index on the SSD instance storage instead?  I realize
this is not a long term solution (limited storage, and it's cleared on
reboot), but it would be a simple test to see if the IO limitations of EBS
is the bottleneck here.

Can you capture the hot threads output when you're at 200% CPU after
indexing for a while?

Mike McCandless

http://blog.mikemccandless.com


On Wed, Jul 16, 2014 at 3:03 AM, Srinath C  wrote:

> Hi Joe/Michael,
>I tried all your suggestions and found a remarkable difference in the
> way elasticsearch is able to handle the bulk indexing.
>Right now, I'm able to ingest at the rate of 25K per second with the
> same setup. But occasionally there are still some
> EsRejectedExecutionException being raised. The CPUUtilization on the
> elasticsearch nodes is so low (around 200% on an 8 core system) that it
> seems that something else is wrong. I have also tried to increase
> queue_size but it just delays the EsRejectedExecutionException.
>
> Any more suggestions on how to handle this?
>
> *Current setup*: 4 c3.2xlarge instances of ES 1.2.2.
> *Current Configurations*:
> index.codec.bloom.load: false
> index.compound_format: false
> index.compound_on_flush: false
> index.merge.policy.max_merge_at_once: 4
> index.merge.policy.max_merge_at_once_explicit: 4
> index.merge.policy.max_merged_segment: 1gb
> index.merge.policy.segments_per_tier: 4
> index.merge.policy.type: tiered
> index.merge.scheduler.max_thread_count: 4
> index.merge.scheduler.type: concurrent
> index.refresh_interval: 10s
> index.translog.flush_threshold_ops: 5
> index.translog.interval: 10s
> index.warmer.enabled: false
> indices.memory.index_buffer_size: 50%
> indices.store.throttle.type: none
>
>
>
>
>
> On Tue, Jul 15, 2014 at 6:24 PM, Srinath C  wrote:
>
>> Thanks Joe, Michael and all. Really appreciate you help.
>> I'll try out as per your suggestions and run the tests. Will post back on
>> my progress.
>>
>>
>>
>> On Tue, Jul 15, 2014 at 3:17 PM, Michael McCandless <
>> m...@elasticsearch.com> wrote:
>>
>>> First off, upgrade ES to the latest (1.2.2) release; there have been a
>>> number of bulk indexing improvements since 1.1.
>>>
>>> Second, disable merge IO throttling.
>>>
>>> Third, use the default settings, but increase index.refresh_interval to
>>> perhaps 5s, and set index.translog.flush_threshold_ops to maybe 5: this
>>> decreases the frequency of Lucene level commits (= filesystem fsyncs).
>>>
>>> If possible, use SSDs: they are much faster for merging.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Mon, Jul 14, 2014 at 11:03 PM, Srinath C  wrote:
>>>

 Each document is around 300 bytes on average so that bring up the data
 rate to around 17Mb per sec.
 This is running on ES version 1.1.1. I have been trying out different
 values for these configurations. queue_size was increased when I got
 EsRejectedException due to queue going full (default size of 50).
 segments_per_tier was picked up from some articles on scaling. What would
 be a reasonable value based on my data rate?

 If 60K seems to be too high are there any benchmarks available for
 ElasticSearch?

 Thanks all for your replies.



 On Monday, 14 July 2014 15:25:13 UTC+5:30, Jörg Prante wrote:

> index.merge.policy.segments_per_tier: 100 and threadpool.bulk.queue_
> size: 500 are extreme settings that should be avoided as they
> allocate much resources. What you see by UnavailbleShardException /
> NoNodes is congestion because of such extreme values.
>
> What ES version is this? Why don't you use the default settings?
>
> Jörg
>
>
> On Mon, Jul 14, 2014 at 4:46 AM, Srinath C  wrote:
>
>> Hi,
>>I'm having a tough time to keep ElasticSearch running healthily
>> for even 20-30 mins in my setup. At an indexing rate of 28-36K per 
>> second,
>> the CPU utilization soon drops to 100% and never recovers. All client
>> requests fail with UnavailbleShardException or "No Nodes" exception. The
>> logs show warnings from "monitor.jvm" saying that GC did not free up much
>> of memory.
>>
>>  The ultimate requirement is to import data into the ES cluster at
>> around 60K per second on a setup explained below. The only operation 
>> being
>> performed is bulk import of documents. Soon the ES nodes become
>> unresponsive and the CPU utilization drops to 100% (from 400-500%). They
>> don't seem to recover even after the bulk import operations are ceased.
>>
>>   Any suggestions on how to tune the GC based on my requirements?
>> What other information would be needed to look into this?
>>
>> Thanks,
>> Srinath.
>>
>>
>> The setup:
>>   - Cluster

short-circuit in bool query?

2014-07-16 Thread 陳智清
Hello,

I would like to know if I send a bool query like

{
  "bool": {
    "must": [
      { "term": { "user": "john" } },
      { "term": { "age": 19 } }
    ]
  }
}

is the 2nd query (i.e. "age": 19) only executed on documents that match 
the 1st query? If not, can I get the same effect using other query types? 
Thank you.



Re: Direct buffer memory problem on master Discovery

2014-07-16 Thread Pedro Jerónimo

*Java: *java version "1.7.0_55" 
*ElasticSearch: *1.2.1


On Wednesday, July 16, 2014 10:55:10 AM UTC+2, Jörg Prante wrote:
>
> What ES, and what Java version is this?
>
> Jörg
>
>
> On Tue, Jul 15, 2014 at 2:33 PM, Pedro Jerónimo  > wrote:
>
>> I have a cluster of 2 ES machines with a lot of indexing, not so much 
>> searching. I'm using 2 EC2 machines with 30gb of RAM and I'm running ES on 
>> each with 12gb heap (ES_HEAP_SIZE) and one of them (let's call it logs1) 
>> is running logstash as well, with 2gb heap. The master node is logs1 and 
>> the other instance is logs2. I start the cluster and everything looks 
>> fine, but after a while (1-3 days) I get the following error on logs1:
>>
>> [2014-07-15 12:26:39,867][WARN ][transport.netty ] [Keen Marlow] 
>> exception caught on transport layer [[id: 0x16801a48, /XX.XX.XXX.XX:36314 
>> => /XX.XXX.XX.XX:9300]], closing connection 
>> java.lang.OutOfMemoryError: Direct buffer memory
>> ...Stack Trace...
>>
>> And then the cluster is no longer connected and if I try to restart 
>> logs2, I get the same error above for logs1 and this one for logs2:
>>
>> [2014-07-15 12:27:39,282][INFO ][discovery.ec2 ] [Betty Ross Banner] 
>> failed to send join request to master [[Keen 
>> Marlow][9a7FIRpBSrKQcdcV_sjSTw][ip-XX-XX-XXX-XX][inet[/XX.XX.XXX.XX:9300]]{aws_availability_zone=us-west-2a,
>>  
>> master=true}], reason 
>> [org.elasticsearch.transport.RemoteTransportException: [Keen 
>> Marlow][inet[/XX.XX.XXX.XX:9300]][discovery/zen/join]; 
>> org.elasticsearch.transport.NodeDisconnectedException: [Betty Ross 
>> Banner][inet[/XX.XXX.XX.XX:9300]][discovery/zen/join/validate] disconnected]
>>
>> Is there any memory configuration I should tune up a bit? I'm kind of new 
>> to ElasticSearch so I'd love some help! :).
>>
>> Thanks!
>>
>> Pedro
>>  
>>
>
>



Re: architecture and performance question on searching small subsets of documents

2014-07-16 Thread Michael McCandless
Try the filter approach first, and only if performance isn't good enough look
into other approaches. Lucene is quite fast at intersecting filters with
large postings lists these days.

A separate index per user is not only wasteful because of the duplicated
content, but will also consume substantially more RAM, disk, and file
descriptors because of the per-index overhead.
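A minimal sketch of the filter approach with the ES 1.x `filtered` query (the field names `body` and `owner_ids` are made up for illustration; `owner_ids` is the multi-valued field of owning users described below):

```json
{
  "query": {
    "filtered": {
      "query": { "match": { "body": "quarterly report" } },
      "filter": { "term": { "owner_ids": "user_42" } }
    }
  }
}
```

Term filters are cached by default, so the per-user bitset is computed once and then reused across that user's subsequent searches.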

Mike McCandless

http://blog.mikemccandless.com


On Tue, Jul 15, 2014 at 9:26 AM, Mike Topper  wrote:

> Hello,
>
> I'm new to elasticsearch, so this might be a stupid question but I'd love
> some input before I get started creating my elasticsearch cluster.
>
> Basically I will be indexing documents with a few fields (documents are
> pretty small in size).  there are ~90million documents total.
>
> On the search side of things, each search will be limited by the small
> subset of documents that the user doing the search owns.
>
> my initial thought was to just have one large index for all documents and
> have a multi-value field that held the user ids of each user that owned
> that document.  then when searching across the index i would do a filter
> query to limit by that user id.  My only concern here is that this might be
> slow query times because you are always having to filter down by user id
> from a large data set to a very small subset (on average a user probably
> owns less than 1k documents).
>
> The other option I had is that i could create an index for each user and
> just index their documents into their index, but this would duplicate a
> massive amount of data and just seems hacky.
>
> Any suggestions?
>
> Thanks,
> Mike
>



Re: Doc values for field data

2014-07-16 Thread Adrien Grand
On Tue, Jul 15, 2014 at 3:25 PM, David Smith 
wrote:

> Thanks, Adrien. That brings me closer.
>
> So when the documentation says doc values do not support filtering, it's
> talking about fielddata filtering for what's loaded into memory (and not
> filtering as part of a query... say a term filter).
>

Exactly.


> For further clarification - can a field that is not analyzed and only kept
> as doc values be used for querying/filtering (say a term filter on a
> numeric field or match query on a string field)? Or do all
> querying/filtering require the field to be in the inverted index?
>

Doc values play no role when filtering (except for some filters that
support a `fielddata` mode, such as the range filter[1]). So if your field
has `index: no` you cannot use it in filters, and if it has `index:
not_analyzed` then you can, no matter whether doc values are enabled or not.

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-range-filter.html#_execution


> What I'm trying to understand how we can optimize querying/filtering in a
> large index (5 billion documents / 1 TB)? It's very hard to run a simple
> term filter because a bitset filter will need to be calculated that
> includes every single document. Wouldn't that utilize a lot of memory? Is
> there a way to speed that up?
>

If your filters are unlikely to be reused, then you should not cache them by
setting `_cache` to false. Caching filters only makes filtering faster when
the likelihood of reusing filters is high.
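A sketch combining both points above (the field names are made up for illustration; `_cache: false` disables caching for a one-off term filter, and `execution: fielddata` is the range-filter mode from the linked docs):

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "bool": {
          "must": [
            { "term": { "status": "active", "_cache": false } },
            { "range": { "timestamp": { "gte": "2014-01-01" }, "execution": "fielddata" } }
          ]
        }
      }
    }
  }
}
```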

-- 
Adrien Grand



Re: architecture and performance question on searching small subsets of documents

2014-07-16 Thread Adrien Grand
Having a single large index is probably the best option, and will scale
better when growing your base of users.

I would recommend watching the following video:
http://www.elasticsearch.org/videos/big-data-search-and-analytics/. The
part you are interested in starts at 13:20.
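One way to package a per-user filter over a single large index is a filtered alias, so clients can search `user_42` as if it were a dedicated index. A sketch of a body for the `_aliases` endpoint (the index name `documents` and the field `owner_ids` are made up for illustration):

```json
{
  "actions": [
    {
      "add": {
        "index": "documents",
        "alias": "user_42",
        "filter": { "term": { "owner_ids": "user_42" } }
      }
    }
  ]
}
```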


On Tue, Jul 15, 2014 at 3:26 PM, Mike Topper  wrote:

> Hello,
>
> I'm new to elasticsearch, so this might be a stupid question but I'd love
> some input before I get started creating my elasticsearch cluster.
>
> Basically I will be indexing documents with a few fields (documents are
> pretty small in size).  there are ~90million documents total.
>
> On the search side of things, each search will be limited by the small
> subset of documents that the user doing the search owns.
>
> my initial thought was to just have one large index for all documents and
> have a multi-value field that held the user ids of each user that owned
> that document.  then when searching across the index i would do a filter
> query to limit by that user id.  My only concern here is that this might be
> slow query times because you are always having to filter down by user id
> from a large data set to a very small subset (on average a user probably
> owns less than 1k documents).
>
> The other option I had is that i could create an index for each user and
> just index their documents into their index, but this would duplicate a
> massive amount of data and just seems hacky.
>
> Any suggestions?
>
> Thanks,
> Mike
>



-- 
Adrien Grand



Re: Direct buffer memory problem on master Discovery

2014-07-16 Thread joergpra...@gmail.com
What ES, and what Java version is this?

Jörg


On Tue, Jul 15, 2014 at 2:33 PM, Pedro Jerónimo 
wrote:

> I have a cluster of 2 ES machines with a lot of indexing, not so much
> searching. I'm using 2 EC2 machines with 30gb of RAM and I'm running ES on
> each with 12gb heap (ES_HEAP_SIZE) and one of them (let's call it logs1)
> is running logstash as well, with 2gb heap. The master node is logs1 and
> the other instance is logs2. I start the cluster and everything looks
> fine, but after a while (1-3 days) I get the following error on logs1:
>
> [2014-07-15 12:26:39,867][WARN ][transport.netty ] [Keen Marlow] exception
> caught on transport layer [[id: 0x16801a48, /XX.XX.XXX.XX:36314 =>
> /XX.XXX.XX.XX:9300]], closing connection
> java.lang.OutOfMemoryError: Direct buffer memory
> ...Stack Trace...
>
> And then the cluster is no longer connected and if I try to restart logs2,
> I get the same error above for logs1 and this one for logs2:
>
> [2014-07-15 12:27:39,282][INFO ][discovery.ec2 ] [Betty Ross Banner]
> failed to send join request to master [[Keen
> Marlow][9a7FIRpBSrKQcdcV_sjSTw][ip-XX-XX-XXX-XX][inet[/XX.XX.XXX.XX:9300]]{aws_availability_zone=us-west-2a,
> master=true}], reason
> [org.elasticsearch.transport.RemoteTransportException: [Keen
> Marlow][inet[/XX.XX.XXX.XX:9300]][discovery/zen/join];
> org.elasticsearch.transport.NodeDisconnectedException: [Betty Ross
> Banner][inet[/XX.XXX.XX.XX:9300]][discovery/zen/join/validate] disconnected]
>
> Is there any memory configuration I should tune up a bit? I'm kind of new
> to ElasticSearch so I'd love some help! :).
>
> Thanks!
>
> Pedro
>



Re: Something I am finding difficult, using Aggregations

2014-07-16 Thread Adrien Grand
On Thu, Jul 3, 2014 at 6:24 PM, mooky  wrote:

> By the way, it appears that doing a Terms sub-aggregation (as I suggested
> in (b)) can be a bit of a performance murderer...
> In my case I am already doing a Terms aggregation (on the id) - and the
> Terms sub-aggregation is turning a ~10ms response into a ~1ms response
> :-o
>
> Sure, obviously there exists an id-reference data mapping in the system.
> But it doesn't really scale having to dereference ids on read operations.
> Either :
> a) it's a remote call - and making 10s or 100s of remote calls to serve a
> single user request isn't going to perform or scale well.
> b) the reference data has to be all held in RAM - which doesn't scale well.
>
> The thing is that we have the data in the index - we already de-referenced
> it when we built the document to index it.
>
> I can try to make a token - but as you can imagine, trying to encode/decode
> all the location details into 1 token will make a big token
>

There are no remote calls, but indeed aggregations are stored in RAM. So if
the field that you are using for the first-level terms aggregation has a
high cardinality, adding a sub-aggregation certainly adds memory pressure
(CPU overhead as well, but not enough to justify this slowdown).

Deferred aggregations might help for that issue:
https://github.com/elasticsearch/elasticsearch/pull/6128. It would allow
Elasticsearch to compute the top ownerIds first, take the top N, and only
then resolve their ownerName using a top_hits or a terms aggregation. This
will be available in Elasticsearch 1.3, which we expect to release soon.
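For reference, a sketch of what that could look like, assuming the `collect_mode: breadth_first` option from that pull request (field names taken from the thread; sizes are illustrative):

```json
{
  "aggs": {
    "top_owners": {
      "terms": {
        "field": "ownerId",
        "size": 10,
        "collect_mode": "breadth_first"
      },
      "aggs": {
        "owner_name": {
          "terms": { "field": "ownerName", "size": 1 }
        }
      }
    }
  }
}
```

With breadth-first collection, the sub-aggregation is only evaluated for the top 10 buckets instead of for every distinct ownerId.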

-- 
Adrien Grand



Need mapping (or query) tips

2014-07-16 Thread Loïc Wenkin
Hello everybody,

I am working on a multilingual project and I need some tips about mapping
(or queries). I have two needs:
- I would like to benefit from the full-text power of language-specific
analyzers on a field of my documents. I need that for some of my searches:
users can do a search like (Hello toto) that will return documents
containing (Hello, my name is toto) (1).
- On the other hand, users also have the ability to do a search like
("Hello toto") that won't return documents containing ("Hello, my name is
toto"), but will only return documents containing ("Hello toto") (a little
bit like Google does when you use quotes) (2).

Currently, in my mapping, I have something like that:

{
  "myfieldinfrench": {
    ...
    "analyzer": "french"
  }
}

This allows me to easily meet my first need, but not my second one.

I was thinking of indexing the field twice (using multi-field types), but is
it a good idea? Won't it increase my index size?

Is there another way than using multi-fields to meet my needs?
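A minimal sketch of the multi-field idea being considered, using the ES 1.x `fields` syntax (the sub-field name `exact` and the choice of the `standard` analyzer for it are assumptions):

```json
{
  "properties": {
    "myfieldinfrench": {
      "type": "string",
      "analyzer": "french",
      "fields": {
        "exact": { "type": "string", "analyzer": "standard" }
      }
    }
  }
}
```

Need (1) would then be a `match` query on `myfieldinfrench`, and need (2) a `match_phrase` query on `myfieldinfrench.exact`. The sub-field adds a second inverted index for that one field, so the index grows somewhat, but the `_source` is stored only once.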

Thanks a lot for your replies.

Regards,
Loïc Wenkin



Re: Aggregations with filter

2014-07-16 Thread Adrien Grand
Hi,

A very likely cause for this issue is that by deleting and recreating your
index, you lost your mappings. Quite likely, you want to set your COLA,
COLB and COLC fields index=not_analyzed:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string
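A minimal sketch of such a mapping, supplied when recreating the index (the type name `mytype` and the numeric type for `SCORE` are assumptions):

```json
{
  "mappings": {
    "mytype": {
      "properties": {
        "COLA": { "type": "string", "index": "not_analyzed" },
        "COLB": { "type": "string", "index": "not_analyzed" },
        "COLC": { "type": "string", "index": "not_analyzed" },
        "SCORE": { "type": "double" }
      }
    }
  }
}
```

Without this, the standard analyzer lowercases and tokenizes string values at index time, so a term filter on the exact value "CUSTOMER" matches nothing.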


On Tue, Jul 15, 2014 at 8:57 PM, esuser  wrote:

> Hi All,
>
>
> I am new to Elasticsearch. I am trying the query below in order to get
> results based on a where condition with group by and avg.
>
> The query always returns zero hits/docs. If I remove the terms, it gives
> results.
>
> This was working fine some time back. After I deleted the index and created
> a new one, it is not working. Any pointers would help.
>
> If I use fields which are not string as filter terms, it gives me proper
> results.
>
> {
>   "aggs": {
>     "filtered": {
>       "filter": {
>         "bool": {
>           "must": [
>             { "term": { "COLA": "NA" } },
>             { "term": { "COLB": "CUSTOMER" } },
>             { "range": { "SCORE": { "from": 0, "to": 90 } } }
>           ]
>         }
>       },
>       "aggs": {
>         "cust": {
>           "terms": { "field": "COLC" },
>           "aggs": {
>             "avg_score": { "avg": { "field": "SCORE" } }
>           }
>         }
>       }
>     }
>   }
> }
>
> Thanks.
>



-- 
Adrien Grand



Re: Template mapping success, filtering with it failed.

2014-07-16 Thread t goto
Hi,
I don't understand why, but when I changed "time_raw"'s index to
"not_analyzed", it worked.

{
  "tempalte_blahblah": {
    "template": "logstash*",
    "mappings": {
      "blah": {
        "properties": {
          "hostname": { "type": "string", "index": "not_analyzed" },
          "time_raw": { "type": "date", "index": "not_analyzed",
            "format": "yyyy-MM-dd HH:mm:ss.SSS" }
        }
      }
    }
  }
}

I really need to study Elasticsearch's behavior more closely. Thanks.



Re: Integration of latest Kibana 3.1 logstash 1.4.2 and elasticsearch 1.2.2 Integration

2014-07-16 Thread Sandip Bankewar
Hi Mark,

I am installing on Debian.
Thanks for your help.

Regards,
Sandip Bankewar


On Tuesday, 15 July 2014 17:57:21 UTC+5:30, Mark Walkom wrote:
>
> There's a lot of stuff available out there via a google search, it does 
> depend a little on your OS and what you want to do.
>
> There's also lots of docs on the main sites - 
> www.elasticsearch.org/resources/ and http://logstash.net/
> Specifically this will be of assistance - 
> http://logstash.net/docs/1.4.2/tutorials/getting-started-with-logstash
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 15 July 2014 19:29, Sandip Bankewar > 
> wrote:
>
>> Ohh Really...
>>
>> Could you please send me steps?
>> or Is there any document?
>>
>> On Tuesday, 15 July 2014 13:04:28 UTC+5:30, Mark Walkom wrote:
>>>
>>> Yep, lots of people!
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 15 July 2014 17:21, Sandip Bankewar  wrote:
>>>
 Hi All,

 Has anyone Integration of latest Kibana 3.1 logstash 1.4.2 and 
 elasticsearch 1.2.2 Integration?

 Regards,
 Sandip Bankewar


>
>



Re: Direct buffer memory problem on master Discovery

2014-07-16 Thread Pedro Jerónimo
@Mark, I'm using this cluster only for logstash, so I have 1 new index
per day, and each index has between 30k and 100k documents. I use curator in
a crontab to close indices older than X days so I don't have them all loaded
into memory at the same time. Right now I have 8 indices open with a total
of ~500k documents occupying ~4.5GB of disk space. I read a bit about OOM
with Elasticsearch and I couldn't figure out my problem yet. I followed the
memory tips in the ES docs:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html

And I see that mlockall is set to true on both instances.

@Ivan, yes, ES and Logstash are the only processes running on the servers. 
No, I didn't set direct memory values. Should I? If so, which sort of 
values and how do you determine those?

It's been a while since I worked with JVMs so my knowledge on Java Memory 
stuff is kind of rusty :). Thanks for the help!

On Wednesday, July 16, 2014 5:40:27 AM UTC+2, Ivan Brusic wrote:
>
> Direct memory is off heap memory. Are elasticsearch and logstash the only 
> processes on those servers? Did you set an explicit direct memory value?
>
> -- 
> Ivan
> On Jul 15, 2014 3:46 PM, "Mark Walkom"  > wrote:
>
>> How much data do you have in ES, index count and total size of all 
>> indexes?
>>
>> OutOfMemoryError means you ran out of heap, which could mean a few things.
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com 
>> web: www.campaignmonitor.com
>>
>>
>> On 15 July 2014 22:33, Pedro Jerónimo 
>> > wrote:
>>
>>> I have a cluster of 2 ES machines with a lot of indexing, not so much 
>>> searching. I'm using 2 EC2 machines with 30gb of RAM and I'm running ES on 
>>> each with 12gb heap (ES_HEAP_SIZE) and one of them (let's call it 
>>> logs1) is running logstash as well, with 2gb heap. The master node is logs1 
>>> and the other instance is logs2. I start the cluster and everything looks 
>>> fine, but after a while (1-3 days) I get the following error on logs1:
>>>
>>> [2014-07-15 12:26:39,867][WARN ][transport.netty ] [Keen Marlow] 
>>> exception caught on transport layer [[id: 0x16801a48, /XX.XX.XXX.XX:36314 
>>> => /XX.XXX.XX.XX:9300]], closing connection 
>>> java.lang.OutOfMemoryError: Direct buffer memory
>>> ...Stack Trace...
>>>
>>> And then the cluster is no longer connected and if I try to restart 
>>> logs2, I get the same error above for logs1 and this one for logs2:
>>>
>>> [2014-07-15 12:27:39,282][INFO ][discovery.ec2 ] [Betty Ross Banner] 
>>> failed to send join request to master [[Keen 
>>> Marlow][9a7FIRpBSrKQcdcV_sjSTw][ip-XX-XX-XXX-XX][inet[/XX.XX.XXX.XX:9300]]{aws_availability_zone=us-west-2a,
>>>  
>>> master=true}], reason 
>>> [org.elasticsearch.transport.RemoteTransportException: [Keen 
>>> Marlow][inet[/XX.XX.XXX.XX:9300]][discovery/zen/join]; 
>>> org.elasticsearch.transport.NodeDisconnectedException: [Betty Ross 
>>> Banner][inet[/XX.XXX.XX.XX:9300]][discovery/zen/join/validate] disconnected]
>>>
>>> Is there any memory configuration I should tune up a bit? I'm kind of 
>>> new to ElasticSearch so I'd love some help! :).
>>>
>>> Thanks!
>>>
>>> Pedro
>>>  
>



Re: elasticsearch init script for centos or rhel ?

2014-07-16 Thread joergpra...@gmail.com
I recommend the service wrapper for RHEL / Centos init script.

https://github.com/elasticsearch/elasticsearch-servicewrapper

Jörg


On Wed, Jul 16, 2014 at 9:31 AM, Thomas Kuther  wrote:

>  The one from the elasticsearch CentOS rpm repository works fine here on
> EL6.
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-repositories.html
> (there are also 1.0 and 1.1 repos, simply adjust the baseurl)
>
> The source is here:
> https://github.com/elasticsearch/elasticsearch/blob/master/src/rpm/init.d/elasticsearch
> ..but I recommend the rpm from the repo because of /etc/sysconfig and
> install locations etc, much easier that way.
>
> ~Tom
>
> Am 16.07.2014 09:12, schrieb Aesop Wolf:
>
> Did you ever find a script that works on CentOS? I'm also looking for one.
>
> On Friday, March 14, 2014 9:18:04 AM UTC-7, Dominic Nicholas wrote:
>>
>> Thanks.
>> Does anyone know of a version that uses  /etc/rc.d/init.d/functions
>> instead of /lib/lsb, that would work on CentOS and work with elasticsearch
>> 1.0.1 ?
>>  Dom
>>
>> On Friday, March 14, 2014 9:24:12 AM UTC-4, David Pilato wrote:
>>>
>>>  May be this? https://github.com/elasticsearch/elasticsearch/
>>> blob/master/src/deb/init.d/elasticsearch
>>>
>>> --
>>> David ;-)
>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>>
>>>
>>> Le 14 mars 2014 à 14:19, Dominic Nicholas  a
>>> écrit :
>>>
>>>  Hi - can someone please point me to an /etc/init.d script for
>>> elasticsearch 1.0.1 for CentOS or RHEL ?
>>>
>>>  Thanks



Re: elasticsearch init script for centos or rhel ?

2014-07-16 Thread Thomas Kuther
The one from the elasticsearch CentOS rpm repository works fine here on EL6.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-repositories.html
(there are also 1.0 and 1.1 repos, simply adjust the baseurl)

The source is here:
https://github.com/elasticsearch/elasticsearch/blob/master/src/rpm/init.d/elasticsearch
...but I recommend the rpm from the repo because of /etc/sysconfig and the
install locations etc.; it's much easier that way.
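For reference, the repo definition from the linked setup page looks roughly like this (a sketch; check the docs for the exact current contents, and adjust `1.2` in the baseurl for the 1.0/1.1 repos):

```ini
# /etc/yum.repos.d/elasticsearch.repo (sketch, per the setup-repositories docs)
[elasticsearch-1.2]
name=Elasticsearch repository for 1.2.x packages
baseurl=http://packages.elasticsearch.org/elasticsearch/1.2/centos
gpgcheck=1
gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1
```

After that, `yum install elasticsearch` pulls in the init script, /etc/sysconfig/elasticsearch, and the standard install locations.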

~Tom

Am 16.07.2014 09:12, schrieb Aesop Wolf:
> Did you ever find a script that works on CentOS? I'm also looking for one.
>
> On Friday, March 14, 2014 9:18:04 AM UTC-7, Dominic Nicholas wrote:
>
> Thanks. 
> Does anyone know of a version that
> uses  /etc/rc.d/init.d/functions instead of /lib/lsb, that would
> work on CentOS and work with elasticsearch 1.0.1 ?
> Dom
>
> On Friday, March 14, 2014 9:24:12 AM UTC-4, David Pilato wrote:
>
> May be
> this? 
> https://github.com/elasticsearch/elasticsearch/blob/master/src/deb/init.d/elasticsearch
> 
> 
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> Le 14 mars 2014 à 14:19, Dominic Nicholas
>  a écrit :
>
> Hi - can someone please point me to an /etc/init.d script for
> elasticsearch 1.0.1 for CentOS or RHEL ?
>
> Thanks



Template mapping success, filtering with it failed.

2014-07-16 Thread t goto
Hi, 

I've successfully created a template to map a specific timestamp from logs 
like the one below.
{
  "template_blahblah" : {
    "template" : "logstash*",
    "mappings" : {
      "blah" : {
        "properties" : {
          "hostname": { "type": "string", "index": "not_analyzed" },
          "time_raw": { "type": "date", "index": "analyzed",
                        "format": "yyyy-MM-dd HH:mm:ss.SSS" }
        }
      }
    }
  }
}

And log looks like this..
2014-07-14 13:02:32.128 25121 (host) (COMMAND) (message)
2014-07-14 13:02:32.133 25121 (host) (COMMAND) (message)

Now, I can see "time_raw" from Kibana or API query :)
But when I use "time_raw" as timefield for Timepicker in Kibana, nothing 
hits.
I tried a range query on "time_raw" with epoch millis; nothing hits either.
curl -XGET 'http://localhost:9200/logstash-2014.07.16/_search?pretty' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "time_raw": {
              "from": 1405490989000,
              "to": 1405491289000
            }
          }
        }
      ]
    }
  }
}'


However, when I ran the range query on "time_raw" with date+milliseconds 
strings, it does hit.
curl -XGET 'http://localhost:9200/logstash-2014.07.16/_search?pretty' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "time_raw": {
              "from": "2014-7-16 15:09:49.000",
              "to": "2014-7-16 15:14:49.000"
            }
          }
        }
      ]
    }
  }
}'

Since I'm relying heavily on Kibana, I need epoch time to work. (Kibana uses 
epoch time, right?)
Did I misconfigure something here?
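
For what it's worth: when a date field's format is pinned to a single pattern
like "yyyy-MM-dd HH:mm:ss.SSS", only strings in that exact pattern parse, so a
bare epoch-millis number in a range query has nothing to match. One workaround
(a client-side sketch, not an Elasticsearch API) is converting epoch millis
into the mapped pattern before querying; the UTC+9 offset below is an
assumption inferred from the timestamps in this thread:

```python
from datetime import datetime, timedelta

def epoch_millis_to_mapped(millis, utc_offset_hours=9):
    """Render epoch milliseconds in the 'yyyy-MM-dd HH:mm:ss.SSS'
    pattern the time_raw mapping expects (offset is assumed local tz)."""
    local = datetime.utcfromtimestamp(millis // 1000) + timedelta(hours=utc_offset_hours)
    return local.strftime("%Y-%m-%d %H:%M:%S") + ".%03d" % (millis % 1000)

# The epoch bounds from the failing query, rendered as parseable strings:
print(epoch_millis_to_mapped(1405490989000))  # 2014-07-16 15:09:49.000
print(epoch_millis_to_mapped(1405491289000))  # 2014-07-16 15:14:49.000
```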



Re: elasticsearch dies every other day

2014-07-16 Thread Klavs Klavsen
I updated the graph:
http://blog.klavsen.info/ES-graphs-update.png

I added an overview of how many threads were running, and it's apparent that 
what peaked when it crashed (left side of the two graphs, where the two crash 
spikes are) correlated with a peak in search threads.
The change to G1GC on two of the nodes is also very apparent in the 
heap_mem_usage graph :)

It's been stable for two days now, nearing the record :) I also moved a 
"culprit" who was searching the main index over long periods to their own 
index, and changed threadpool.index.queue_size from -1 to 900.
The index queue size doesn't seem to be hit at all, though, so I'm not sure 
that made a difference.

Thank you for your input everyone.



Re: elasticsearch init script for centos or rhel ?

2014-07-16 Thread Aesop Wolf
Did you ever find a script that works on CentOS? I'm also looking for one.

On Friday, March 14, 2014 9:18:04 AM UTC-7, Dominic Nicholas wrote:
>
> Thanks. 
> Does anyone know of a version that uses  /etc/rc.d/init.d/functions 
> instead of /lib/lsb, that would work on CentOS and work with elasticsearch 
> 1.0.1 ?
> Dom
>
> On Friday, March 14, 2014 9:24:12 AM UTC-4, David Pilato wrote:
>>
>> May be this? 
>> https://github.com/elasticsearch/elasticsearch/blob/master/src/deb/init.d/elasticsearch
>>
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>>
>> Le 14 mars 2014 à 14:19, Dominic Nicholas  a 
>> écrit :
>>
>> Hi - can someone please point me to an /etc/init.d script for 
>> elasticsearch 1.0.1 for CentOS or RHEL ?
>>
>> Thanks
>>



Re: Garbage collection pauses causing cluster to get unresponsive

2014-07-16 Thread Srinath C
Hi Joe/Michael,
   I tried all your suggestions and found a remarkable difference in how
Elasticsearch handles the bulk indexing.
   Right now I'm able to ingest at a rate of 25K per second with the same
setup, but occasionally some EsRejectedExecutionException are still being
raised. CPU utilization on the Elasticsearch nodes is so low (around 200% on
an 8-core system) that it seems something else is wrong. I have also tried
increasing queue_size, but that just delays the EsRejectedExecutionException.

Any more suggestions on how to handle this?

*Current setup*: 4 c3.2xlarge instances of ES 1.2.2.
*Current Configurations*:
index.codec.bloom.load: false
index.compound_format: false
index.compound_on_flush: false
index.merge.policy.max_merge_at_once: 4
index.merge.policy.max_merge_at_once_explicit: 4
index.merge.policy.max_merged_segment: 1gb
index.merge.policy.segments_per_tier: 4
index.merge.policy.type: tiered
index.merge.scheduler.max_thread_count: 4
index.merge.scheduler.type: concurrent
index.refresh_interval: 10s
index.translog.flush_threshold_ops: 5
index.translog.interval: 10s
index.warmer.enabled: false
indices.memory.index_buffer_size: 50%
indices.store.throttle.type: none
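
Since EsRejectedExecutionException means the server's bulk queue is full, the
usual alternative to enlarging the queue is client-side backoff and retry. A
minimal language-agnostic sketch (the exception and `send_bulk` callable here
are stand-ins, not an Elasticsearch client API):

```python
import random
import time

class RejectedExecutionError(Exception):
    """Stand-in for EsRejectedExecutionException from the ES client."""

def bulk_with_backoff(send_bulk, batch, max_retries=5, base_delay=0.5):
    """Retry a bulk request with exponential backoff plus jitter when
    the server rejects it because its bulk queue is full."""
    for attempt in range(max_retries + 1):
        try:
            return send_bulk(batch)
        except RejectedExecutionError:
            if attempt == max_retries:
                raise
            # Back off 0.5s, 1s, 2s, ... (halved-to-full jitter).
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(delay)

# Demo: a fake endpoint that rejects the first two requests, then accepts.
calls = {"n": 0}
def fake_send(batch):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RejectedExecutionError()
    return len(batch)

print(bulk_with_backoff(fake_send, ["doc"] * 3, base_delay=0.01))  # 3
```

The same idea throttles the producers (the storm workers here) to whatever the
cluster can absorb, instead of letting rejected work pile up in a bigger queue.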





On Tue, Jul 15, 2014 at 6:24 PM, Srinath C  wrote:

> Thanks Joe, Michael and all. Really appreciate you help.
> I'll try out as per your suggestions and run the tests. Will post back on
> my progress.
>
>
>
> On Tue, Jul 15, 2014 at 3:17 PM, Michael McCandless <
> m...@elasticsearch.com> wrote:
>
>> First off, upgrade ES to the latest (1.2.2) release; there have been a
>> number of bulk indexing improvements since 1.1.
>>
>> Second, disable merge IO throttling.
>>
>> Third, use the default settings, but increase index.refresh_interval to
>> perhaps 5s, and set index.translog.flush_threshold_ops to maybe 5: this
>> decreases the frequency of Lucene level commits (= filesystem fsyncs).
>>
>> If possible, use SSDs: they are much faster for merging.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Mon, Jul 14, 2014 at 11:03 PM, Srinath C  wrote:
>>
>>>
>>> Each document is around 300 bytes on average, so that brings the data
>>> rate to around 17Mb per sec.
>>> This is running on ES version 1.1.1. I have been trying out different
>>> values for these configurations. queue_size was increased when I got
>>> EsRejectedException due to queue going full (default size of 50).
>>> segments_per_tier was picked up from some articles on scaling. What would
>>> be a reasonable value based on my data rate?
>>>
>>> If 60K seems to be too high are there any benchmarks available for
>>> ElasticSearch?
>>>
>>> Thanks all for your replies.
>>>
>>>
>>>
>>> On Monday, 14 July 2014 15:25:13 UTC+5:30, Jörg Prante wrote:
>>>
 index.merge.policy.segments_per_tier: 100 and threadpool.bulk.queue_size:
 500 are extreme settings that should be avoided, as they tie up a lot of
 resources. The UnavailableShardsException / NoNodes errors you see are
 congestion caused by such extreme values.

 What ES version is this? Why don't you use the default settings?

 Jörg


 On Mon, Jul 14, 2014 at 4:46 AM, Srinath C  wrote:

> Hi,
>I'm having a tough time to keep ElasticSearch running healthily for
> even 20-30 mins in my setup. At an indexing rate of 28-36K per second, the
> CPU utilization soon drops to 100% and never recovers. All client requests
> fail with UnavailableShardsException or "No Nodes" exceptions. The logs show
> warnings from "monitor.jvm" saying that GC did not free up much of memory.
>
>  The ultimate requirement is to import data into the ES cluster at
> around 60K per second on a setup explained below. The only operation being
> performed is bulk import of documents. Soon the ES nodes become
> unresponsive and the CPU utilization drops to 100% (from 400-500%). They
> don't seem to recover even after the bulk import operations are ceased.
>
>   Any suggestions on how to tune the GC based on my requirements? What
> other information would be needed to look into this?
>
> Thanks,
> Srinath.
>
>
> The setup:
>   - Cluster: a 4 node cluster of c3.2xlarge instances on aws-ec2.
>   - Load: The only operation during this test is bulk import of data.
> The documents are small around the size of ~200-500 bytes and are being
> bulk imported into the cluster using storm.
>   - Bulk Import: A total of 7-9 storm workers using a single
> BulkProcessor each to import data into the ES cluster. As seen from the
> logs, each of the worker processes are importing around 4K docs per second
> from each worker i.e. around 28-36K docs per second getting imported into
> ES.
>   - JVM Args: Around 8G of heap, tried with CMS collector as well as
> G1 collector
>   - ES configuration:
>  - "mlockall": true
>  - "threadpool