Re: How exactly does "max_expansions" work in a match_phrase_prefix query?

2014-06-04 Thread shgeorge
In the above example, if there are documents with the terms
"test", "tester", "testing", and "tests", and we are querying for "test" with
"max_expansions": 2, should it return only the first 2 matching docs?

I see that it is returning all the matching docs. Could you please explain?
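For reference, a minimal sketch of the query being discussed (the index and field names here are made up):

curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query" : {
    "match_phrase_prefix" : {
      "title" : {
        "query" : "test",
        "max_expansions" : 2
      }
    }
  }
}'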






Re: Kibana 3: display the number of items in a Text panel?

2014-06-04 Thread Itamar Syn-Hershko
Number of lines where? You can always show a Count facet that will count
the number of results of a query.
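Outside of Kibana, a rough way to get the same number is the count API, e.g. (index pattern and query string are placeholders):

curl -XGET 'localhost:9200/logstash-*/_count?q=type:myuniqueid'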

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Wed, Jun 4, 2014 at 12:10 PM, Nitsan Seniak  wrote:

> Hello,
>
> In a Kibana 3 Text panel, is it possible to display the number of lines?
>
> My use case is that I want to display a table with unique IDs extracted
> from log entries, and display the number of these unique IDs.
>
> Thanks,
>
> -- Nitsan
>
>
>



Re: Shard count and plugin questions

2014-06-04 Thread Mark Walkom
I haven't heard of a limit to the number of indexes; obviously, the more you
have, the larger the cluster state that needs to be maintained.

You might want to look into routing (
http://exploringelasticsearch.com/advanced_techniques.html or
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-routing-field.html)
as an alternative to optimise and minimise index count.
You can also always hedge your bets and create an index with a larger
number of shards, i.e. not a 1:1 shard:node relationship, and then move the
excess shards to new nodes as they are added.
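A rough sketch of both ideas (index name, shard counts and routing value are just examples):

# create the index with more primary shards than you currently have nodes
curl -XPUT 'localhost:9200/logs-v1' -d '{
  "settings" : { "number_of_shards" : 10, "number_of_replicas" : 1 }
}'
# index with an explicit routing value so related documents land on the same shard
curl -XPUT 'localhost:9200/logs-v1/event/1?routing=customer42' -d '{ "msg" : "example" }'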

I'd be interested to see how you could measure how you'd outgrow an index,
though; technically it can just keep growing until the node can no longer
deal with it. This is something that testing is good for: throw data at a
single-shard index, and when it falls over you have an indicator of how
your hardware will handle things.

As for reading the transaction log and searching it, you might be playing a
losing game, as your code to parse and search would have to be super quick
to make it worth doing.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 5 June 2014 15:33, Todd Nine  wrote:

> Thanks for the answers Mark.  See inline.
>
>
> On Wed, Jun 4, 2014 at 3:51 PM, Mark Walkom 
> wrote:
>
>> 1) The answer is - it depends. You want to setup a test system with
>> indicative specs, and then throw some sample data at it until things start
>> to break. However this may help
>> https://www.found.no/foundation/sizing-elasticsearch/
>>
>
> This is what I was expecting.  Thanks for the pointer to the
> documentation.  We're going to have some pretty beefy clusters (SSDs Raid
> 0, 8 to 16 cores and a lot of RAM) to power ES.  We're going to have a LOT
> of indexes, we would be operating this as a core infrastructure service.
>  Is there an upper limit on the number of indexes a cluster can hold?
>
>
>> 2) https://github.com/jprante/elasticsearch-knapsack might do what you
>> want.
>>
>
> This won't quite work for us.  We can't have any down time, so it seems
> like an A/B system is more appropriate.  What we're currently thinking is
> the following.
>
> Each index has 2 aliases, a read and a write alias.
>
> 1) Both read and write aliases point to an initial index. Say shard count
> 5 replication 2 (ES is not our canonical data source, so we're ok with
> reconstructing search data)
>
> 2) We detect via monitoring we're going to outgrow an index. We create a
> new index with more shards, and potentially a higher replication depending
> on read load.  We then update the write alias to point to both the old and
> new index.  All clients will then begin dual writes to both indexes.
>
> 3) While we're writing to old and new, some process (maybe a river?) will
> begin copying documents updated < the write alias time from the old index
> to the new index.  Ideally, it would be nice if each replica could copy
> only its local documents into the new index.  We'll want to throttle this
> as well.  Each node will need additional operational capacity
> to accommodate the dual writes as well as accepting the write of the "old"
> documents.  I'm concerned if we push this through too fast, we could cause
> interruptions of service.
>
>
> 4) Once the copy is completed, the read index is moved to the new index,
> then the old index is removed from the system.
>
> Could such a process be implemented as a plugin?  If the work can happen
> in parallel across all nodes containing a shard we can increase the
> process's speed dramatically.  If we have a single worker, like a river, it
> might possibly take too long.
>
>
> 3) How real time is real time? You can change index.refresh_interval to
>> something small so that window of "unflushed" items is minimal, but that
>> will have other impacts.
>>
>
> Once the index call returns to the caller, it would be immediately
> available for query.  We've tried lowering the refresh rate; this results
> in a pretty significant drop in throughput.  To meet our throughput
> requirements, we're considering even turning it up to 5 or 15 seconds.  If
> we can then search this data that's in our commit log (via storing it in
> memory until flush) that would be ideal.
>
> Thoughts?
>
>
>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 5 June 2014 04:18, Todd Nine  wrote:
>>
>>>  Hi All,
>>>   We've been using elastic search as our search index for our new
>>> persistence implementation.
>>>
>>> https://usergrid.incubator.apache.org/
>>>
>>> I have a few questions I could use a hand with.
>>>
>>> 1) Is there any good documentation on the upper limit to count of
>>> documents, or total index size, before you need to allocate more shards?
>>>  Do shards have a real world limit on size or number of entries to keep
>>> response times low?  Every s

Re: Java search issue

2014-06-04 Thread Sunny Cal
Lowercase helped... I would never have thought of that. :)

On Wednesday, June 4, 2014 10:31:22 PM UTC-7, David Pilato wrote:
>
> Try with "high" in lowercase, or use a match query, which is analyzed.
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> On 5 June 2014 at 03:49, Sunny Cal wrote:
>
> I am running into a strange issue.
> I have created an index "alerts" and am putting "alert" objects in it.
> When I do 
> curl -XGET 'http://localhost:9200/twitter/_search?q=severity:HIGH'
> I get the alert objects as output.
> I also get correct results when I go to HEAD and execute the query 
> {"query":{"term":{"severity":"HIGH"}}}
>
> The results are like:
>
> {"_index":"alerts","_type":"alert","_id":"_zR5BTp7QLCpt0Dh2-7cxA","_score":1.7461716,"_source":{"alertId":3,"alertName":"text
>  3","createdOn":1401930512641,"severity":"HIGH"}}
>
>
>
> But when I use java to connect and get the results I get no results
>
> Code is:
> Node node = nodeBuilder().clusterName("elasticsearch").node();
> Client client = node.client();
> SearchRequestBuilder srb = client
> .prepareSearch("alerts")
> .setTypes("alert")
> .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
> .setQuery(QueryBuilders.termQuery("severity", "HIGH")) ;
> System.out.println("Sending:" + srb.toString());
> SearchResponse response = srb.execute().actionGet();
> System.out.println("Searched");
> System.out.println("GOT ROWS:" + response.toString());
>
>
> Output is:
>
> Sending:{
>   "query" : {
> "term" : {
>   "severity" : "HIGH"
> }
>   }
> }
> Searched
> GOT ROWS:{
>   "took" : 50,
>   "timed_out" : false,
>   "_shards" : {
> "total" : 5,
> "successful" : 5,
> "failed" : 0
>   },
>   "hits" : {
> "total" : 0,
> "max_score" : null,
> "hits" : [ ]
>   }
> }
>
>
> Can anybody help.. I have tried a lot of things but it is not working.
> I am using jdk1.7.0_40.. if that makes any difference
>
> Thanks
> C
>
>



Re: Shard count and plugin questions

2014-06-04 Thread Todd Nine
Thanks for the answers Mark.  See inline.


On Wed, Jun 4, 2014 at 3:51 PM, Mark Walkom 
wrote:

> 1) The answer is - it depends. You want to setup a test system with
> indicative specs, and then throw some sample data at it until things start
> to break. However this may help
> https://www.found.no/foundation/sizing-elasticsearch/
>

This is what I was expecting.  Thanks for the pointer to the documentation.
 We're going to have some pretty beefy clusters (SSDs Raid 0, 8 to 16 cores
and a lot of RAM) to power ES.  We're going to have a LOT of indexes, we
would be operating this as a core infrastructure service.  Is there an
upper limit on the number of indexes a cluster can hold?


> 2) https://github.com/jprante/elasticsearch-knapsack might do what you
> want.
>

This won't quite work for us.  We can't have any down time, so it seems
like an A/B system is more appropriate.  What we're currently thinking is
the following.

Each index has 2 aliases, a read and a write alias.

1) Both read and write aliases point to an initial index. Say shard count 5
replication 2 (ES is not our canonical data source, so we're ok with
reconstructing search data)

2) We detect via monitoring we're going to outgrow an index. We create a
new index with more shards, and potentially a higher replication depending
on read load.  We then update the write alias to point to both the old and
new index.  All clients will then begin dual writes to both indexes.

3) While we're writing to old and new, some process (maybe a river?) will
begin copying documents updated < the write alias time from the old index
to the new index.  Ideally, it would be nice if each replica could copy
only its local documents into the new index.  We'll want to throttle this
as well.  Each node will need additional operational capacity
to accommodate the dual writes as well as accepting the write of the "old"
documents.  I'm concerned if we push this through too fast, we could cause
interruptions of service.


4) Once the copy is completed, the read index is moved to the new index,
then the old index is removed from the system.
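For what it's worth, steps 2 and 4 would map roughly onto the aliases API like this (index and alias names are placeholders):

# step 2: point the write alias at both the old and the new index
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : { "index" : "myindex_v2", "alias" : "myindex_write" } }
  ]
}'
# step 4: move the read alias to the new index and drop the old index's aliases
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions" : [
    { "remove" : { "index" : "myindex_v1", "alias" : "myindex_read" } },
    { "remove" : { "index" : "myindex_v1", "alias" : "myindex_write" } },
    { "add" : { "index" : "myindex_v2", "alias" : "myindex_read" } }
  ]
}'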

Could such a process be implemented as a plugin?  If the work can happen in
parallel across all nodes containing a shard we can increase the process's
speed dramatically.  If we have a single worker, like a river, it might
possibly take too long.


3) How real time is real time? You can change index.refresh_interval to
> something small so that window of "unflushed" items is minimal, but that
> will have other impacts.
>

Once the index call returns to the caller, it would be immediately
available for query.  We've tried lowering the refresh rate; this results
in a pretty significant drop in throughput.  To meet our throughput
requirements, we're considering even turning it up to 5 or 15 seconds.  If
we can then search this data that's in our commit log (via storing it in
memory until flush) that would be ideal.
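For reference, the setting in question, with a placeholder index name and value:

curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index" : { "refresh_interval" : "5s" }
}'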

Thoughts?



> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 5 June 2014 04:18, Todd Nine  wrote:
>
>> Hi All,
>>   We've been using elastic search as our search index for our new
>> persistence implementation.
>>
>> https://usergrid.incubator.apache.org/
>>
>> I have a few questions I could use a hand with.
>>
>> 1) Is there any good documentation on the upper limit to count of
>> documents, or total index size, before you need to allocate more shards?
>>  Do shards have a real world limit on size or number of entries to keep
>> response times low?  Every system has its limits, and I'm trying to find
>> some actual data on the size limits.  I've been trolling Google for some
>> answers, but I haven't really found any good test results.
>>
>>
>> 2) Currently, it's not possible to increase the shard count for an index.
>> The workaround is to create a new index with a higher count, and move
>> documents from the old index into the new.  Could this be accomplished via
>> a plugin?
>>
>>
>> 3) We sometimes have "realtime" requirements, in that when an index call
>> returns, the document is available.  Flushing explicitly is not a good idea from
>> a performance perspective.  Has anyone explored searching in memory the
>> documents that have not yet been flushed and merging them with the Lucene
>> results?  Is this something that's feasible to be implemented via a plugin?
>>
>> Thanks in advance!
>> Todd
>>

Re: Java search issue

2014-06-04 Thread David Pilato
Try with "high" in lowercase, or use a match query, which is analyzed.
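For example, something like this, assuming the standard analyzer lowercased "HIGH" at index time (index and field names taken from your mail):

# analyzed match query
curl -XGET 'localhost:9200/alerts/alert/_search' -d '{
  "query" : { "match" : { "severity" : "HIGH" } }
}'
# or keep the term query but lowercase the value yourself
curl -XGET 'localhost:9200/alerts/alert/_search' -d '{
  "query" : { "term" : { "severity" : "high" } }
}'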

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 5 June 2014 at 03:49, Sunny Cal wrote:

I am running into a strange issue.
I have created an index "alerts" and am putting "alert" objects in it.
When I do 
curl -XGET 'http://localhost:9200/twitter/_search?q=severity:HIGH'
I get the alert objects as output.
I also get correct results when I go to HEAD and execute the query 
{"query":{"term":{"severity":"HIGH"}}}

The results are like:
{"_index":"alerts","_type":"alert","_id":"_zR5BTp7QLCpt0Dh2-7cxA","_score":1.7461716,"_source":{"alertId":3,"alertName":"text
 3","createdOn":1401930512641,"severity":"HIGH"}}


But when I use java to connect and get the results I get no results

Code is:
Node node = nodeBuilder().clusterName("elasticsearch").node();
Client client = node.client();
SearchRequestBuilder srb = client
.prepareSearch("alerts")
.setTypes("alert")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(QueryBuilders.termQuery("severity", "HIGH")) ;
System.out.println("Sending:" + srb.toString());
SearchResponse response = srb.execute().actionGet();
System.out.println("Searched");
System.out.println("GOT ROWS:" + response.toString());


Output is:

Sending:{
  "query" : {
"term" : {
  "severity" : "HIGH"
}
  }
}
Searched
GOT ROWS:{
  "took" : 50,
  "timed_out" : false,
  "_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
  },
  "hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
  }
}


Can anybody help.. I have tried a lot of things but it is not working.
I am using jdk1.7.0_40.. if that makes any difference

Thanks
C


Re: cross data center replication

2014-06-04 Thread Todd Nine
Hey all,
 
Sorry to resurrect a dead thread.  Did you ever find a solution for 
eventual consistency of documents across EC2 regions?

Thanks,
todd



On Wednesday, May 1, 2013 5:50:00 AM UTC-7, Norberto Meijome wrote:
>
> +1 on all of the above. es-reindex already in my list of things to 
> investigate (for a number of issues...)
>
> cheers,
> b 
>
>
> On Wed, May 1, 2013 at 6:58 AM, Paul Hill wrote:
>
>> On 4/23/2013 8:44 AM, Daniel Maher wrote:
>>
>>> On 2013-04-23 5:22 PM, Saikat Kanjilal wrote:
>>>
 Hello Folks,
 [...] does ES out of the box currently support cross data
 center replication,  []

>>>
>>> Hello,
>>>
>>> I'd wager that the question you're really asking about is how to control 
>>> where shards are placed; if you can make deterministic statements about 
>>> where shards are, then you can create your own "rack-aware" or "data 
>>> centre-aware" scenarios.  ES has supported this "out of the box" for well 
>>> over a year now (possibly longer).
>>>
>>> You'll want to investigate "zones" and "routing allocation", which are 
>>> the key elements of shard placement.  There is an excellent blog post which 
>>> describes exactly how to set things up here :
>>> http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/
>>>
>> Is shard allocation really the correct solution if the data centers are 
>> globally distributed?
>>
>> If I have a data center in the US intended to serve data from the US, 
>> but it should also have access to Europe and Asia data, and clusters in 
>> both Europe and Asia with similar needs, would I really want to use zones 
>> etc. and have one great global cluster with data center aware 
>> configurations?
>>
>> Assuming that the US would be happy to deal with old documents from Asia 
>> and Europe when Asia or Europe is offline or just not caught up, it would 
>> seem that you would NOT want a "world" cluster, because I can't picture how 
>> you'd configure a 3-part world cluster to both index into the right 
>> indices and search the right (possible combination of) shards, while also 
>> preventing "split brain".
>>
>> In the scenario I've described, I would think each data center might 
>> better provide availability and eventual consistency (with less concern for 
>> the remote data from the other region) by having three clusters and some 
>> type of syncing from one index to copies at the other two locations.  For 
>> example, the US datacenter might have a US, copyOfEurope, and copyOfAsia 
>> index.
>>
>> Anyone have any observations about such a world-wide scenario?
>> Are there any index to index copy utilities?
>> Is there a river or other plugin that might be useful for this three 
>> clusters working together scenario?
>> How about the project https://github.com/karussell/elasticsearch-reindex?
>> Comments?
>>
>> -Paul
>>
>>
>>
>>
>>
>
>
> -- 
> Norberto 'Beto' Meijome
>  



Re: Marvel/Sense Troubleshooting

2014-06-04 Thread Mark Walkom
It doesn't look like it's loaded - [Zombie] loaded [], sites []

How did you install ES? It could be that it's not looking in the right path
for the plugin.
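As a quick check, assuming a tar.gz install and running from the Elasticsearch home directory (package installs keep the plugin script elsewhere):

# list installed plugins
bin/plugin -l
# (re)install Marvel if it is missing
bin/plugin -i elasticsearch/marvel/latest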

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 5 June 2014 13:12, Fergie  wrote:

>
> Mark -
> There must be something I don't understand.  This is my log file.
>
> I don't see it but the product said it installed ok?
>
> Doug
>
> [2014-06-04 09:28:30,194][INFO ][node ] [Gideon Mace]
> stopping ...
> [2014-06-04 09:28:30,320][INFO ][node ] [Gideon Mace]
> stopped
> [2014-06-04 09:28:30,321][INFO ][node ] [Gideon Mace]
> closing ...
> [2014-06-04 09:28:30,336][INFO ][node ] [Gideon Mace]
> closed
> [2014-06-04 09:28:33,637][INFO ][node ] [Black Widow]
> version[1.1.0], pid[17289], build[2181e11/2014-03-25T15:59:51Z]
> [2014-06-04 09:28:33,638][INFO ][node ] [Black Widow]
> initializing ...
> [2014-06-04 09:28:33,658][INFO ][plugins  ] [Black Widow]
> loaded [], sites []
> [2014-06-04 09:28:36,731][INFO ][node ] [Black Widow]
> initialized
> [2014-06-04 09:28:36,731][INFO ][node ] [Black Widow]
> starting ...
> [2014-06-04 09:28:36,869][INFO ][transport] [Black Widow]
> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
> 192.168.1.4:9300]}
> [2014-06-04 09:28:39,924][INFO ][cluster.service  ] [Black Widow]
> new_master [Black
> Widow][ZCWTbyEdSOuTXJiJP96apQ][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9300]],
> reason: zen-disco-join (elected_as_master)
> [2014-06-04 09:28:39,950][INFO ][discovery] [Black Widow]
> elasticsearch/ZCWTbyEdSOuTXJiJP96apQ
> [2014-06-04 09:28:39,986][INFO ][http ] [Black Widow]
> bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
> 192.168.1.4:9200]}
> [2014-06-04 09:28:40,994][INFO ][gateway  ] [Black Widow]
> recovered [2] indices into cluster_state
> [2014-06-04 09:28:40,997][INFO ][node ] [Black Widow]
> started
> [2014-06-04 11:16:23,939][INFO ][cluster.service  ] [Black Widow]
> added
> {[Aminedi][yS1Gg5pkREChpKTQhQnGrw][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
> data=false},}, reason: zen-disco-receive(join from
> node[[Aminedi][yS1Gg5pkREChpKTQhQnGrw][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
> data=false}])
> [2014-06-04 11:25:48,867][INFO ][cluster.service  ] [Black Widow]
> removed
> {[Aminedi][yS1Gg5pkREChpKTQhQnGrw][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
> data=false},}, reason:
> zen-disco-node_failed([Aminedi][yS1Gg5pkREChpKTQhQnGrw][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
> data=false}), reason transport disconnected (with verified connect)
> [2014-06-04 12:07:32,186][INFO ][cluster.service  ] [Black Widow]
> added {[Ursa
> Major][WgRv_tCgQ7CkmLqEPcaxeQ][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
> data=false},}, reason: zen-disco-receive(join from node[[Ursa
> Major][WgRv_tCgQ7CkmLqEPcaxeQ][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
> data=false}])
> [2014-06-04 12:07:47,077][INFO ][cluster.service  ] [Black Widow]
> removed {[Ursa
> Major][WgRv_tCgQ7CkmLqEPcaxeQ][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
> data=false},}, reason: zen-disco-node_failed([Ursa
> Major][WgRv_tCgQ7CkmLqEPcaxeQ][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
> data=false}), reason transport disconnected (with verified connect)
> [2014-06-04 12:07:59,624][INFO ][node ] [Black Widow]
> stopping ...
> [2014-06-04 12:07:59,729][INFO ][node ] [Black Widow]
> stopped
> [2014-06-04 12:07:59,729][INFO ][node ] [Black Widow]
> closing ...
> [2014-06-04 12:07:59,739][INFO ][node ] [Black Widow]
> closed
> [2014-06-04 12:08:02,746][INFO ][node ] [Jaeger]
> version[1.1.0], pid[19166], build[2181e11/2014-03-25T15:59:51Z]
> [2014-06-04 12:08:02,746][INFO ][node ] [Jaeger]
> initializing ...
> [2014-06-04 12:08:02,765][INFO ][plugins  ] [Jaeger]
> loaded [], sites []
> [2014-06-04 12:08:05,878][INFO ][node ] [Jaeger]
> initialized
> [2014-06-04 12:08:05,879][INFO ][node ] [Jaeger]
> starting ...
> [2014-06-04 12:08:06,036][INFO ][transport] [Jaeger]
> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
> 192.168.1.4:9300]}
> [2014-06-04 12:08:09,102][INFO ][cluster.service  ] [Jaeger]
> new_master
> [Jaeger][Ixh9uguMQsSS6ptnGcFoUQ][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9300]],
> reason: zen-disco-join (elected_as_master)

Re: Marvel/Sense Troubleshooting

2014-06-04 Thread Fergie

Mark -
There must be something I don't understand.  This is my log file.

I don't see it but the product said it installed ok?

Doug

[2014-06-04 09:28:30,194][INFO ][node ] [Gideon Mace] 
stopping ...
[2014-06-04 09:28:30,320][INFO ][node ] [Gideon Mace] 
stopped
[2014-06-04 09:28:30,321][INFO ][node ] [Gideon Mace] 
closing ...
[2014-06-04 09:28:30,336][INFO ][node ] [Gideon Mace] 
closed
[2014-06-04 09:28:33,637][INFO ][node ] [Black Widow] 
version[1.1.0], pid[17289], build[2181e11/2014-03-25T15:59:51Z]
[2014-06-04 09:28:33,638][INFO ][node ] [Black Widow] 
initializing ...
[2014-06-04 09:28:33,658][INFO ][plugins  ] [Black Widow] 
loaded [], sites []
[2014-06-04 09:28:36,731][INFO ][node ] [Black Widow] 
initialized
[2014-06-04 09:28:36,731][INFO ][node ] [Black Widow] 
starting ...
[2014-06-04 09:28:36,869][INFO ][transport] [Black Widow] 
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address 
{inet[/192.168.1.4:9300]}
[2014-06-04 09:28:39,924][INFO ][cluster.service  ] [Black Widow] 
new_master [Black 
Widow][ZCWTbyEdSOuTXJiJP96apQ][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9300]],
 
reason: zen-disco-join (elected_as_master)
[2014-06-04 09:28:39,950][INFO ][discovery] [Black Widow] 
elasticsearch/ZCWTbyEdSOuTXJiJP96apQ
[2014-06-04 09:28:39,986][INFO ][http ] [Black Widow] 
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address 
{inet[/192.168.1.4:9200]}
[2014-06-04 09:28:40,994][INFO ][gateway  ] [Black Widow] 
recovered [2] indices into cluster_state
[2014-06-04 09:28:40,997][INFO ][node ] [Black Widow] 
started
[2014-06-04 11:16:23,939][INFO ][cluster.service  ] [Black Widow] 
added 
{[Aminedi][yS1Gg5pkREChpKTQhQnGrw][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
 
data=false},}, reason: zen-disco-receive(join from 
node[[Aminedi][yS1Gg5pkREChpKTQhQnGrw][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
 
data=false}])
[2014-06-04 11:25:48,867][INFO ][cluster.service  ] [Black Widow] 
removed 
{[Aminedi][yS1Gg5pkREChpKTQhQnGrw][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
 
data=false},}, reason: 
zen-disco-node_failed([Aminedi][yS1Gg5pkREChpKTQhQnGrw][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
 
data=false}), reason transport disconnected (with verified connect)
[2014-06-04 12:07:32,186][INFO ][cluster.service  ] [Black Widow] 
added {[Ursa 
Major][WgRv_tCgQ7CkmLqEPcaxeQ][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
 
data=false},}, reason: zen-disco-receive(join from node[[Ursa 
Major][WgRv_tCgQ7CkmLqEPcaxeQ][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
 
data=false}])
[2014-06-04 12:07:47,077][INFO ][cluster.service  ] [Black Widow] 
removed {[Ursa 
Major][WgRv_tCgQ7CkmLqEPcaxeQ][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
 
data=false},}, reason: zen-disco-node_failed([Ursa 
Major][WgRv_tCgQ7CkmLqEPcaxeQ][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9301]]{client=true,
 
data=false}), reason transport disconnected (with verified connect)
[2014-06-04 12:07:59,624][INFO ][node ] [Black Widow] 
stopping ...
[2014-06-04 12:07:59,729][INFO ][node ] [Black Widow] 
stopped
[2014-06-04 12:07:59,729][INFO ][node ] [Black Widow] 
closing ...
[2014-06-04 12:07:59,739][INFO ][node ] [Black Widow] 
closed
[2014-06-04 12:08:02,746][INFO ][node ] [Jaeger] 
version[1.1.0], pid[19166], build[2181e11/2014-03-25T15:59:51Z]
[2014-06-04 12:08:02,746][INFO ][node ] [Jaeger] 
initializing ...
[2014-06-04 12:08:02,765][INFO ][plugins  ] [Jaeger] loaded 
[], sites []
[2014-06-04 12:08:05,878][INFO ][node ] [Jaeger] 
initialized
[2014-06-04 12:08:05,879][INFO ][node ] [Jaeger] 
starting ...
[2014-06-04 12:08:06,036][INFO ][transport] [Jaeger] 
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address 
{inet[/192.168.1.4:9300]}
[2014-06-04 12:08:09,102][INFO ][cluster.service  ] [Jaeger] 
new_master 
[Jaeger][Ixh9uguMQsSS6ptnGcFoUQ][Douglass-MacBook-Pro.local][inet[/192.168.1.4:9300]],
 
reason: zen-disco-join (elected_as_master)
[2014-06-04 12:08:09,132][INFO ][discovery] [Jaeger] 
elasticsearch/Ixh9uguMQsSS6ptnGcFoUQ
[2014-06-04 12:08:09,194][INFO ][http ] [Jaeger] 
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address 
{inet[/192.168.1.4:9200]}
[2014-06-04 12:08:10,388][INFO ][gateway  ] [Jaeger] 
recovered [2] indices into cluster_state
[2014-06-04 12:08:10,390][INFO ][node ] [Jae

Re: Out of memory, missing shards, looks like split-brain

2014-06-04 Thread Quan Tong Anh
I would like to know:
- What is the root cause?
- How do I fix that?
- If it's a memory problem, is there anything I can do (other than
upgrading)?

On Jun 5, 2014, at 9:54 AM, Mark Walkom  wrote:

> What do you want to know exactly?
> 
> Regards,
> Mark Walkom
> 
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
> 
> 
> On 5 June 2014 12:40, Quan Tong Anh  wrote:
> I'm running a 3-node cluster with 2 data nodes. My configuration:
> 
> es1, es2:
> 
> node:
>   name: elasticsearch-1
>   master: true
>   data: true
> 
> 
> discovery:
>   zen:ping:
>   multicast:
> enabled: false
>   unicast:
> hosts: 
> ["elasticsearch-1.domain.com:9300","logs.domain.com:9300","elasticsearch-2.domain.com:9300",]
> 
> 
> 
> gl2:
> 
> node:
>   name: graylog2
>   master: false
>   data: false
> 
> 
> 
> Shinken sent me a notification saying that there are only 2 nodes in the cluster:
> 
> {
>   "cluster_name" : "domain.com",
>   "status" : "red",
>   "timed_out" : false,
>   "number_of_nodes" : 2,
>   "number_of_data_nodes" : 1,
>   "active_primary_shards" : 12,
>   "active_shards" : 12,
>   "relocating_shards" : 0,
>   "initializing_shards" : 0,
>   "unassigned_shards" : 12
> }
> 
> 
> Log on the ES-1:
> 
> [2014-06-04 15:51:09,281][WARN ][transport] [elasticsearch-1] 
> Received response for a request that has timed out, sent [61627ms] ago, timed 
> ou
> t [30338ms] ago, action [discovery/zen/fd/masterPing], node 
> [[elasticsearch-2][Vcvb6dtMQf-nfuB-wR9iew][inet[/107.170.x.y:9300]]{master=true}],
>  id [272380]
> [2014-06-04 15:51:50,542][WARN ][index.cache.field.data.resident] 
> [elasticsearch-1] [graylog2-graylog2_2] loading field [_date ] caused out of 
> memory failure
> java.lang.OutOfMemoryError: Java heap space
> [2014-06-04 15:55:16,351][DEBUG][action.admin.indices.stats] 
> [elasticsearch-1] [graylog2-graylog2_5][2], node[Vcvb6dtMQf-nfuB-wR9iew], 
> [P], s[STARTED]: Failed
>  to execute 
> [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@7631d2a2]
> org.elasticsearch.transport.RemoteTransportException: 
> [elasticsearch-2][inet[/107.170.x.y:9300]][indices/stats/s]
> Caused by: org.elasticsearch.index.IndexShardMissingException: 
> [graylog2-graylog2_5][2] missing
> at 
> org.elasticsearch.index.service.InternalIndexService.shardSafe(InternalIndexService.java:179)
> at 
> org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:145)
> at 
> org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:53)
> at 
> org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:398)
> at 
> org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:384)
> at 
> org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:268)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> [2014-06-04 15:56:29,504][WARN ][index.engine.robin   ] [elasticsearch-1] 
> [graylog2_recent][0] failed engine
> java.lang.OutOfMemoryError: Java heap space
> 
> 
> 
> Log on the ES-2:
> 
> [2014-06-04 15:51:02,276][WARN ][transport.netty  ] [elasticsearch-2] 
> exception caught on transport layer [[id: 0x72906b9d, /107.170.z.t:52899 => /1
> 07.170.x.y:9300]], closing connection
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.DirectByteBuffer.duplicate(DirectByteBuffer.java:217)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:87)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:190)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:150)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
> at 
> org.elasticsearch.com

Java search issue

2014-06-04 Thread Sunny Cal
I am running into a strange issue.
I have created an index "alerts" and am putting "alert" objects in it.
When I do 
curl -XGET 'http://localhost:9200/twitter/_search?q=severity:HIGH'
I get the alert objects as output.
I also get correct results when I go to HEAD and execute the query 
{"query":{"term":{"severity":"HIGH"}}}

The results are like:

{"_index":"alerts","_type":"alert","_id":"_zR5BTp7QLCpt0Dh2-7cxA","_score":1.7461716,"_source":{"alertId":3,"alertName":"text
 3","createdOn":1401930512641,"severity":"HIGH"}}



But when I use java to connect and get the results I get no results

Code is:
Node node = nodeBuilder().clusterName("elasticsearch").node();
Client client = node.client();
SearchRequestBuilder srb = client
.prepareSearch("alerts")
.setTypes("alert")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(QueryBuilders.termQuery("severity", "HIGH")) ;
System.out.println("Sending:" + srb.toString());
SearchResponse response = srb.execute().actionGet();
System.out.println("Searched");
System.out.println("GOT ROWS:" + response.toString());


Output is:

Sending:{
  "query" : {
"term" : {
  "severity" : "HIGH"
}
  }
}
Searched
GOT ROWS:{
  "took" : 50,
  "timed_out" : false,
  "_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
  },
  "hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
  }
}


Can anybody help.. I have tried a lot of things but it is not working.
I am using jdk1.7.0_40.. if that makes any difference

Thanks
C



Re: percolator does not support min_score?

2014-06-04 Thread Miyuki Endo
Hi,

I thought that the only query types not supported by the percolator are
has_child, top_child, has_parent, and nested.

Using the percolator, I would like to be notified about a document if its
score exceeds a specified value.
However, I understand that min_score does not work. 
Thank you very much.

On Wednesday, June 4, 2014 at 14:56:29 UTC+9, Jun Ohtani wrote:
>
> Hi, 
>
> I’m not sure how to implement the percolator, 
> but I think that min_score does not work properly. 
> Because the percolator processes one document at a time, 
> the score is different from the usual search score. 
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html#_important_notes
>  
>
> Why do you want to use the min_score? 
>
>  
> Jun Ohtani 
> joh...@gmail.com  
> twitter : http://twitter.com/johtani 
>
>



Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread virgil
Really good suggestion!  Yeah, the search will work by executing a standard
function score query with the boost script name. Thank you!
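As a sketch, such a search could then look roughly like this over HTTP (the script name and params are placeholders for whatever the plugin registers):

curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query" : {
    "function_score" : {
      "query" : { "match" : { "title" : "test" } },
      "script_score" : {
        "script" : "my_boost_script",
        "lang" : "native",
        "params" : { "factor" : 2 }
      }
    }
  }
}'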


2014-06-04 15:06 GMT-07:00 joergpra...@gmail.com [via ElasticSearch Users] <
ml-node+s115913n4057071...@n3.nabble.com>:

>
> As said, it is true that scoring scripts (like the function score scripts
> or the AbstractSearchScript) need to reside on data nodes. Accessing fields
> is a low level operation in a script so it is not possible to install such
> a boost plugin that uses scripting on a data-less node. You would have to
> install it on all the data nodes which might become tedious (but it is
> doable).
>
> Another issue is that you use scripting in a java plugin. I conclude from
> this that the search should later work over the HTTP API by executing a
> standard function score query with the boost script name (is that true?)
>
> Writing a plugin, in a pure Java environment, you have many more degrees
> of freedom to supersede the script functionality and use other code paths.
> For example, you could reuse the resource watch service from ES (used for
> watching script file changes) to reload the boost info (which is in your
> binary files I assume). Then you could build the query internally using the
> Java API as a custom score query action and execute it from your favorite
> (data-less) node (or from two nodes, for better fault tolerance / load
> balancing).
>
> Optionally, you could expose a new endpoint to the ES REST API, for
> example "_search_with_boost", which works like "_search", but makes use of
> the boost info files.
>
> For a more generic solution, it would be convenient to convert the boost
> info into a JSON parameter file so this could be loaded by the standard ES
> settings/config routines and by other languages, also for better reuse by
> others in the ES community :) An example plugin name could be "boost
> control plugin"...
>
> Jörg
>
>
>
> On Wed, Jun 4, 2014 at 8:15 PM, virgil <[hidden email]
> > wrote:
>
>> Yeah, but I would consider that the non-data node is already doing the job. --
>> "These "non data" nodes are still part of the cluster, and they redirect
>> operations exactly to the node that holds the relevant data. The other
>> benefit is the fact that for scatter / gather based operations (such as
>> search), these nodes will take part of the processing since they will
>> start
>> the scatter process, and perform the actual gather processing." I just
>> uploaded my native script code in https://github.com/virgil0/TestPlugin.
>> It
>> works with the function score query. You can see that there are 3 bin
>> files I need to load into memory. Thank you for the reply.
>>
>>
>>
>>
>
>
>
>




Re: Inter-document Queries

2014-06-04 Thread Itamar Syn-Hershko
You need to be able to form buckets that can be reduced again, either using
the aggregations framework or a query. One model that will allow you to do
that is something like this:

{ "userid": "xyz", "path":"/sale/B", "previous_paths":[...],
"tstamp":"...", ... }

So whenever you add a new path, you denormalize and add previous paths that
could be relevant. This might bloat your storage a bit and be slower on
writes, but it is very optimized for reads since now you can do an
aggregation that queries for the desired "path" and buckets on the user. To
check the condition of the previous path you should be able to bucket again
using a script, or maybe even with a query on a nested type.

This is just from the top of my head but should definitely work if you can
get to that model
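As a rough sketch against that model (index name is a placeholder, field names are from your example; the cardinality aggregation gives an approximate unique-user count):

curl -XGET 'localhost:9200/events/_search?search_type=count' -d '{
  "query" : {
    "filtered" : {
      "query" : { "term" : { "path" : "/sale/B" } },
      "filter" : { "term" : { "previous_paths" : "/promo/A" } }
    }
  },
  "aggs" : {
    "unique_users" : { "cardinality" : { "field" : "userid" } }
  }
}'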

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Thu, Jun 5, 2014 at 2:36 AM, Zennet Wheatcroft 
wrote:

> Yes. I can re-index the data or transform it in any way to make this query
> efficient.
>
> What would you suggest?
>
>
>
> On Wednesday, June 4, 2014 2:14:09 PM UTC-7, Itamar Syn-Hershko wrote:
>
>> This model is not efficient for this type of querying. You cannot do this
>> in one query using this model, and the pre-processing work you do now +
>> traversing all documents is very costly.
>>
>> Is it possible for you to index the data (even as a projection) into
>> Elasticsearch using a different model, so you can use ES properly using
>> queries or the aggregations framework?
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko 
>> Freelance Developer & Consultant
>> Author of RavenDB in Action 
>>
>>
>> On Thu, Jun 5, 2014 at 12:04 AM, Zennet Wheatcroft 
>> wrote:
>>
>>> Hi,
>>>
>>> I am looking for an efficient way to do inter-document queries in
>>> Elasticsearch. Specifically, I want to count the number of users that went
>>> through an exit point B after visiting point A.
>>>
>>> In general terms, say we have some event log data about users actions on
>>> a website:
>>> 
>>> {"userid":"xyz", "machineid":"110530745", "path":"/promo/A", "country":
>>> "US", "tstamp":"2013-04-01 00:01:01"}
>>> {"userid":"pdq", "machineid":"110519774", "path":"/page/1", "country":
>>> "CN", "tstamp":"2013-04-01 00:02:11"}
>>> {"userid":"xyz", "machineid":"110530745", "path":"/promo/D", "country":
>>> "US", "tstamp":"2013-04-01 00:06:31"}
>>> {"userid":"abc", "machineid":"110527022", "path":"/page/23", "country":
>>> "DE", "tstamp":"2013-04-01 00:08:00"}
>>> {"userid":"pdq", "machineid":"110519774", "path":"/page/2", "country":
>>> "CN", "tstamp":"2013-04-01 00:08:55"}
>>> {"userid":"xyz", "machineid":"110530745", "path":"/sale/B", "country":
>>> "US", "tstamp":"2013-04-01 00:09:46"}
>>> {"userid":"abc", "machineid":"110527022 ", "path":"/promo/A", "country":
>>> "DE", "tstamp":"2013-04-01 00:10:46"}
>>> 
>>> And we have 500+M such entries.
>>>
>>> We want a count of the number of userids that visited path=/sale/B after
>>> visiting path=/promo/A.
>>>
>>> What I did is to preprocess the data, sorting by , then
>>> compacting all events by the same userid into the same document. Then I
>>> wrote a script filter which traverses the path array per document, and
>>> returns true if it finds any occurrence of B followed by A. This however is
>>> inefficient. Most of our queries take 1 or 2 seconds on 100+M events. This
>>> script filter query takes over 300 seconds. Specifically, it can process
>>> events at about 400K events per second. By comparison, I wrote a naive
>>> program that does a linear pass of the un-compacted data and that processes
>>> 11M events per second. From this I conclude that Elasticsearch does not do
>>> well on this type of query.
>>>
>>> I am hoping someone can indicate a more efficient way to do this query
>>> in ES. Or else confirm that ES cannot do inter-document queries well.
>>>
>>> Thanks,
>>> Zennet
>>>
>>>

Re: Inter-document Queries

2014-06-04 Thread Zennet Wheatcroft
Yes. I can re-index the data or transform it in any way to make this query 
efficient. 

What would you suggest?


On Wednesday, June 4, 2014 2:14:09 PM UTC-7, Itamar Syn-Hershko wrote:
>
> This model is not efficient for this type of querying. You cannot do this 
> in one query using this model, and the pre-processing work you do now + 
> traversing all documents is very costly.
>
> Is it possible for you to index the data (even as a projection) into 
> Elasticsearch using a different model, so you can use ES properly using 
> queries or the aggregations framework?
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Author of RavenDB in Action 
>
>
> On Thu, Jun 5, 2014 at 12:04 AM, Zennet Wheatcroft  > wrote:
>
>> Hi,
>>
>> I am looking for an efficient way to do inter-document queries in 
>> Elasticsearch. Specifically, I want to count the number of users that went 
>> through an exit point B after visiting point A.
>>
>> In general terms, say we have some event log data about users actions on 
>> a website:
>> 
>> {"userid":"xyz", "machineid":"110530745", "path":"/promo/A", "country":
>> "US", "tstamp":"2013-04-01 00:01:01"}
>> {"userid":"pdq", "machineid":"110519774", "path":"/page/1", "country":
>> "CN", "tstamp":"2013-04-01 00:02:11"}
>> {"userid":"xyz", "machineid":"110530745", "path":"/promo/D", "country":
>> "US", "tstamp":"2013-04-01 00:06:31"}
>> {"userid":"abc", "machineid":"110527022", "path":"/page/23", "country":
>> "DE", "tstamp":"2013-04-01 00:08:00"}
>> {"userid":"pdq", "machineid":"110519774", "path":"/page/2", "country":
>> "CN", "tstamp":"2013-04-01 00:08:55"}
>> {"userid":"xyz", "machineid":"110530745", "path":"/sale/B", "country":
>> "US", "tstamp":"2013-04-01 00:09:46"}
>> {"userid":"abc", "machineid":"110527022 ", "path":"/promo/A", "country":
>> "DE", "tstamp":"2013-04-01 00:10:46"}
>> 
>> And we have 500+M such entries.
>>
>> We want a count of the number of userids that visited path=/sale/B after 
>> visiting path=/promo/A.
>>
>> What I did is to preprocess the data, sorting by , then 
>> compacting all events by the same userid into the same document. Then I 
>> wrote a script filter which traverses the path array per document, and 
>> returns true if it finds any occurrence of B followed by A. This however is 
>> inefficient. Most of our queries take 1 or 2 seconds on 100+M events. This 
>> script filter query takes over 300 seconds. Specifically, it can process 
>> events at about 400K events per second. BY comparison, I wrote a naive 
>> program that does a linear pass of the un-compacted data and that process 
>> 11M events per second. By which I conclude that Elasticsearch does not do 
>> well on this type of query.
>>
>> I am hoping someone can indicate a more efficient way to do this query in 
>> ES. Or else confirm that ES cannot do inter-document queries well. 
>>
>> Thanks,
>> Zennet
>>
>>
>>
>
>



Re: Queries, filters and match_all

2014-06-04 Thread Ivan Brusic
There is no label, but the change was made last December:

https://github.com/elasticsearch/elasticsearch/pull/4461

It appears that the REST API still supports the old notation, but the
change did break Java backwards compatibility

https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/search/query/QueryPhase.java#L71
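For reference, a minimal sketch of the two forms (index and field names are made up):

# filter as part of the query, so facets/aggregations see it too
curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query" : {
    "filtered" : {
      "query" : { "match_all" : {} },
      "filter" : { "term" : { "status" : "active" } }
    }
  }
}'
# post_filter, applied after facets/aggregations are computed
curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query" : { "match_all" : {} },
  "post_filter" : { "term" : { "status" : "active" } }
}'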

-- 
Ivan



On Tue, Jun 3, 2014 at 8:11 PM, Arkadiy Zabazhanov 
wrote:

> Btw, the answer to the second question is that the top-level filter was renamed to
> post_filter. That's awesome. So the first question is answered too:
> the filtered query is preferred.
> Still waiting for an answer to the third question, since I didn't find the
> filter to post_filter renaming in the changelog (
> http://www.elasticsearch.org/downloads/1-0-0/) and I can't find anything
> about the new query behavior. I just need the version where it was changed, please.
>
> On Tuesday, 3 June 2014 at 19:27:17 UTC+7, Arkadiy Zabazhanov
> wrote:
>
>> Hello. Help me please, I'm confused. As far as I remember, there used to be
>> only one way to pass filters to a search query - via the filtered query. But
>> currently there is a top-level filter part of the query. However,
>> the top-level filter affects the query only and doesn't affect, e.g., facets. But
>> the filtered query's filter affects both the query and the facets.
>> Also, I remember there was a time when I needed to add a match_all query to
>> the filtered query section if the query was empty and only filters were present;
>> otherwise it returned an empty set of documents. Since I'm trying to create a
>> high-level Ruby library, could you please answer the following questions:
>>
>> 1) Which way is preferred now and in future: filtered top-level query or
>> top-level filter with top-level query?
>> 2) How do you plan to resolve such an API inconsistency when filtered
>> query filter affects outside statements and top-level filter doesn't affect
>> some parts of request?
>> 3) Why do I remember the match_all feature, and when did requests
>> start to return all the documents with an empty query section in the filtered
>> query? I'm checking it right now on 1.2.0 and I don't need to use
>> match_all or constant_score; it just returns all the docs for me.
>>
>> Thanks in advance.
>>
>



Re: Elasticsearch/Lucene Delete space reuse? recovery?

2014-06-04 Thread Ivan Brusic
Lucene will hold onto deleted documents until a merge is performed. An
update in Lucene is basically an atomic delete/insert.

An optimize will help reclaim the space used by deleted documents. Did you
change your merge settings? Deleted documents should eventually be removed
whenever new segments are created.
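If you really need to reclaim the space sooner, something like this should work (placeholder index name; use it sparingly, it is expensive):

curl -XPOST 'localhost:9200/myindex/_optimize?only_expunge_deletes=true'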

Cheers,

Ivan


On Tue, Jun 3, 2014 at 8:54 AM, smonasco  wrote:

> I'm starting a project to index log files.  I don't particularly want to
> wait until the log files roll over.  There will be files from 100's of apps
> running across 100's of machines (not all apps intersect with all machines,
> but you get the drift).  Some roll over very fast; some may take days.
>
> The problem is that if I am constantly reindexing the same document
> (same id), am I losing all the old space (store and/or index), or is
> Elasticsearch/Lucene smart enough to say here's a new version we'll
> overwrite the old store/index entries and point to this one where they are
> the same and add new ones.
>
> Certainly, there is a more sophisticated model that treats every line as a
> unique document/row such that this doesn't become an issue, but I'm not
> ready to spend that kind of dev and hardware at this issue.  (Our
> elasticsearch solution is wrapped in a system that becomes really heavy
> handed when indexing such small pieces.)
>
> --Shannon Monasco
>
>



Re: Best cluster environment for search

2014-06-04 Thread joergpra...@gmail.com
Why do you use terms on the _id field and not the ids filter? The ids filter is
more efficient since it reuses the _uid field, which is cached by default.

Do the terms in the query vary from query to query? If so, caching might
kill your heap.

Another possible issue is that your query is not distributed to all shards
if the query does not vary from user to user in your test. If so, you have
created a "hot spot": all the load from the 100 users would go to a limited
number of nodes with a limited shard count.

The search thread pool seems small with 50 searches if you execute searches
for 100 users in parallel; this can lead to congestion of the search
module. Why don't you use 100 (at least)?
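
For example, a minimal sketch of the same lookup with the ids filter, reusing
two of the ids from your example below:

{
  "filter" : {
    "ids" : {
      "values" : [ "QSxrbEM8TKe5zr8931xBjA", "wj63ghegRwC6qLsWq2chkA" ]
    }
  }
}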

Jörg


On Wed, Jun 4, 2014 at 2:40 PM, Marcelo Paes Rech  wrote:

> Hi Jörg. Thanks for your reply.
>
> Here is my filter.
> {"filter":
> {
>   "terms" : {
> "_id" : [ "QSxrbEM8TKe5zr8931xBjA", "wj63ghegRwC6qLsWq2chkA",
> "hYEhDbAqQwSRxhYfvDgFkg", "4bZmPE1fTYqijphRyyWiuQ",
> "Fhq53yYyT3CEw6vclKu_NA", "XL2atBraTEyx57MefjFVhA",
> "951i0dZkT064FlQkzHnnWA", "O8Ixbir1TrGT_IA3wKfsHg",
> "8k4U7KsuTmsThqxy-5YaKw", "GNOoQTHglf22kzcE7EOf8g",
> "-RQeY48fTg2kYnh2M4E1cQ", "u8DGBdfVR9WRVj6d9E4Ebw",
> "WFHSXd7UQvCMYFBhFcTsng", "qnQ7q7FyTsg397lM1EWgqA",
> "wRQtUzdMRy2qOkMCNxdpgA", "Ll83iglxSUS_Gs7mjkMt8w",
> "d2sxZ1oBTfuvAfov5EJ0iw", "cyht-vB4Q-mMSg9N5jcGXg",
> "bNSVaO47QTOCkfJhWo0qjg", "BHuhm55IRerKnynJ8WgFTw",
> "fHKA4PF2QteWm8E7dW7CAw", "DLE6A7tyQJ-zcKcCa6IPSA",
> "qfelTW7-SuGRQ0GKbngARA", "R7VHHJhYsUqfuxYof8BJ8w",
> "W4PqiJfPSlSFjVKFsGkA4Q", "Juq62zOsRdheuW3O6Gb2KA",
> "U9v0IKj_RrgRNjE31ZTt2g", "uNHa0kOOT5qjPpzxZcs35A",
> "SwOgVNgIRwyVU3pEEycBuQ", "LaEpxFGIQgCArsNZ2rd4Pw",
> "CiJ9gouZsbmTtxTWx7w6lA", "TaQV_I01RfCq3B6uAtIBoQ",
> "9Jpjo5k-RlGfLVLF6nDgze", "57YpjRdASsrrae-RD3spog",
> "bmA4EWFSTiKUaDzaNcCFKQ", "Fui9z_UbRe6AY1VhAr8Crw",
> "2PORr5BzSDOmBXgmQkO5Zg", "snfwTmtuTv-uj5mOWSJpgA",
> "0nHIrtePSaeW8aWArh_Mrg", "s0g9QHnjTgWX3rCIu1g0Hg",
> "Jl67fACuQvCFgZxXAFtDOg" ],
> "_cache" : true,
> "_cache_key" : "my_terms_cache"
>   }
> }
> }
>
> I already used the "*ids filter*" but I got the same behaviour. One thing that I
> realized is that one of the cluster's nodes is increasing the Search Thread
> Pool (something like Queue: 50 and Count: 47) and the others don't
> (something like Queue: 0 and Count: 1). If I remove this node from the
> cluster another one starts with the same problem.
>
> My current environment is:
> - 7 Data nodes with 16Gb (8Gb for ES)and 8 cores each one;
> - 4 Load balancer Nodes (no data, no master)  with 4Gb (3Gb for ES) and 8
> cores each one;
> - 4 MasterNodes (only master, no data)  with 4Gb (3Gb for ES) and 8 cores
> each one;
> - Thread Pool Search 47 (the others are standard config);
> - 7 Shards and 2 replicas Index;
> - 14.6Gb Index size (14.524.273 documents);
>
>
> I'm executing this filter with 50 concurrent users.
>
>
> Regards
>
> Em terça-feira, 3 de junho de 2014 20h33min45s UTC-3, Jörg Prante escreveu:
>>
>> Can you show your test code?
>>
>> You seem to look at the wrong settings - by adjusting node number, shard
>> number, replica number alone, you can not find out the maximum node
>> performance. E.g. concurrency settings, index optimizations, query
>> optimizations, thread pooling, and most of all, fast disk subsystem I/O is
>> important.
>>
>> Jörg
>>
>>
>> On Wed, Jun 4, 2014 at 12:18 AM, Marcelo Paes Rech 
>> wrote:
>>
>>> Thanks for your reply Nikolas. It helps a lot.
>>>
>>> And what about the quantity of documents in each shard, or the size of each
>>> shard? And the need for no-data nodes or master-only nodes: when is that
>>> necessary?
>>>
>>> In some tests I did, when I increased the request number (like 100 users at
>>> the same moment, redoing it again and again), 5 nodes with 1 shard and 2
>>> replicas each and 16Gb RAM (8Gb for ES and 8Gb for OS) weren't enough. The
>>> response time started to increase to more than 5s (I think less than 1s, in
>>> this case, would be acceptable).
>>>
>>> This test has a lot of documents (something like 14 millions).
>>>
>>>
>>> Thanks. Regards.
>>>
>>> Em segunda-feira, 2 de junho de 2014 17h09min04s UTC-3, Nikolas Everett
>>> escreveu:
>>>



 On Mon, Jun 2, 2014 at 3:52 PM, Marcelo Paes Rech >>> > wrote:

 Hi guys,
>
> I'm looking for an article or a guide for the best cluster
> configuration. I read a lot of articles like "change this configuration"
> and "you must create X shards per node" but I didn't saw nothing like
> ElasticSearch Official guide for creating a cluster.
>
> What I would like to know are informations like.
> - How to calculate how many shards will be good for the cluster.
> - How many shards do we need per node? And if this is variable, how do
> I calculate this?
> - How much memory do I need per node and how many nodes?
>
> I think ElasticSearch is well documented. But it is very fragmented.
>
>
>
 For some of these that is because "it depends" is th

Re: Shard count and plugin questions

2014-06-04 Thread Mark Walkom
1) The answer is: it depends. You want to set up a test system with
indicative specs, and then throw some sample data at it until things start
to break. However, this may help:
https://www.found.no/foundation/sizing-elasticsearch/
2) https://github.com/jprante/elasticsearch-knapsack might do what you want.
3) How real time is real time? You can change index.refresh_interval to
something small so that the window of "unflushed" items is minimal, but that
will have other impacts (see the sketch below).
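
For point 3, a sketch of adjusting the refresh interval on an existing index
(the index name "myindex" is a placeholder; the value can be set back the same
way):

curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index" : { "refresh_interval" : "1s" }
}'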

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 5 June 2014 04:18, Todd Nine  wrote:

> Hi All,
>   We've been using elastic search as our search index for our new
> persistence implementation.
>
> https://usergrid.incubator.apache.org/
>
> I have a few questions I could use a hand with.
>
> 1) Is there any good documentation on the upper limit to count of
> documents, or total index size, before you need to allocate more shards?
>  Do shards have a real world limit on size or number of entries to keep
> response times low?  Every system has its limits, and I'm trying to find
> some actual data on the size limits.  I've been trolling Google for some
> answers, but I haven't really found any good test results.
>
>
> 2) Currently, it's not possible to increase the shard count for an index.
> The workaround is to create a new index with a higher count, and move
> documents from the old index into the new.  Could this be accomplished via
> a plugin?
>
>
> 3) We sometimes have "realtime" requirements.  In that when an index call
> is returned, it is available.  Flushing explicitly is not a good idea from
> a performance perspective.Has anyone explored searching in memory the
> documents that have not yet been flushed and merging them with the Lucene
> results?  Is this something that's feasible to be implemented via a plugin?
>
> Thanks in advance!
> Todd
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/940c6404-6667-4846-b457-977e705d3797%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624aheN2C4wvZRnxxNA%3DpTzwgjHQwCLH0041d-J0DNj37_A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Hourly Shards Elasticsearch/Kibana

2014-06-04 Thread Mark Walkom
TTL isn't the best idea as it consumes a lot of resources. You're better
off getting your hourly indexes working.
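
As a sketch, once the hourly indexes are in place, the rolling 24-hour
requirement becomes a simple delete of the index that has aged out, run hourly
from cron (this assumes GNU date and the logstash-YYYY.MM.dd.HH naming
discussed below):

curl -XDELETE "http://localhost:9200/logstash-$(date -u -d '25 hours ago' +%Y.%m.%d.%H)"

The curator tool can also handle this kind of time-based deletion on a schedule.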

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 5 June 2014 02:29, Antonio Augusto Santos  wrote:

> Hey There,
>
> Did you remember to change the "Timestamping" on Kibana so that it would
> know you are using an hourly index? Go to the index configuration screen to
> see that.
>
> Also, if you have the requirement for a 24-hour roll-out, did you try
> enabling _ttl (
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-ttl-field.html)
> on your indices? That way, docs older than the specified time would be
> automatically deleted.
>
>
> On Wednesday, June 4, 2014 12:16:56 PM UTC-3, Kellan Strong wrote:
>>
>> Hello All,
>>
>> I have a question about hourly sharding with either logstash or fluentd.
>> Since we are, or will be, using a setup called FLEKZ, I am trying to
>> integrate both logstash and fluentd together, which work well with each
>> other. However, I have a business requirement for a rolling 24-hour shard
>> deletion.
>>
>> When I add
>>
>> logstash_dateformat %Y.%m.%d.%H
>>
>> in fluentd and
>>
>> index => "logstash-%{+YYYY.MM.dd.HH}"
>>
>> into logstash.
>>
>> Elasticsearch cannot find the indices anymore. I go onto Kibana and they
>> cannot be found. I switch back to the normal Y.m.d in both and the
>> information is back on the screen. Using the api I am also not able to
>> search any of the indices. Is there something I am doing wrong or is there
>> something in the config file that I am missing?
>>
>> Thank you for your help,
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/a1bceabb-ea26-4aa5-8358-92f6f8e2ae1e%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624Zp%2BsoR-uUSDmVm_5MRBZs4AHBh-9ggT9twK5ucR_vRyg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Cluster gets stuck after full re-index

2014-06-04 Thread Mark Walkom
If you've halved your node count and also reduced the amount of RAM, then
you're probably running into GC problems.

Install something like elastichq or marvel, and then check what is
happening at the cluster and node level.
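
Even before installing those, a quick sketch of checking heap and GC from the
command line (this should work on your 0.90.x cluster):

curl 'http://localhost:9200/_nodes/stats?pretty'

Look at the jvm section for each node: heap used versus heap max, and the GC
collection counts and times.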

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 5 June 2014 07:18, Florian Munz  wrote:

> I don't see any signs of GC in the logs or anywhere else; shouldn't there
> be high CPU usage in that case?
>
> We moved from 4 to 2 nodes and from 2 replicas to 1.
>
>
> Cheers,
> Florian
>
> On 03.06.14 12:36, Mark Walkom wrote:
>
>> Am I reading that right, you're basically at 100% heap usage? If that is
>> the case then it'd be GC that's killing you.
>>
>> Did you add more nodes when you moved to AWS or do you have the same
>> number?
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com 
>> web: www.campaignmonitor.com 
>>
>>
>> On 3 June 2014 20:27, Florian Munz > > wrote:
>>
>> Other than the jmap -heap I didn't manage to look more specifically
>> into it:
>>
>> https://gist.github.com/theflow/b983d512ea344545f7f6#file-jmap
>>
>> The same process runs fine on much smaller machines in our staging
>> environment, without the live traffic, of course.
>>
>> Anything particular I should run that would give more insights?
>>
>>
>> Cheers,
>> Florian
>>
>>
>> On Tuesday, June 3, 2014 12:21:32 PM UTC+2, Mark Walkom wrote:
>>
>> How does your heap look during all this?
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com 
>>
>>
>> On 3 June 2014 20:14, Florian Munz  wrote:
>>
>> Hello,
>>
>> we recently moved our ES cluster from dedicated hardware to
>> AWS instances, they have less memory available, but use SSDs
>> for the ES data directory. We kept JVM (1.7.0_17) and ES
>> (0.90.9) version exactly the same. On the new hardware,
>> after running a full re-index (creating a new index,
>> pointing an alias to the new and one alias to the old index,
>> sending realtime updates to both aliases and running a
>> script to fill up the new index) our cluster gets stuck.
>>
>> 10 minutes after the re-index finishes and we move both
>> aliases to the new index, ES stops answering any search or
>> index queries, no errors in the logs apart from it not
>> answering queries anymore:
>>
>> org.elasticsearch.common.util.concurrent.EsRejectedExecutionException:
>> rejected execution (queue capacity 1000) on
>> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@172018e5
>>
>> CPU load is low, it doesn't look like it's doing anything
>> expensive. A request to hot_threads times out. I've put the
>> output from jstack and jmap here:
>>
>> https://gist.github.com/theflow/b983d512ea344545f7f6
>>
>> We tried upgrading to 0.90.13, since the changelog mentioned
>> a problem with infinite loops, but same behavior. We're
>> planning to upgrade to a more recent version of ES soon, but
>> it'll take a bit to fully test that.
>>
>>
>> Any ideas what could be causing this?
>>
>>
>> thanks,
>> Florian
>>
>> --
>> You received this message because you are subscribed to the
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails
>> from it, send an email to elasticsearc...@__googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/7a347529-df1a-4a21-9ac1-d3af882a035a%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it,
>> send an email to elasticsearch+unsubscr...@googlegroups.com
>> .
>> To view this discussion on the web visit

Re: Marvel/Sense Troubleshooting

2014-06-04 Thread Mark Walkom
Did you check your ES logs to see that it started and loaded the plugin?
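
A quick sketch of two ways to confirm the plugin was picked up (the first
needs ES 1.x; the log path is a guess for a Homebrew install):

curl 'http://localhost:9200/_nodes/plugins?pretty'
grep 'loaded \[' /usr/local/var/log/elasticsearch/*.log

If marvel doesn't appear in either, the plugin wasn't loaded at startup.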

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 5 June 2014 00:46, Fergie  wrote:

> Hi -
>
I'm a previous Solr user (bison.usgs.ornl.gov) and followed the
instructions for installing Marvel on a Mac running 10.9.2.
>
>
> Trying http://download.elasticsearch.org/elasticsearch/marvel/
> marvel-latest.zip...
>
> Downloading ...DONE
>
> Installed elasticsearch/marvel/latest into /usr/local/var/lib/
> elasticsearch/plugins/marvel
>
> Douglass-MacBook-Pro:bin douglaskunzman$ plugin -l
>
> Installed plugins:
>
> - marvel
>
> I restarted elasticsearch and chrome and typed
> localhost:9200/_plugin/marvel/ and nothing happens.  Can someone offer some
> advice on how to troubleshoot?
>
> I searched the elasticsearch site and this list for this information and
> can't find anything.
>
> Doug
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/489bef92-33ff-41f0-8d0e-334f355d170f%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624bN_gjYKXCqf6Mbq2sGDdqywVnpaaodwueM07_wjn0-Sw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch upgrade procedure

2014-06-04 Thread Mark Walkom
There's an upgrade process in the docs on the site -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html
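
For the "turn off shard assignment" step Nik describes below, a sketch of the
relevant calls (the cluster.routing.allocation.enable setting is available
from 1.0 onwards):

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient" : { "cluster.routing.allocation.enable" : "none" }
}'
# upgrade and restart the node, then:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient" : { "cluster.routing.allocation.enable" : "all" }
}'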

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 5 June 2014 00:45, Aldian  wrote:

> Thanks you very much for the detailed reply. I will proceed this way.
> Aldian
>
>
> 2014-06-04 16:15 GMT+02:00 Nikolas Everett :
>
>>
>>
>>
>> On Wed, Jun 4, 2014 at 10:03 AM, Aldian  wrote:
>>
>>> Hi
>>>
>>> I am currently using elasticsearch 1.0.2. Since I have a memory leak
>>> problem that forces me to restart it every 5 or 6 days, I am considering
>>> upgrading to 1.2.1. But I did not found any upgrade guide, nor indications
>>> whether new versions were retro-compatible. Please indicate if there is any
>>> risk for the data?
>>>
>>
>>
>> The data is compatible.  Sometimes, like when Elasticsearch jumps a major
>> version, the protocol that Elasticsearch servers use to communicate with
>> each other isn't backwards compatible.  This is rare, and the 1.0.2->1.2.1
>> upgrade is fully backwards compatible.  That being said, there are some
>> issues with running a cluster with two versions of Elasticsearch: you
>> (mostly) can't move data from a newer node to an older node.  Features of
>> the new version may degrade because they aren't getting what they need from
>> their peers who are on the old version.  This is generally OK because you
>> won't be relying on features of the newer version until after the upgrade.
>>
>> The upshot: a rolling upgrade is safe for you.  Turn off shard assignment
>> on the cluster, update one node, restart elasticsearch on that node, turn
>> shard assignment back on, wait for the cluster to go green, repeat.
>>
>> Make sure to upgrade any plugins that you have installed when you upgrade
>> Elasticsearch.
>>
>> Nik
>>
>> --
>> You received this message because you are subscribed to a topic in the
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/elasticsearch/wF-WCPLUdQo/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAPmjWd3NB5x4nydx6N3ggAhqeWh%2B4LTrj_7L2ZkqYh4FCu%2BY0w%40mail.gmail.com
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Cordialement,
>
> Aldian
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAECUaLxBANfsPMUq6U%2B5GPVEttMNVNUzAjYtuuGJv-0aAPMZWQ%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624Y9Yjwyx6x2Yt3%3DeUiL5LkEFrdbFNdpwEhVZEUHGQpD1Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
Great walkthrough :) ... The only thing I miss is a mention of the standard
language plugins for scripting (groovy, js, etc.). And rivers are not
obsolete; the "pull" method from a singleton "river" node is just
discouraged.

Jörg


On Wed, Jun 4, 2014 at 11:30 PM, Itamar Syn-Hershko 
wrote:

> You should have released this before my talk last week, I could have
> mentioned it :\
>
> https://www.youtube.com/watch?v=FbAO2k57bdg
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Author of RavenDB in Action 
>
>
> On Tue, Jun 3, 2014 at 6:15 PM, joergpra...@gmail.com <
> joergpra...@gmail.com> wrote:
>
>> Hi,
>>
>> many of us want to start writing extensions for Elasticsearch.
>>
>> Except submitting pull requests to the core code, one great advantage of
>> Elasticsearch is the plugin mechanism. Here, custom code can be hooked into
>> Elasticsearch, without having to ask for inclusion into the core code.
>> Nevertheless, plugin code can be published on Github and easily included
>> into a running ES instance by using the ES plugin command line tool.
>>
>> Unfortunately, writing plugins is not so easy as it seems. There are many
>> plugins, some of them are very advanced, and finding a starting point for a
>> personal project could be quite hard.
>>
>> Hence, for educational purposes, I wrote a tiny plugin, as a starting
>> point, to demonstrate how a plugin works.
>>
>> The simple plugin is indeed very simple. It makes reuse of the standard
>> search action:
>>
>> - it defines a built-in query (a "match all" query)
>>
>> - it creates a custom action for it
>>
>> - the action is called from Java API
>>
>> - the result of the action (the search response of the "match all" query)
>> is logged
>>
>> The plugin code comes with a junit test. It is available at
>>
>> https://github.com/jprante/elasticsearch-simple-action-plugin
>>
>> In the hope it is useful,
>>
>> Jörg
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoH-M6%2BZroAz8Reb3e2agW0vXKSavk%3D0hD_bq%2BBHtRYLhw%40mail.gmail.com
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAHTr4Zt-16RTbRh376Kxg%3Di7DmjRhav-PYk_7qs1J5wu1W5a8w%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEyF-hh3yEfeznk6p8v1tuzreDTYgcUVohuRW%3DpXRyO2w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
As said, it is true that scoring scripts (like function score scripts based on
AbstractSearchScript) need to reside on data nodes. Accessing fields is
a low-level operation in a script, so it is not possible to install such a
boost plugin that uses scripting on a data-less node. You would have to
install it on all the data nodes, which might become tedious (but it is
doable).

Another issue is that you use scripting in a Java plugin. I conclude from
this that the search should later work over the HTTP API by executing a
standard function score query with the boost script name (is that true?)

Writing a plugin in a pure Java environment, you have many more degrees of
freedom to supersede the script functionality and use other code paths. For
example, you could reuse the resource watch service from ES (used for
watching script file changes) to reload the boost info (which is in your
binary files I assume). Then you could build the query internally using the
Java API as a custom score query action and execute it from your favorite
(data-less) node (or from two nodes, for better fault tolerance / load
balancing).

Optionally, you could expose a new endpoint to the ES REST API, for example
"_search_with_boost", which works like "_search", but makes use of the
boost info files.

For a more generic solution, it would be convenient to convert the boost
info into a JSON parameter file so this could be loaded by the standard ES
settings/config routines and by other languages, also for better reuse by
others in the ES community :) An example plugin name could be "boost
control plugin"...
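
To make the HTTP side concrete, a rough sketch of such a function score query,
assuming the plugin registers a native script under the name "myboost" (the
index name, script name and params here are placeholders):

curl -XPOST 'http://localhost:9200/myindex/_search' -d '{
  "query" : {
    "function_score" : {
      "query" : { "match_all" : {} },
      "script_score" : {
        "script" : "myboost",
        "lang" : "native",
        "params" : { "factor" : 1.2 }
      }
    }
  }
}'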

Jörg



On Wed, Jun 4, 2014 at 8:15 PM, virgil  wrote:

> Yeah, but I would consider that the non-data node is already doing the job. --
> "These "non data" nodes are still part of the cluster, and they redirect
> operations exactly to the node that holds the relevant data. The other
> benefit is the fact that for scatter / gather based operations (such as
> search), these nodes will take part of the processing since they will start
> the scatter process, and perform the actual gather processing." I just
> uploaded my native script code in https://github.com/virgil0/TestPlugin.
> It
> works with the function score query. You can see that there are 3 bin files
> I
> need to load into memory. Thank you for the reply.
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/ANN-Elasticsearch-Simple-Action-Plugin-tp4056971p4057054.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1401905723480-4057054.post%40n3.nabble.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGBMEEc6oC1%3DBX7gS41se13BExO_iKJtiGC6zrhmxJqxA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch aggregation script not maintaining document ID integrity

2014-06-04 Thread Benjamin Smith
I've looked into this and that is *not* the case.

For example, by using the following script, I can determine one of the 
document IDs that is passing the offending tag id/name combination:

doc['id'].value+'|'+doc['tag.id'].value+'|'+doc['tag.name.raw'].value

Which returns:

45|352|Tag B

When I look at that document, it has the following tag array:

{
  "id": 45,
  "tags": [
    { "id": "352", "name": "Tag A" },
    { "id": "355", "name": "Tag B" },
    { "id": "458", "name": "Tag C" }
  ]
}

The document is indexed correctly: Tag A = 352. My aggregation script is 
returning "Tag B" with id 322, however.

Any other ideas what could cause this? Is there an issue with my mapping?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3ccf90f9-f89b-4ff7-bc87-2be67334778b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread Ivan Brusic
Don't forget your slides. :)

http://code972.com/blog/2014/05/72-the-ultimate-guide-for-elasticsearch-plugins-video-slides



On Wed, Jun 4, 2014 at 2:30 PM, Itamar Syn-Hershko 
wrote:

> You should have released this before my talk last week, I could have
> mentioned it :\
>
> https://www.youtube.com/watch?v=FbAO2k57bdg
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Author of RavenDB in Action 
>
>
> On Tue, Jun 3, 2014 at 6:15 PM, joergpra...@gmail.com <
> joergpra...@gmail.com> wrote:
>
>> Hi,
>>
>> many of us want to start writing extensions for Elasticsearch.
>>
>> Except submitting pull requests to the core code, one great advantage of
>> Elasticsearch is the plugin mechanism. Here, custom code can be hooked into
>> Elasticsearch, without having to ask for inclusion into the core code.
>> Nevertheless, plugin code can be published on Github and easily included
>> into a running ES instance by using the ES plugin command line tool.
>>
>> Unfortunately, writing plugins is not so easy as it seems. There are many
>> plugins, some of them are very advanced, and finding a starting point for a
>> personal project could be quite hard.
>>
>> Hence, for educational purposes, I wrote a tiny plugin, as a starting
>> point, to demonstrate how a plugin works.
>>
>> The simple plugin is indeed very simple. It makes reuse of the standard
>> search action:
>>
>> - it defines a built-in query (a "match all" query)
>>
>> - it creates a custom action for it
>>
>> - the action is called from Java API
>>
>> - the result of the action (the search response of the "match all" query)
>> is logged
>>
>> The plugin code comes with a junit test. It is available at
>>
>> https://github.com/jprante/elasticsearch-simple-action-plugin
>>
>> In the hope it is useful,
>>
>> Jörg
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoH-M6%2BZroAz8Reb3e2agW0vXKSavk%3D0hD_bq%2BBHtRYLhw%40mail.gmail.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAHTr4Zt-16RTbRh376Kxg%3Di7DmjRhav-PYk_7qs1J5wu1W5a8w%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQD0Pvbr0eenPiVYm032ZycyTGWKxL7MH3KNL5EBAJZCzg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Identify word as dominant word in search

2014-06-04 Thread Ivan Brusic
I agree with Itamar. It sounds like you do have a list of colors and brands
(tagging), so you can add a boost value as a payload to the relevant terms.
You can use these payloads with a function score script or a custom
similarity. Not an easy solution. If you can maintain a mapping of values
in Elasticsearch (via a plugin), you can bypass the payload and look up the
terms yourself.  Once again, not easy.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-delimited-payload-tokenfilter.html
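
A minimal sketch of such an analyzer, with made-up index and filter names;
terms would then be indexed as e.g. "nexus|2.0 cover|1.0 blue|1.5" and the
payload read back at scoring time:

curl -XPUT 'http://localhost:9200/products' -d '{
  "settings" : {
    "analysis" : {
      "filter" : {
        "term_boosts" : {
          "type" : "delimited_payload_filter",
          "delimiter" : "|",
          "encoding" : "float"
        }
      },
      "analyzer" : {
        "payload_text" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : [ "lowercase", "term_boosts" ]
        }
      }
    }
  }
}'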

-- 
Ivan


On Tue, Jun 3, 2014 at 3:00 AM, Itamar Syn-Hershko 
wrote:

> Depending on your corpus, this should happen automatically. That's what
> TF/IDF is about.
>
> What you can do further is use NLP methods to tag those items in search
> and indexing. Look up POS tagging and entity extraction.
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Author of RavenDB in Action 
>
>
> On Tue, Jun 3, 2014 at 12:22 PM, Rotem Haber 
> wrote:
>
>> Hi,
>> Is there a search in elasticsearch that supports the behavior where, when a
>> user enters a string to search, ES recognizes certain words as important words
>> in the search?
>> For example: the user enters the string NEXUS COVER FOR EVERY DAY USE SILK
>> SOFT BLUE, and I want the brand (NEXUS) and color (BLUE) to be more
>> dominant in the search; I have a list of all the colors and all the brands
>> that exist.
>>
>> Is it possible? And if yes, how do I implement it?
>>
>> thank you!
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/2948b28d-6c1e-490d-bdbb-80df5d7b0ebd%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAHTr4ZsWggX9KP_fd75Qbfgk4uph9VNMbXUyaQQ3obMresdVyA%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQAhjj%2BetB%2BNVNHtY6bvvA1de29jbOO2csmnEU%2B67Jxh-w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread Itamar Syn-Hershko
You should have released this before my talk last week, I could have
mentioned it :\

https://www.youtube.com/watch?v=FbAO2k57bdg

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Tue, Jun 3, 2014 at 6:15 PM, joergpra...@gmail.com  wrote:

> Hi,
>
> many of us want to start writing extensions for Elasticsearch.
>
> Except submitting pull requests to the core code, one great advantage of
> Elasticsearch is the plugin mechanism. Here, custom code can be hooked into
> Elasticsearch, without having to ask for inclusion into the core code.
> Nevertheless, plugin code can be published on Github and easily included
> into a running ES instance by using the ES plugin command line tool.
>
> Unfortunately, writing plugins is not so easy as it seems. There are many
> plugins, some of them are very advanced, and finding a starting point for a
> personal project could be quite hard.
>
> Hence, for educational purposes, I wrote a tiny plugin, as a starting
> point, to demonstrate how a plugin works.
>
> The simple plugin is indeed very simple. It makes reuse of the standard
> search action:
>
> - it defines a built-in query (a "match all" query)
>
> - it creates a custom action for it
>
> - the action is called from Java API
>
> - the result of the action (the search response of the "match all" query)
> is logged
>
> The plugin code comes with a junit test. It is available at
>
> https://github.com/jprante/elasticsearch-simple-action-plugin
>
> In the hope it is useful,
>
> Jörg
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoH-M6%2BZroAz8Reb3e2agW0vXKSavk%3D0hD_bq%2BBHtRYLhw%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHTr4Zt-16RTbRh376Kxg%3Di7DmjRhav-PYk_7qs1J5wu1W5a8w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Cluster gets stuck after full re-index

2014-06-04 Thread Itamar Syn-Hershko
Many concurrent costly operations (range queries or faceting when I/O ops
are required, segment merges, shard allocations, etc.) are known to starve ES
of threads or processing power, and this is what you are experiencing:
no threads are available to take your requests.

The immediate solution is to run a master-only node (node.data: false) so you
have one node that acts as master and cluster coordinator and is known to
never starve of system resources. Even running this node side-by-side
on the same server as one of the data nodes can protect you, as it doesn't
have the same memory requirements etc. as a data node.
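
A sketch of the elasticsearch.yml settings for such a dedicated node:

# master/coordinator only: eligible as master, holds no data
node.master: true
node.data: false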

Finally, there has been (and still is) a lot of work put into this so I
strongly recommend upgrading to the latest (currently it is 1.2.1).

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Tue, Jun 3, 2014 at 1:14 PM, Florian Munz  wrote:

> Hello,
>
> we recently moved our ES cluster from dedicated hardware to AWS instances,
> they have less memory available, but use SSDs for the ES data directory. We
> kept JVM (1.7.0_17) and ES (0.90.9) version exactly the same. On the new
> hardware, after running a full re-index (creating a new index, pointing an
> alias to the new and one alias to the old index, sending realtime updates
> to both aliases and running a script to fill up the new index) our cluster
> gets stuck.
>
> 10 minutes after the re-index finishes and we move both aliases to the new
> index, ES stops answering any search or index queries, no errors in the
> logs apart from it not answering queries anymore:
>
> org.elasticsearch.common.util.concurrent.EsRejectedExecutionException:
> rejected execution (queue capacity 1000) on
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@172018e5
>
> CPU load is low, it doesn't look like it's doing anything expensive. A
> request to hot_threads times out. I've put the output from jstack and jmap
> here:
>
> https://gist.github.com/theflow/b983d512ea344545f7f6
>
> We tried upgrading to 0.90.13, since the changelog mentioned a problem
> with infinite loops, but same behavior. We're planning to upgrade to a more
> recent version of ES soon, but it'll take a bit to fully test that.
>
>
> Any ideas what could be causing this?
>
>
> thanks,
> Florian
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/7a347529-df1a-4a21-9ac1-d3af882a035a%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHTr4ZvRJK_pR6nskv8ujpH8cCRp890UZ1d8M_iU0Zi-OULO%3DQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Cluster gets stuck after full re-index

2014-06-04 Thread Florian Munz
I don't see any signs of GC in the logs or anywhere else; shouldn't
there be high CPU usage in that case?


We moved from 4 to 2 nodes and from 2 replicas to 1.


Cheers,
Florian

On 03.06.14 12:36, Mark Walkom wrote:

Am I reading that right, you're basically at 100% heap usage? If that is
the case then it'd be GC that's killing you.

Did you add more nodes when you moved to AWS or do you have the same number?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com 
web: www.campaignmonitor.com 


On 3 June 2014 20:27, Florian Munz mailto:s...@theflow.de>> wrote:

Other than the jmap -heap I didn't manage to look more specifically
into it:

https://gist.github.com/theflow/b983d512ea344545f7f6#file-jmap

The same process runs fine on much smaller machines in our staging
environment, without the live traffic, of course.

Anything particular I should run that would give more insights?


Cheers,
Florian


On Tuesday, June 3, 2014 12:21:32 PM UTC+2, Mark Walkom wrote:

How does your heap look during all this?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com 


On 3 June 2014 20:14, Florian Munz  wrote:

Hello,

we recently moved our ES cluster from dedicated hardware to
AWS instances, they have less memory available, but use SSDs
for the ES data directory. We kept JVM (1.7.0_17) and ES
(0.90.9) version exactly the same. On the new hardware,
after running a full re-index (creating a new index,
pointing an alias to the new and one alias to the old index,
sending realtime updates to both aliases and running a
script to fill up the new index) our cluster gets stuck.

10 minutes after the re-index finishes and we move both
aliases to the new index, ES stops answering any search or
index queries, no errors in the logs apart from it not
answering queries anymore:


org.elasticsearch.common.util.concurrent.EsRejectedExecutionException:
rejected execution (queue capacity 1000) on
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@172018e5

CPU load is low, it doesn't look like it's doing anything
expensive. A request to hot_threads times out. I've put the
output from jstack and jmap here:

https://gist.github.com/theflow/b983d512ea344545f7f6

We tried upgrading to 0.90.13, since the changelog mentioned
a problem with infinite loops, but same behavior. We're
planning to upgrade to a more recent version of ES soon, but
it'll take a bit to fully test that.


Any ideas what could be causing this?


thanks,
Florian

--
You received this message because you are subscribed to the
Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails
from it, send an email to elasticsearc...@__googlegroups.com.
To view this discussion on the web visit

https://groups.google.com/d/msgid/elasticsearch/7a347529-df1a-4a21-9ac1-d3af882a035a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearch+unsubscr...@googlegroups.com
.
To view this discussion on the web visit

https://groups.google.com/d/msgid/elasticsearch/04b3d0a2-a47e-47c6-8411-eb619c3c54bc%40googlegroups.com

.
For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/NFGiLsmPkk0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscr...@googlegroups.com
.
To v

Re: Inter-document Queries

2014-06-04 Thread Itamar Syn-Hershko
This model is not efficient for this type of querying. You cannot do this
in one query using this model, and the pre-processing work you do now +
traversing all documents is very costly.

Is it possible for you to index the data (even as a projection) into
Elasticsearch using a different model, so you can use ES properly using
queries or the aggregations framework?

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Thu, Jun 5, 2014 at 12:04 AM, Zennet Wheatcroft 
wrote:

> Hi,
>
> I am looking for an efficient way to do inter-document queries in
> Elasticsearch. Specifically, I want to count the number of users that went
> through an exit point B after visiting point A.
>
> In general terms, say we have some event log data about users actions on a
> website:
> 
> {"userid":"xyz", "machineid":"110530745", "path":"/promo/A", "country":
> "US", "tstamp":"2013-04-01 00:01:01"}
> {"userid":"pdq", "machineid":"110519774", "path":"/page/1", "country":"CN"
> , "tstamp":"2013-04-01 00:02:11"}
> {"userid":"xyz", "machineid":"110530745", "path":"/promo/D", "country":
> "US", "tstamp":"2013-04-01 00:06:31"}
> {"userid":"abc", "machineid":"110527022", "path":"/page/23", "country":
> "DE", "tstamp":"2013-04-01 00:08:00"}
> {"userid":"pdq", "machineid":"110519774", "path":"/page/2", "country":"CN"
> , "tstamp":"2013-04-01 00:08:55"}
> {"userid":"xyz", "machineid":"110530745", "path":"/sale/B", "country":"US"
> , "tstamp":"2013-04-01 00:09:46"}
> {"userid":"abc", "machineid":"110527022 ", "path":"/promo/A", "country":
> "DE", "tstamp":"2013-04-01 00:10:46"}
> 
> And we have 500+M such entries.
>
> We want a count of the number of userids that visited path=/sale/B after
> visiting path=/promo/A.
>
> What I did is to preprocess the data, sorting by , then
> compacting all events by the same userid into the same document. Then I
> wrote a script filter which traverses the path array per document, and
> returns true if it finds any occurrence of B followed by A. This however is
> inefficient. Most of our queries take 1 or 2 seconds on 100+M events. This
> script filter query takes over 300 seconds. Specifically, it can process
> events at about 400K events per second. By comparison, I wrote a naive
> program that does a linear pass of the un-compacted data and that processes
> 11M events per second. By which I conclude that Elasticsearch does not do
> well on this type of query.
>
> I am hoping someone can indicate a more efficient way to do this query in
> ES. Or else confirm that ES cannot do inter-document queries well.
>
> Thanks,
> Zennet
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/28c93f2d-e870-4347-8677-e9da41b6be62%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHTr4ZsCs2LnbYyz5sAc9CLDMqaHYDseQwS8mgsB4PepCsZHpw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Span first queries

2014-06-04 Thread Ivan Brusic
The limitation of only being able to use term queries comes from Lucene. I never
looked into why there is such a limitation in Lucene, but since they have a
lot of smart people working on the code, I assume there must be a good
reason. :) Phrase queries do not have such a limitation.

I use span queries a lot where I require phrase queries with in-order
terms. I pre-analyzed my queries by instantiating an AnalysisService
locally, which is only doable if using the Java API. Hackish, but it works.
Ultimately I need to move away from span queries.
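
For reference, a sketch of what span_first accepts today: the inner clause has
to be a span query such as span_term (the field name and term are placeholders):

{
  "query" : {
    "span_first" : {
      "match" : {
        "span_term" : { "title" : "elasticsearch" }
      },
      "end" : 3
    }
  }
}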

Cheers,

Ivan


On Wed, Jun 4, 2014 at 1:37 PM, Nikolas Everett  wrote:

> Is there a way to perform a span_first query against a query_string or
> match query?  I'd like to use this in a rescore to improve relevance.
>
> If I can't do that I'll have to do something silly like create another
> field pre-chopped on the way into Elasticsearch.
>
> Nik
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAPmjWd124LZ25Kc0cvoSpjkSc3-2nrq4FVk1MrqmFvp%3DxW%3DJug%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBOv7d5_yg0ei0Z%3DbE-oL-Ghxa-dMrV8Qqk09OT9TPTAQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Inter-document Queries

2014-06-04 Thread Zennet Wheatcroft
Hi,

I am looking for an efficient way to do inter-document queries in 
Elasticsearch. Specifically, I want to count the number of users that went 
through an exit point B after visiting point A.

In general terms, say we have some event log data about users actions on a 
website:

{"userid":"xyz", "machineid":"110530745", "path":"/promo/A", "country":"US", 
"tstamp":"2013-04-01 00:01:01"}
{"userid":"pdq", "machineid":"110519774", "path":"/page/1", "country":"CN", 
"tstamp":"2013-04-01 00:02:11"}
{"userid":"xyz", "machineid":"110530745", "path":"/promo/D", "country":"US", 
"tstamp":"2013-04-01 00:06:31"}
{"userid":"abc", "machineid":"110527022", "path":"/page/23", "country":"DE", 
"tstamp":"2013-04-01 00:08:00"}
{"userid":"pdq", "machineid":"110519774", "path":"/page/2", "country":"CN", 
"tstamp":"2013-04-01 00:08:55"}
{"userid":"xyz", "machineid":"110530745", "path":"/sale/B", "country":"US", 
"tstamp":"2013-04-01 00:09:46"}
{"userid":"abc", "machineid":"110527022 ", "path":"/promo/A", "country":"DE"
, "tstamp":"2013-04-01 00:10:46"}

And we have 500+M such entries.

We want a count of the number of userids that visited path=/sale/B after 
visiting path=/promo/A.

What I did is to preprocess the data, sorting by , then 
compacting all events by the same userid into the same document. Then I 
wrote a script filter which traverses the path array per document, and 
returns true if it finds any occurrence of B followed by A. This however is 
inefficient. Most of our queries take 1 or 2 seconds on 100+M events. This 
script filter query takes over 300 seconds. Specifically, it can process 
events at about 400K events per second. By comparison, I wrote a naive 
program that does a linear pass of the un-compacted data and that processes 
11M events per second. By which I conclude that Elasticsearch does not do 
well on this type of query.

I am hoping someone can indicate a more efficient way to do this query in 
ES. Or else confirm that ES cannot do inter-document queries well. 

Thanks,
Zennet


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/28c93f2d-e870-4347-8677-e9da41b6be62%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
Absolutely, agreed.

The docs are sparse in my simple plugin too. I will try to find some time to add
sample code for all the variants and explain the differences.

Jörg


On Wed, Jun 4, 2014 at 6:22 PM, Ivan Brusic  wrote:

> Jörg, thanks for the plugin to help as a starting point for plugin
> development.
>
> Although I have built a few plugins during the years, they were river or
> analysis plugins, which are fairly easy. Writing a custom action required a
> lot more digging, especially since there are very few to learn from. I
> still would like to see a write-up regarding the different families of
> transport actions: BroadcastOperationRequest, MasterNodeOperationRequest, 
> NodesOperationRequest, SingleShardOperationRequest, 
> SingleCustomOperationRequest,
> etc. What is the difference? I understand it now, but it should be
> documented. There is little documentation about the internals and there are
> no code level comments.  I always meant to experiment with the different
> action hierarchies via simple plugins and document my findings. Perhaps one
> day...
>
> Cheers,
>
> Ivan
>
>
> On Wed, Jun 4, 2014 at 1:09 AM, joergpra...@gmail.com <
> joergpra...@gmail.com> wrote:
>
>> Sorry, the plugin is outdated, a better start is by looking at
>>
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-similarity.html
>>
>> Jörg
>>
>>
>> On Wed, Jun 4, 2014 at 10:07 AM, joergpra...@gmail.com <
>> joergpra...@gmail.com> wrote:
>>
>>> You need resources on all nodes that hold shards, you can not do it with
>>> just one instance, because ES index is distributed. Rescoring would be very
>>> expensive if you did it on an extra central instance with an extra
>>> scatter/gather phase. It is also very expensive in scripting.
>>>
>>> A better method is a similarity plugin like
>>> https://github.com/tlrx/elasticsearch-custom-similarity-provider
>>>
>>> Not sure how your code looks like though, maybe you can share it with
>>> the community?
>>>
>>> Jörg
>>>
>>>
>>>
>>> On Wed, Jun 4, 2014 at 2:55 AM, virgil  wrote:
>>>
 The problem is that only one copy of HashMap is needed to customize
 score of
 all documents in the cluster. But as we have to install the plugin on
 all
 nodes, the actual memory used is multiplied by the number of nodes in
 cluster. I try to figure out one way to save the memory. Tried on
 non-data
 node, but it seems not working.



 --
 View this message in context:
 http://elasticsearch-users.115913.n3.nabble.com/ANN-Elasticsearch-Simple-Action-Plugin-tp4056971p4057015.html
 Sent from the ElasticSearch Users mailing list archive at Nabble.com.

 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/1401843345821-4057015.post%40n3.nabble.com
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHZTAZrAdtQAnvj_7UtO%3DaAVtN3qt337PTzDjnbCmtPaA%40mail.gmail.com
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCkOVMuEV67ZMCX5qoAdiob%2BfWsuWK%3D0EyAKf3VGhjYdQ%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGRDpkJcvWzYp1TH19j%3DWhe5XULn7eRDiPTXMPa1HR1NQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Span first queries

2014-06-04 Thread Nikolas Everett
Is there a way to perform a span_first query against a query_string or
match query?  I'd like to use this in a rescore to improve relevance.

If I can't do that I'll have to do something silly like create another
field pre-chopped on the way into Elasticsearch.

Nik
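
For reference, span_first only accepts another span query as its inner clause
(span_term, span_near, span_or, ...), which is why a query_string or match query
can't be dropped in directly. A minimal sketch of the shape with the Java API,
with an illustrative field and term:

import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

public class SpanFirstSketch {
    // span_first matches documents where the inner span clause occurs within
    // the first N positions of the field; the inner clause must itself be a
    // span query, so "elasticsearch" here stands in for a single term.
    public static QueryBuilder inFirstThreePositions() {
        return QueryBuilders.spanFirstQuery(
                QueryBuilders.spanTermQuery("title", "elasticsearch"), 3);
    }
}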

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd124LZ25Kc0cvoSpjkSc3-2nrq4FVk1MrqmFvp%3DxW%3DJug%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: iptablex trojan experiences?

2014-06-04 Thread 'Adolfo Rodriguez' via elasticsearch
Sorry if you took the message personally. That is your problem, not mine. I was 
not attacking you at all; I was only saying that, in my opinion, software 
should be fit for purpose and either prevent (when feasible) or warn about 
possible security holes. Just that. But not build additional security 
features beyond its purpose (as I understood Richard was suggesting). So, 
basically the same thing you are stating.
 

> It is just ridiculous to read that running applications under superuser 
> privileges and allowing world-wide access over the internet to a host with 
> user applications need "safe configuration options by default" and 
> "unnecessary burden must be prevented".
>

Well, it is ridiculous if you are Google and have 2000 employees to create a 
couple of servlets. But if you have limited resources, and you are paying 
attention to other functionality and working on a beta, it is not ridiculous. 
It is an assumed and controlled risk.

But do not blame others for your personal mistakes.
>

Can you please show me where I did that? I totally agree with what you did here 
. No more 
questions here. Sorry, but you blamed yourself; I did not.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a72a0161-91c5-47b3-a989-7dd8548f996a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[ANN] Experimental highlighter 0.0.10 released

2014-06-04 Thread Nikolas Everett
I released version 0.0.10 of the Experimental Highlighter
 I've been working on.
It's compatible with Elasticsearch 1.2.x and brings some improvements to
phrase queries over the last version I announced, 0.0.8.  If you are using
0.0.8 you'll need to upgrade to 0.0.10 to support Elasticsearch 1.2.x.

If you highlight stuff it's worth a look because it has lots of fun tricks
like preferring unique matches and optional full sentence segmentation.  If
highlighting is a large portion of your cluster's run time it's worth a look
because it can be configured to be very fast.

Read more at the link if you are interested.

In this case it isn't (yet) on my beta site
 but will be shortly.

Cheers,


Nik

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd2t_7fcfMx2QWHmQJRZ4nZUm3ad1ShN574o581ZQMg%3DXw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: OutOfMemory Exception on client Node

2014-06-04 Thread VB
These are more log statements after GC.

[2014-06-04 14:47:12,939][INFO ][cluster.service  ] [BUS2F2801F3] 
master {new 
[ELS-10.76.121.131][dg_r12_nQbqIT_oJfjTwTg][inet[/10.76.121.131:9300]]{data=false,
 
max_local_storage_nodes=1, master=true}, previous 
[ELS-10.76.121.130][BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300]]{data=false,
 
max_local_storage_nodes=1, master=true}}, removed 
{[ELS-10.76.121.130][BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300]]{data=false,
 
max_local_storage_nodes=1, master=true},}, reason: zen-disco-master_failed 
([ELS-10.76.121.130][BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300]]{data=false,
 
max_local_storage_nodes=1, master=true})
[2014-06-04 14:48:03,969][WARN ][monitor.jvm  ] [BUS2F2801F3] 
[gc][old][55503][489] duration [49.6s], collections [1]/[49.9s], total 
[49.6s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young] 
[532.5mb]->[532.5mb]/[532.5mb]}{[survivor] 
[51.3mb]->[42.8mb]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2014-06-04 14:48:40,256][WARN ][monitor.jvm  ] [BUS2F2801F3] 
[gc][old][55504][490] duration [35.7s], collections [1]/[36.2s], total 
[35.7s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young] 
[532.5mb]->[532.5mb]/[532.5mb]}{[survivor] 
[42.8mb]->[58.6mb]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2014-06-04 14:49:30,335][WARN ][monitor.jvm  ] [BUS2F2801F3] 
[gc][old][55505][491] duration [49.9s], collections [1]/[50s], total 
[49.9s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young] 
[532.5mb]->[532.5mb]/[532.5mb]}{[survivor] 
[58.6mb]->[63.7mb]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2014-06-04 14:49:30,350][INFO ][discovery.zen] [BUS2F2801F3] 
master_left 
[[ELS-10.76.121.131][dg_r12_nQbqIT_oJfjTwTg][inet[/10.76.121.131:9300]]{data=false,
 
max_local_storage_nodes=1, master=true}], reason [failed to ping, tried [3] 
times, each with  maximum [30s] timeout]
[2014-06-04 14:49:30,865][WARN ][discovery.zen] [BUS2F2801F3] 
not enough master nodes after master left (reason = failed to ping, tried 
[3] times, each with  maximum [30s] timeout), current nodes: 
{[ELS-10.76.125.37][j3VQFYDaQLujkprUnke02w][inet[/10.76.125.37:9300]]{max_local_storage_nodes=1,
 
master=false},[ELS-10.76.122.38][5V8bqkEzTP2TzMukB5_j-Q][inet[/10.76.122.38:9300]]{max_local_storage_nodes=1,
 
master=false},[ELS-10.76.125.48][TGlF1uv8Q5GpgBVvIcvRAQ][inet[/10.76.125.48:9300]]{max_local_storage_nodes=1,
 
master=false},[EDSFB1ABF7][MqLDnM5mSLqIicIuyJk7IQ][inet[/10.76.122.19:9300]]{client=true,
 
data=false, 
master=false},[ELS-10.76.120.62][evcNI2CqSs-Zz44Jdzn0aw][inet[/10.76.120.62:9300]]{client=true,
 
data=false, max_local_storage_nodes=1, 
master=false},[BUS9364B62][YZPjEsvhT6OjM9ti5Lxwkg][inet[/10.76.123.123:9300]]{client=true,
 
data=false, 
master=false},[ELS-10.76.125.38][RyeswSy8SquV5H8Vfsw75Q][inet[/10.76.125.38:9300]]{max_local_storage_nodes=1,
 
master=false},[EDSFB1200C][XUNaWVlYQUOVZlJMv3nHMA][inet[/10.76.122.18:9300]]{client=true,
 
data=false, 
master=false},[ELS-10.76.124.214][H8N9nIU0TKyGv_prKyRVCQ][inet[/10.76.124.214:9300]]{max_local_storage_nodes=1,
 
master=false},[EDS1A1F2240][ET2u1qImQCCvqc-1gRvQbQ][inet[/10.76.120.87:9300]]{client=true,
 
data=false, 
master=false},[ELS-10.76.125.40][hp4wvQxER-mMPygey2Iqgg][inet[/10.76.125.40:9300]]{max_local_storage_nodes=1,
 
master=false},[ELS-10.76.122.67][BiXop5iCRgGQyGvxazMkQg][inet[/10.76.122.67:9300]]{max_local_storage_nodes=1,
 
master=false},[ELS-10.76.121.129][pf9xpva7Q4izIy6Nj4S4iQ][inet[/10.76.121.129:9300]]{data=false,
 
max_local_storage_nodes=1, 
master=true},[EDSFB21E69][RabnwdLbT1WCp9gIE-_AXw][inet[/10.76.122.20:9300]]{client=true,
 
data=false, 
master=false},[EDI1AE4FD76][UF1RMWe6RYaZGp6BU3x-VA][inet[/10.76.124.228:9300]]{client=true,
 
data=false, 
master=false},[ELS-10.76.125.46][nXceQp40TjOSctChaGVtKw][inet[/10.76.125.46:9300]]{max_local_storage_nodes=1,
 
master=false},[EDI1A1EA928][rWlelgQuT7KHSfyIejmLPg][inet[/10.76.120.82:9300]]{client=true,
 
data=false, 
master=false},[ELS-10.76.121.188][oWldDeY4TJioki90moNySw][inet[/10.76.121.188:9300]]{max_local_storage_nodes=1,
 
master=false},[ELS-10.76.122.34][kPSYm9G8R8i_z2skK_jq1g][inet[/10.76.122.34:9300]]{max_local_storage_nodes=1,
 
master=false},[ELS-10.76.125.43][JMgOIZFBSzaQZ9bVagG57w][inet[/10.76.125.43:9300]]{max_local_storage_nodes=1,
 
master=false},[EDI1AE3EE57][7JHGaYjzS3uI7PLN8Ynm-Q][inet[/10.76.124.227:9300]]{client=true,
 
data=false, 
master=false},[ELS-10.76.124.225][nTPlE6IkTHOZ7EThX-hLeQ][inet[/10.76.124.225:9300]]{max_local_storage_nodes=1,
 
master=false},[ELS-10.76.120.61][_60f636_QsOPIWN0tKyN2A][inet[/10.76.120.61:9300]]{client=true,
 
data=false, max_local_storage_nodes=1, 
master=false},[ELS-10.76.125.47][MV8eSvpbRtCS1MAK2iAcVg][inet[/10.76.125.47:9300]]{max_local_storage_nodes=1,
 
master=false},[EDI1AB0123F][Di8rrVJMSYm6PVnAVuFnkw][inet[/10.76.124.18:9300]]{client=true,
 
data=false, 
master=false},[BUS936E1B3][Vnr_UCzOTtysB

Re: elasticsearch QueryBuilder with dynamic value in term query

2014-06-04 Thread Ivan Brusic
Off the top of my head, you can either use a nested bool query for the IP
addresses or use a terms query with the minimum match set to the size of the
list.

*Option 1:*

BoolQueryBuilder ipQuery = QueryBuilders.boolQuery();
for (String ip : ipList) {
    ipQuery.must(QueryBuilders.termQuery("address", ip));
}

QueryBuilder qb = QueryBuilders.boolQuery()
.must(ipQuery)
...


*Option 2:*

QueryBuilder qb = QueryBuilders.boolQuery()
.must(termsQuery("address" ,
ipList).minimumMatch(ipList.size()))
...

Cheers,

Ivan


On Wed, Jun 4, 2014 at 10:39 AM, Subhadip Bagui  wrote:

> I have a code like below where I'm doing multiple must in bool query. Here
> I'm passing the must term queries in field "address". Now the ip address
> will come to me as a list from other api and I have to pass for all the
> ip's in the list as a must term query. Here I'm not getting a way how to
> pass the address values dynamically when creating the QueryBuilder.
>
> Please suggest how to do this.
>
> public static SearchResponse searchResultWithAggregation(String es_index,
> String es_type, List ipList, String queryRangeTime) {
> Client client = ESClientFactory.getInstance();
>
> QueryBuilder qb = QueryBuilders.boolQuery()
> .must(QueryBuilders.termQuery("address", "10.203.238.138"))
> .must(QueryBuilders.termQuery("address", "10.203.238.137"))
> .must(QueryBuilders.termQuery("address", "10.203.238.136"))
> .mustNot(QueryBuilders.termQuery("address", "10.203.238.140"))
> .should(QueryBuilders.termQuery("client", ""));
>
> queryRangeTime = "now-" + queryRangeTime + "m";
> FilterBuilder fb = FilterBuilders.rangeFilter("@timestamp")
> .from(queryRangeTime).to("now");
>
> SearchResponse response = client
> .prepareSearch(es_index)
> .setTypes(es_type)
> .setQuery(qb)
> .setPostFilter(fb)
> .addAggregation(
> AggregationBuilders.avg("cpu_average").field("value"))
> .setSize(10).execute().actionGet();
>
> System.out.println(response.toString());
> return response;}
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1b6d7ba7-5cc5-4f26-abce-9e6614d39ed4%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQAJ77dq-q7JMG41VPvikJjirOB1qtjmCDvaBu1HzZe0%3Dw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Shard count and plugin questions

2014-06-04 Thread Todd Nine
Hi All,
  We've been using Elasticsearch as our search index for our new 
persistence implementation.  

https://usergrid.incubator.apache.org/

I have a few questions I could use a hand with.

1) Is there any good documentation on the upper limit to count of 
documents, or total index size, before you need to allocate more shards? 
 Do shards have a real world limit on size or number of entries to keep 
response times low?  Every system has its limits, and I'm trying to find 
some actual data on the size limits.  I've been trolling Google for some 
answers, but I haven't really found any good test results.


2) Currently, it's not possible to increase the shard count for an index. 
The workaround is to create a new index with a higher count, and move 
documents from the old index into the new.  Could this be accomplished via 
a plugin?


3) We sometimes have "realtime" requirements, meaning that when an index call 
returns, the document is available.  Flushing explicitly is not a good idea from 
a performance perspective.  Has anyone explored searching, in memory, the 
documents that have not yet been flushed and merging them with the Lucene 
results?  Is this something that's feasible to implement via a plugin?

Thanks in advance!
Todd

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/940c6404-6667-4846-b457-977e705d3797%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread virgil
Yeah, but I would consider that the non-data node is already doing the job. --
"These "non data" nodes are still part of the cluster, and they redirect
operations exactly to the node that holds the relevant data. The other
benefit is the fact that for scatter / gather based operations (such as
search), these nodes will take part of the processing since they will start
the scatter process, and perform the actual gather processing." I just
uploaded my native script code to https://github.com/virgil0/TestPlugin. It
works with the function score query. You can see that there are 3 bin files I
need to load into memory. Thank you for the reply.



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/ANN-Elasticsearch-Simple-Action-Plugin-tp4056971p4057054.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1401905723480-4057054.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.


elasticsearch QueryBuilder with dynamic value in term query

2014-06-04 Thread Subhadip Bagui


I have code like below where I'm doing multiple musts in a bool query. Here 
I'm passing the must term queries on the field "address". Now the IP addresses 
will come to me as a list from another API, and I have to pass every 
IP in the list as a must term query. I can't work out how to 
pass the address values dynamically when creating the QueryBuilder.

Please suggest how to do this.

public static SearchResponse searchResultWithAggregation(String es_index,
String es_type, List<String> ipList, String queryRangeTime) {
Client client = ESClientFactory.getInstance();

QueryBuilder qb = QueryBuilders.boolQuery()
.must(QueryBuilders.termQuery("address", "10.203.238.138"))
.must(QueryBuilders.termQuery("address", "10.203.238.137"))
.must(QueryBuilders.termQuery("address", "10.203.238.136"))
.mustNot(QueryBuilders.termQuery("address", "10.203.238.140"))
.should(QueryBuilders.termQuery("client", ""));

queryRangeTime = "now-" + queryRangeTime + "m";
FilterBuilder fb = FilterBuilders.rangeFilter("@timestamp")
.from(queryRangeTime).to("now");

SearchResponse response = client
.prepareSearch(es_index)
.setTypes(es_type)
.setQuery(qb)
.setPostFilter(fb)
.addAggregation(
AggregationBuilders.avg("cpu_average").field("value"))
.setSize(10).execute().actionGet();

System.out.println(response.toString());
return response;}


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1b6d7ba7-5cc5-4f26-abce-9e6614d39ed4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: What's using memory in ElasticSearch? (Details to follow...)

2014-06-04 Thread Adam Georgiou
Thanks for the response, Jörg.

The version is 1.1.0, and I'll take a look at that bloom filter setting.

-Adam

On Tuesday, June 3, 2014 3:48:37 PM UTC-4, Jörg Prante wrote:
>
> What ES version is this?
>
> Your segment count is very high (>1000) which is not efficient.
>
> Maybe index.codec.bloom.load: false can help reducing heap mem usage.
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-codec.html
>
> Jörg
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/aeaf66df-76a0-41ab-a17d-f41a01473912%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


MapperParsingException on data that should not be parsed but caught by my plugin

2014-06-04 Thread Laurent T.
Hi,

I've just activated DEBUG mode on my ES logs and I'm seeing this kind of 
exception:

[2014-06-04 15:50:03,539][DEBUG][action.index ] [Supercharger] [
myplugin][0], node[drCfkhlURn2Yz_SsM6bD3w], [P], s[STARTED]: Failed to 
execute [index {[myplugin][client1][69Pb9C_kT6CF6jdH06WbJw], 
source[{"value":"Cr�py-en-valois"}]}]
org.elasticsearch.index.mapper.MapperParsingException: failed to parse 
[value]
 at 
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:396)
 at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:599)
 at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:467)
 at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:515)
 at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:457)
 at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:515)
 at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:457)
 at 
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:507)
 at 
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:451)
 at 
org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:308)
 at 
org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:211)
 at 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:521)
 at 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:419)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.common.jackson.core.JsonParseException: 
Invalid UTF-8 middle byte 0x70
 at [Source: [B@2d1b0cf9; line: 1, column: 11]
 at 
org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1369)
 at 
org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:599)
 at 
org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3004)
 at 
org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3011)
 at 
org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._decodeUtf8_3fast(UTF8StreamJsonParser.java:2833)
 at 
org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2135)
 at 
org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._finishString(UTF8StreamJsonParser.java:2084)
 at 
org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:270)
 at 
org.elasticsearch.common.xcontent.json.JsonXContentParser.text(JsonXContentParser.java:85)
 at 
org.elasticsearch.common.xcontent.support.AbstractXContentParser.textOrNull(AbstractXContentParser.java:107)
 at 
org.elasticsearch.index.mapper.core.StringFieldMapper.parseCreateField(StringFieldMapper.java:285)
 at 
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:385)
 ... 15 more

I'm wondering why this is happening and why this error is shown only in 
DEBUG mode.
This request should actually be targeting my plugin that would JSON-decode 
it using Google Gson.

Is ES doing anything else before forwarding the request to the plugin?
We're using version 0.90.3 of ES.

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f6afec0d-24f5-4438-b5d4-3e2d6b87c5c6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Hourly Shards Elasticsearch/Kibana

2014-06-04 Thread Antonio Augusto Santos
Hey There,

Did you remember to change the "Timestamping" setting in Kibana so that it 
knows you are using an hourly index? Go to the index configuration screen to 
check that.

Also, if you have the requirement for a rolling 24-hour window, did you try 
enabling _ttl 
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-ttl-field.html) 
on your indices? That way, docs older than the specified time would be 
deleted automatically.
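
As a rough sketch of what enabling that could look like through the Java API
(the index and type names below are made up, and the same mapping body can be
PUT over the REST API):

import org.elasticsearch.client.Client;

public class TtlSketch {
    // Enables the _ttl field with a 24 hour default on a (made up) index/type,
    // so documents indexed into it are expired and purged automatically.
    public static void enableTtl(Client client) {
        client.admin().indices().preparePutMapping("logstash-2014.06.04.12")
                .setType("fluentd")
                .setSource("{\"fluentd\":{\"_ttl\":{\"enabled\":true,\"default\":\"24h\"}}}")
                .execute().actionGet();
    }
}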

On Wednesday, June 4, 2014 12:16:56 PM UTC-3, Kellan Strong wrote:
>
> Hello All,
>
> I have a question about hourly sharding with either logstash or fluentd. 
> Since we are, or will be using, a set up called FLEKZ. I am trying to 
> integrate both logstash and fluentd together, which work well with each 
> other. However, I have a business requirement for a rolling 24hour shard 
> deletion.
>
> When I add
>
> logstash_dateformat %Y.%m.%d.%H 
>
> in fluentd and
>
> index => "logstash-%{+.MM.dd.HH}"
>
> into logstash.
>
> Elasticsearch cannot find the indices anymore. I go onto Kibana and they 
> cannot be found. I switch back to the normal Y.m.d in both and the 
> information is back on the screen. Using the api I am also not able to 
> search any of the indices. Is there something I am doing wrong or is there 
> something in the config file that I am missing?
>
> Thank you for your help,
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a1bceabb-ea26-4aa5-8358-92f6f8e2ae1e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread Ivan Brusic
Jörg, thanks for the plugin to help as a starting point for plugin
development.

Although I have built a few plugins during the years, they were river or
analysis plugins, which are fairly easy. Writing a custom action required a
lot more digging, especially since there are very few to learn from. I
still would like to see a write-up regarding the different families of
transport actions: BroadcastOperationRequest,
MasterNodeOperationRequest, NodesOperationRequest,
SingleShardOperationRequest, SingleCustomOperationRequest,
etc. What is the difference? I understand it now, but it should be
documented. There is little documentation about the internals and there are
no code level comments.  I always meant to experiment with the different
action hierarchies via simple plugins and document my findings. Perhaps one
day...

Cheers,

Ivan


On Wed, Jun 4, 2014 at 1:09 AM, joergpra...@gmail.com  wrote:

> Sorry, the plugin is outdated, a better start is by looking at
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-similarity.html
>
> Jörg
>
>
> On Wed, Jun 4, 2014 at 10:07 AM, joergpra...@gmail.com <
> joergpra...@gmail.com> wrote:
>
>> You need resources on all nodes that hold shards, you can not do it with
>> just one instance, because ES index is distributed. Rescoring would be very
>> expensive if you did it on an extra central instance with an extra
>> scatter/gather phase. It is also very expensive in scripting.
>>
>> A better method is a similarity plugin like
>> https://github.com/tlrx/elasticsearch-custom-similarity-provider
>>
>> Not sure how your code looks like though, maybe you can share it with the
>> community?
>>
>> Jörg
>>
>>
>>
>> On Wed, Jun 4, 2014 at 2:55 AM, virgil  wrote:
>>
>>> The problem is that only one copy of HashMap is needed to customize
>>> score of
>>> all documents in the cluster. But as we have to install the plugin on all
>>> nodes, the actual memory used is multiplied by the number of nodes in
>>> cluster. I try to figure out one way to save the memory. Tried on
>>> non-data
>>> node, but it seems not working.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://elasticsearch-users.115913.n3.nabble.com/ANN-Elasticsearch-Simple-Action-Plugin-tp4056971p4057015.html
>>> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/1401843345821-4057015.post%40n3.nabble.com
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHZTAZrAdtQAnvj_7UtO%3DaAVtN3qt337PTzDjnbCmtPaA%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCkOVMuEV67ZMCX5qoAdiob%2BfWsuWK%3D0EyAKf3VGhjYdQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Hourly Shards Elasticsearch/Kibana

2014-06-04 Thread Kellan Strong
Hello All,

I have a question about hourly sharding with either logstash or fluentd, 
since we are, or will be, using a setup called FLEKZ. I am trying to 
integrate both logstash and fluentd together, which work well with each 
other. However, I have a business requirement for a rolling 24-hour shard 
deletion.

When I add

logstash_dateformat %Y.%m.%d.%H 

in fluentd and

index => "logstash-%{+.MM.dd.HH}"

into logstash.

Elasticsearch cannot find the indices anymore. I go onto Kibana and they 
cannot be found. I switch back to the normal Y.m.d in both and the 
information is back on the screen. Using the api I am also not able to 
search any of the indices. Is there something I am doing wrong or is there 
something in the config file that I am missing?

Thank you for your help,

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a0dc08e6-c570-4305-bc0b-808937551f54%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Marvel/Sense Troubleshooting

2014-06-04 Thread Fergie
Hi -

I'm a previous Solr user (bison.usgs.ornl.gov) and followed the instructions 
for installing Marvel on a Mac running 10.9.2.


Trying 
http://download.elasticsearch.org/elasticsearch/marvel/marvel-latest.zip...

Downloading 
...DONE

Installed elasticsearch/marvel/latest into 
/usr/local/var/lib/elasticsearch/plugins/marvel

Douglass-MacBook-Pro:bin douglaskunzman$ plugin -l

Installed plugins:

- marvel

I restarted elasticsearch and chrome and typed 
localhost:9200/_plugin/marvel/ and nothing happens.  Can someone offer some 
advice on how to troubleshoot?

I searched the elasticsearch site and this list for this information and 
can't find anything.

Doug

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/489bef92-33ff-41f0-8d0e-334f355d170f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch upgrade procedure

2014-06-04 Thread Aldian
Thank you very much for the detailed reply. I will proceed this way.
Aldian


2014-06-04 16:15 GMT+02:00 Nikolas Everett :

>
>
>
> On Wed, Jun 4, 2014 at 10:03 AM, Aldian  wrote:
>
>> Hi
>>
>> I am currently using elasticsearch 1.0.2. Since I have a memory leak
>> problem that forces me to restart it every 5 or 6 days, I am considering
>> upgrading to 1.2.1. But I did not found any upgrade guide, nor indications
>> whether new versions were retro-compatible. Please indicate if there is any
>> risk for the data?
>>
>
>
> The data is compatible.  Sometimes, like when Elasticsearch jumps a major
> version, the protocol that Elasticsearch servers use to communicate with
> eachother isn't backwards compatible.  This is rare, and the 1.0.2->1.2.1
> upgrade is fully backwards compatible.  That being said, there are some
> issues with running a cluster with two versions of Elasticsearch: you
> (mostly) can't move data from a newer node to an older node.  Features of
> the new version may degrade because they aren't getting what they need from
> their peers who are on the old version.  This is generally OK because you
> won't be relying on features of the newer version until after the upgrade.
>
> The upshot: a rolling upgrade is safe for you.  Turn off shard assignment
> on the cluster, update on node, restart elasticsearch on that node, turn
> shard assignment back on, wait for the cluster to go green, repeat.
>
> Make sure to upgrade any plugins that you have installed when you upgrade
> Elasticsearch.
>
> Nik
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/wF-WCPLUdQo/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAPmjWd3NB5x4nydx6N3ggAhqeWh%2B4LTrj_7L2ZkqYh4FCu%2BY0w%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Cordialement,

Aldian

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAECUaLxBANfsPMUq6U%2B5GPVEttMNVNUzAjYtuuGJv-0aAPMZWQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch upgrade procedure

2014-06-04 Thread Nikolas Everett
On Wed, Jun 4, 2014 at 10:03 AM, Aldian  wrote:

> Hi
>
> I am currently using elasticsearch 1.0.2. Since I have a memory leak
> problem that forces me to restart it every 5 or 6 days, I am considering
> upgrading to 1.2.1. But I did not found any upgrade guide, nor indications
> whether new versions were retro-compatible. Please indicate if there is any
> risk for the data?
>


The data is compatible.  Sometimes, like when Elasticsearch jumps a major
version, the protocol that Elasticsearch servers use to communicate with
eachother isn't backwards compatible.  This is rare, and the 1.0.2->1.2.1
upgrade is fully backwards compatible.  That being said, there are some
issues with running a cluster with two versions of Elasticsearch: you
(mostly) can't move data from a newer node to an older node.  Features of
the new version may degrade because they aren't getting what they need from
their peers who are on the old version.  This is generally OK because you
won't be relying on features of the newer version until after the upgrade.

The upshot: a rolling upgrade is safe for you.  Turn off shard assignment
on the cluster, update one node, restart elasticsearch on that node, turn
shard assignment back on, wait for the cluster to go green, repeat.

Make sure to upgrade any plugins that you have installed when you upgrade
Elasticsearch.
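
If it helps, here is a rough sketch of toggling allocation with the 1.x Java
API (the same settings can also be sent to the _cluster/settings REST endpoint):

import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;

public class RollingUpgradeSketch {
    // Disable allocation before stopping a node, re-enable it once the node is
    // back, then wait for green before moving on to the next node.
    public static void setAllocationEnabled(Client client, boolean enabled) {
        client.admin().cluster().prepareUpdateSettings()
                .setTransientSettings(ImmutableSettings.settingsBuilder()
                        .put("cluster.routing.allocation.enable", enabled ? "all" : "none")
                        .build())
                .execute().actionGet();
    }

    public static void waitForGreen(Client client) {
        client.admin().cluster().prepareHealth().setWaitForGreenStatus()
                .execute().actionGet();
    }
}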

Nik

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd3NB5x4nydx6N3ggAhqeWh%2B4LTrj_7L2ZkqYh4FCu%2BY0w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Elasticsearch upgrade procedure

2014-06-04 Thread Aldian
Hi

I am currently using elasticsearch 1.0.2. Since I have a memory leak 
problem that forces me to restart it every 5 or 6 days, I am considering 
upgrading to 1.2.1. But I did not find any upgrade guide, nor any indication of 
whether new versions are backwards compatible. Please indicate if there is any 
risk to the data.

Cheers
Aldian

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6e87e53e-c041-40b0-bf34-a1bc6a3b2ee1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


how to design dynamic embeded mapping

2014-06-04 Thread asoqa

   I have a mindmap json , as follows:
 
{
  "root": {
"id": "cpkbvyde",
"text": "My Mind Map",
"layout": "map",
"children": [
  {
"id": "lqlpxqim",
"text": "a",
"side": "right",
"children": [
  {
"id": "uzgocffn",
"text": "1",
"value": 1
  }
]
  },
  {
"id": "bjmbsize",
"text": "b",
"side": "left"
  }
]
  },
  "id": "nztlnuln"
}


As you know, mind map files have an uncertain number of levels of children. If I 
want to create a mapping for this JSON, how do I design the mapper?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0ef8a8b9-77df-4b7c-bc6d-6133a0be587f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Best cluster environment for search

2014-06-04 Thread Marcelo Paes Rech
Hi Jörg. Thanks for your reply.

Here is my filter.
{"filter":
{
  "terms" : {
"_id" : [ "QSxrbEM8TKe5zr8931xBjA", "wj63ghegRwC6qLsWq2chkA", 
"hYEhDbAqQwSRxhYfvDgFkg", "4bZmPE1fTYqijphRyyWiuQ", 
"Fhq53yYyT3CEw6vclKu_NA", "XL2atBraTEyx57MefjFVhA", 
"951i0dZkT064FlQkzHnnWA", "O8Ixbir1TrGT_IA3wKfsHg", 
"8k4U7KsuTmsThqxy-5YaKw", "GNOoQTHglf22kzcE7EOf8g", 
"-RQeY48fTg2kYnh2M4E1cQ", "u8DGBdfVR9WRVj6d9E4Ebw", 
"WFHSXd7UQvCMYFBhFcTsng", "qnQ7q7FyTsg397lM1EWgqA", 
"wRQtUzdMRy2qOkMCNxdpgA", "Ll83iglxSUS_Gs7mjkMt8w", 
"d2sxZ1oBTfuvAfov5EJ0iw", "cyht-vB4Q-mMSg9N5jcGXg", 
"bNSVaO47QTOCkfJhWo0qjg", "BHuhm55IRerKnynJ8WgFTw", 
"fHKA4PF2QteWm8E7dW7CAw", "DLE6A7tyQJ-zcKcCa6IPSA", 
"qfelTW7-SuGRQ0GKbngARA", "R7VHHJhYsUqfuxYof8BJ8w", 
"W4PqiJfPSlSFjVKFsGkA4Q", "Juq62zOsRdheuW3O6Gb2KA", 
"U9v0IKj_RrgRNjE31ZTt2g", "uNHa0kOOT5qjPpzxZcs35A", 
"SwOgVNgIRwyVU3pEEycBuQ", "LaEpxFGIQgCArsNZ2rd4Pw", 
"CiJ9gouZsbmTtxTWx7w6lA", "TaQV_I01RfCq3B6uAtIBoQ", 
"9Jpjo5k-RlGfLVLF6nDgze", "57YpjRdASsrrae-RD3spog", 
"bmA4EWFSTiKUaDzaNcCFKQ", "Fui9z_UbRe6AY1VhAr8Crw", 
"2PORr5BzSDOmBXgmQkO5Zg", "snfwTmtuTv-uj5mOWSJpgA", 
"0nHIrtePSaeW8aWArh_Mrg", "s0g9QHnjTgWX3rCIu1g0Hg", 
"Jl67fACuQvCFgZxXAFtDOg" ],
"_cache" : true,
"_cache_key" : "my_terms_cache"
  }
}
}

I already used the "*ids filter*" but I got the same behaviour. One thing that I 
realized is that one of the cluster's nodes keeps growing its Search Thread 
Pool (something like Queue: 50 and Count: 47) while the others don't 
(something like Queue: 0 and Count: 1). If I remove this node from the 
cluster, another one starts showing the same problem.

My current environment is:
- 7 Data nodes with 16Gb (8Gb for ES)and 8 cores each one;
- 4 Load balancer Nodes (no data, no master)  with 4Gb (3Gb for ES) and 8 
cores each one;
- 4 MasterNodes (only master, no data)  with 4Gb (3Gb for ES) and 8 cores 
each one;
- Thread Pool Search 47 (the others are standard config);
- 7 Shards and 2 replicas Index;
- 14.6Gb Index size (14.524.273 documents);


I'm executing this filter with 50 concurrent users.


Regards

On Tuesday, June 3, 2014 at 20:33:45 UTC-3, Jörg Prante wrote:
>
> Can you show your test code?
>
> You seem to look at the wrong settings - by adjusting node number, shard 
> number, replica number alone, you can not find out the maximum node 
> performance. E.g. concurrency settings, index optimizations, query 
> optimizations, thread pooling, and most of all, fast disk subsystem I/O is 
> important.
>
> Jörg
>
>
> On Wed, Jun 4, 2014 at 12:18 AM, Marcelo Paes Rech  > wrote:
>
>> Thanks for your reply Nikolas. It helps a lot.
>>
>> And about the quantity of documents of each shard, or size of each shard. 
>> And the need of no data nodes or only master nodes. When is it necessary?
>>
>> Some tests I did, when I increased request's number (like 100 users at 
>> same moment, and redo it again and again), 5 nodes with 1 shard and 2 
>> replicas each and 16Gb RAM (8Gb for ES and 8Gb for OS) weren't enough. The 
>> response time start to increase more than 5s (I think less than 1s,  in 
>> this case, would be acceptable) .
>>
>> This test has a lot of documents (something like 14 millions).
>>
>>
>> Thanks. Regards.
>>
>> On Monday, June 2, 2014 at 17:09:04 UTC-3, Nikolas Everett 
>> wrote:
>>
>>>
>>>
>>>
>>> On Mon, Jun 2, 2014 at 3:52 PM, Marcelo Paes Rech  
>>> wrote:
>>>
>>> Hi guys,

 I'm looking for an article or a guide for the best cluster 
 configuration. I read a lot of articles like "change this configuration" 
 and "you must create X shards per node" but I didn't saw nothing like 
 ElasticSearch Official guide for creating a cluster.

 What I would like to know are informations like. 
 - How to calculate how many shards will be good for the cluster.
 - How many shards do we need per node? And if this is variable, how do 
 I calculate this?
 - How much memory do I need per node and how many nodes?

 I think ElasticSearch is well documentated. But it is very fragmented.



>>> For some of these that is because "it depends" is the answer.  For 
>>> example, you'll want larger heaps for aggregations and faceting.
>>>
>>> There are some rules of thumb:
>>> 1.  Set Elasticsearch's heap memory to 1/2 of ram but not more then 
>>> 30GB.  Bigger then that and the JVM can't do pointer compression and you 
>>> effectively lose ram.
>>> 2.  #1 implies that having much more then 60GB of ram on each node 
>>> doesn't make a big difference.  It helps but its not really as good as 
>>> having more nodes.
>>> 3.  The most efficient efficient way of sharding is likely one shard on 
>>> each node.  So if you have 9 nodes and a replication factor of 2 (so 3 
>>> total copies) then 3 shards is likely to be more efficient then having 2 or 
>>> 4.  But this only really matters when those shards get lots of traffic.  
>>> And it breaks down a bit when you get lots of nodes.  And the in presence 
>>> of routing.  Its co

Reindex after adding char_filter

2014-06-04 Thread Kirill Teplinskiy
Hello everyone!

I can't understand whether I need to reindex documents after adding a char_filter 
to an existing analyzer or not.  My experiment shows the char_filter starts to 
work without reindexing, and this is a surprise for me.  Can anyone explain how 
it works please?  Here are the steps of my experiment; I use ElasticSearch 
0.90.13.

0.  Configure analyzer:

test1 :
  tokenizer : standard
  filter : [lowercase]

1.  Add thousand of documents to index:

{"name": "apple 0"}
{"name": "apple 1"}
{"name": "apple 2"}
...
{"name": "apple 999"}

Field "name" is analyzed by analyzer "test1".

2.  Add mapping char_filter to analyzer and restart ElasticSearch.  This is 
char_filter configuration:

char_filter :
  ab:
type: mapping
mappings: ["a=>b"]

3.  Query with text "bpple" returns results (and "apple" too)!

{
  "query": {
"match": {
  "name": "bpple"
}
  }
}

Moreover, when I remove the char_filter from the analyzer it continues to find 
"bpple" but stops finding "apple".  This confuses me too.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/312275a4-4f34-466a-a97c-8bb0d752822a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


elasticsearch filter query with time range using java api

2014-06-04 Thread Subhadip Bagui
Hi,

I have a document like below

{
"_index": "cpu_usage_metrics",
"_type": "cpu_usage_metrics",
"_id": "CKAAs1n8TKiR6FncC5NLGA",
"_score": 1,
"_source": {
   "status": 0,
   "occurrences": 1,
   "value": "33",
   "key": "vm.server2.cpu.usage",
   "client": "vm.server2",
   "@timestamp": "2014-06-03T20:18:19+05:30",
   "check_name": "cpu_usage_metrics",
   "address": "10.203.238.138",
   "command": "cpu-usage-metrics.sh"
}
 }

 I want to do a filtered query with a time range using the Java API, like below:

"filter": {
"range": {
"@timestamp": {
 "to": "now",
"from": "now - 5mins"
}
}
}


Please suggest how to form this filter with the Java API.

Thanks,
Subhadip
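
For reference, a minimal sketch of one way to build this with the Java API
(assuming @timestamp is mapped as a date; note that Elasticsearch date math is
written "now-5m", not "now - 5mins"):

import org.elasticsearch.index.query.FilterBuilder;
import org.elasticsearch.index.query.FilterBuilders;

public class RangeFilterSketch {
    // Range filter on @timestamp covering the last five minutes; "now-5m" is
    // date math that Elasticsearch resolves on the server side.
    public static FilterBuilder lastFiveMinutes() {
        return FilterBuilders.rangeFilter("@timestamp")
                .from("now-5m")
                .to("now");
    }
}

The resulting filter can then be passed to setPostFilter(...) on the search
request builder.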

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f596658e-5ec4-42ac-abc1-4f99416be101%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Aggregation bug? Or user error?

2014-06-04 Thread mooky
Bah.
I thought I had a simple unit test that was reliably recreating it - but it 
would appear not. It's still very intermittent - and my test never seems to 
fail when run on its own.



On Tuesday, 3 June 2014 21:41:04 UTC+1, Adrien Grand wrote:
>
> A recreation would be really great! If you can zip it and upload it to any 
> file sharing service, that would work for me.
>
>
> On Tue, Jun 3, 2014 at 6:41 PM, mooky > 
> wrote:
>
>>
>> By the way this test fails with elastic 1.2 also.
>>
>> How do I go about uploading an index with aggregation request json, etc?
>>  
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/2284bf7f-5561-40d6-a430-08b4dbbaca00%40googlegroups.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> -- 
> Adrien Grand
>  

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1286f2ed-fbce-4145-834e-a579bcf84cb1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Aggregation vs Search/Filter discrepancy - caching issue?

2014-06-04 Thread mooky
Turns out it was user error. Please ignore.


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/81ca5558-4fd3-44cd-ae72-4490145fa905%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: iptablex trojan experiences?

2014-06-04 Thread Mark Walkom
Containers, or VMs are also a valid approach to limiting access and
potential breaches.

Like all security, it's a multi-layered approach.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 4 June 2014 19:57, joergpra...@gmail.com  wrote:

> One very essential feature, from the very beginning, is that Elasticsearch
> instances, when started, automatically form a cluster over the network.
>
> This is only possible in an open network environment and by having
> multicast enabled.
>
> Are you aware, that by talking about "safe" configuration options "by
> default", you no longer can expect Elasticsearch to form a cluster? And
> that others would have to suffer from that?
>
> If you want security, you can not do this simply by adding "security
> modules" or by "safe" configuration options: it's always the responsibility
> and the awareness of the admin in person to run and maintain the software
> in a protected environment.
>
> It is just ridiculous to read that running applications under superuser
> privileges and allowing world-wide access over the internet to a host with
> user applications need "safe configuration options by default" and
> "unnecessary burden must be prevented".
>
> This is open source. Use the power of it. But do not blame others for your
> personal mistakes.
>
> Jörg
>
>
>
>
> On Wed, Jun 4, 2014 at 11:34 AM, 'Adolfo Rodriguez' via elasticsearch <
> elasticsearch@googlegroups.com> wrote:
>
>>
>> *> ES with absolutely no security features*
>>
>>
>>> *However, I think software should fit for purpose and delegate security
>>> in other specialized programs.*
>>>
>>
>> just to clarify, I think there is not need of any additional security
>> modules but, I agree that, any configuration option must be safe by
>> default. And if any additional module is provided, make it optional to
>> prevent unnecessary burden
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/79a862e2-713b-4c05-821f-70f505a6ee60%40googlegroups.com
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHhDcwQxuhMpjO%2BX0sGe9wkmRRhkqQDwWo5nZ-WWvh_-A%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624YjRikoQ4xdo05X1LNy3BSAiRbUmL1Mg5xw3qQTZJ%2Bcwg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: iptablex trojan experiences?

2014-06-04 Thread joergpra...@gmail.com
One very essential feature, from the very beginning, is that Elasticsearch
instances, when started, automatically form a cluster over the network.

This is only possible in an open network environment and by having
multicast enabled.

Are you aware, that by talking about "safe" configuration options "by
default", you no longer can expect Elasticsearch to form a cluster? And
that others would have to suffer from that?

If you want security, you can not do this simply by adding "security
modules" or by "safe" configuration options: it's always the responsibility
and the awareness of the admin in person to run and maintain the software
in a protected environment.

It is just ridiculous to read that running applications under superuser
privileges and allowing world-wide access over the internet to a host with
user applications need "safe configuration options by default" and
"unnecessary burden must be prevented".

This is open source. Use the power of it. But do not blame others for your
personal mistakes.

Jörg




On Wed, Jun 4, 2014 at 11:34 AM, 'Adolfo Rodriguez' via elasticsearch <
elasticsearch@googlegroups.com> wrote:

>
> *> ES with absolutely no security features*
>
>
>> *However, I think software should fit for purpose and delegate security
>> in other specialized programs.*
>>
>
> just to clarify, I think there is not need of any additional security
> modules but, I agree that, any configuration option must be safe by
> default. And if any additional module is provided, make it optional to
> prevent unnecessary burden
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/79a862e2-713b-4c05-821f-70f505a6ee60%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHhDcwQxuhMpjO%2BX0sGe9wkmRRhkqQDwWo5nZ-WWvh_-A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Retrieve similar documents

2014-06-04 Thread Jean Maynier
Hi,

In Kibana, I would like to do a kind of join to retrieve documents that 
have a specific value for a field, plus documents that I want to join to 
those matched documents on another field.

Is it possible to do that kind of query ?
Thanks,

Jean

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/46ca956a-4c8f-446d-927c-823118591fed%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: iptablex trojan experiences?

2014-06-04 Thread 'Adolfo Rodriguez' via elasticsearch

*> ES with absolutely no security features*
 

> *However, I think software should fit for purpose and delegate security in 
> other specialized programs.*
>

Just to clarify, I think there is no need for any additional security 
modules, but I agree that any configuration option must be safe by 
default. And if any additional module is provided, make it optional to 
prevent unnecessary burden.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/79a862e2-713b-4c05-821f-70f505a6ee60%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Prefix search on integer field

2014-06-04 Thread Simon Cast
Hi,

I'm trying to use a prefix search on an integer field that is stored and 
not analysed. From the documentation I would have expected that sending 1 
would return all numbers starting with 1 but that doesn't seem to be the 
case.

Does the prefix search work on integer fields? 

Regards,

Simon
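
For what it's worth, numeric fields are indexed as encoded trie terms rather
than their decimal digits, so a prefix query against an integer field will not
behave like "starts with 1". The usual workaround is to also index the value as
a not_analyzed string (a multi-field) and run the prefix against that copy; a
sketch with the Java API, where the field names are only illustrative:

import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

public class NumericPrefixSketch {
    // Assumes the number is also indexed as a not_analyzed string sub-field
    // (here called "code.raw"), and runs the prefix match against that copy.
    public static QueryBuilder startsWith(String prefix) {
        return QueryBuilders.prefixQuery("code.raw", prefix);
    }
}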

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/110f1887-c6e0-4f77-9ece-689f0a42b306%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Kibana 3: display the number of items in a Text panel?

2014-06-04 Thread Nitsan Seniak
Hello,

In a Kibana 3 Text panel, is it possible to display the number of lines?

My use case is that I want to display a table with unique IDs extracted 
from log entries, and display the number of these unique IDs.

Thanks,

-- Nitsan


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/79d81d20-dc99-45b7-9480-9318e083%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: iptablex trojan experiences?

2014-06-04 Thread 'Adolfo Rodriguez' via elasticsearch
This is exactly the kind of thing I was planning for my next deployment: a 
jail and finer permission tuning (besides closing the port and changing the 
flag configuration). Indeed, I was running the ES libraries as root, embedded in 
a Tomcat app.

However, I think software should be fit for purpose and delegate security to 
other specialized programs. As long as the specific warnings are made, this policy 
looks OK to me.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d92b6677-35a3-4fec-b19f-813e854fce86%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Reverse nested aggregation parsing error

2014-06-04 Thread slagraulet
I tried with this morning's build and the filter aggregation now works on top 
of the top_tag_hits aggregation.
We plan to test this feature thoroughly, as we need it to be "feature 
complete".

Thank you.


On Friday, 30 May 2014 at 12:43:50 UTC+2, Martijn v Groningen wrote:
>
> Hi Stephan,
>
> The bug has been fixed; can you try out a new build?
>
> Martijn
>
>
> On 30 May 2014 09:45, Martijn v Groningen  > wrote:
>
>> This is indeed a bug, thanks for sharing it! This should be easy to fix.
>>  
>>
>> On 27 May 2014 11:35, > wrote:
>>
>>> Hello,
>>> I tried the new top_hits aggregation and made it work on denormalized 
>>> data.
>>>
>>> However, when I tried to add a filter I ran into the following exception:
>>>
>>> [2014-05-27 11:32:12,869][DEBUG][action.search.type   ] [Cap 'N 
>>> Hawk] failed to reduce search
>>> org.elasticsearch.action.search.ReduceSearchPhaseException: Failed to 
>>> execute phase [fetch], [reduce]
>>> at 
>>> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.finishHim(TransportSearchQueryThenFetchAction.java:141)
>>> at 
>>> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$1.onResult(TransportSearchQueryThenFetchAction.java:113)
>>> at 
>>> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$1.onResult(TransportSearchQueryThenFetchAction.java:107)
>>> at 
>>> org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:526)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.NullPointerException
>>> at 
>>> org.apache.lucene.search.TopDocs$ScoreMergeSortQueue.<init>(TopDocs.java:89)
>>> at org.apache.lucene.search.TopDocs.merge(TopDocs.java:219)
>>> at org.apache.lucene.search.TopDocs.merge(TopDocs.java:209)
>>> at 
>>> org.elasticsearch.search.aggregations.bucket.tophits.InternalTopHits.reduce(InternalTopHits.java:107)
>>> at 
>>> org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:146)
>>> at 
>>> org.elasticsearch.search.aggregations.bucket.InternalSingleBucketAggregation.reduce(InternalSingleBucketAggregation.java:82)
>>> at 
>>> org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:146)
>>> at 
>>> org.elasticsearch.search.aggregations.bucket.terms.InternalTerms$Bucket.reduce(InternalTerms.java:77)
>>> at 
>>> org.elasticsearch.search.aggregations.bucket.terms.InternalTerms.reduce(InternalTerms.java:157)
>>> at 
>>> org.elasticsearch.search.aggregations.bucket.terms.InternalTerms.reduce(InternalTerms.java:37)
>>> at 
>>> org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:146)
>>> at 
>>> org.elasticsearch.search.controller.SearchPhaseController.merge(SearchPhaseController.java:386)
>>> at 
>>> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.innerFinishHim(TransportSearchQueryThenFetchAction.java:152)
>>> at 
>>> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.finishHim(TransportSearchQueryThenFetchAction.java:139)
>>> ... 6 more
>>>
>>>
>>> Here are the gists used to create the data:
>>>
>>>- mapping : https://gist.github.com/stephlag/f9402c699374438c3ede 
>>>- data: https://gist.github.com/stephlag/ecb5b2dc384a3f07602d
>>>- queries: https://gist.github.com/stephlag/7ad8edf1ab6b757be3b3 
>>>
>>>
>>>
>>> On Friday, 23 May 2014 at 18:05:18 UTC+2, slagr...@ippon.fr wrote:
>>>
 Thank you
 I see this pull request is now merged into the 1.3 branch, so I will try 
 this feature to see if it fits our needs.

 On Friday, 23 May 2014 at 15:49:51 UTC+2, Adrien Grand wrote:
>
> You might want to do the innermost aggregation using the _uid field 
> instead of the _id field, since the latter is not indexed by default. The 
> next version of Elasticsearch will feature a new aggregation called 
> top_hits (https://github.com/elasticsearch/elasticsearch/pull/6124) 
> that will make it possible to get the top hits per bucket. I think this is 
> what you are looking for?
>
>
> On Fri, May 23, 2014 at 3:26 PM,  wrote:
>
>> Hello David,
>>
>> The query you gave me is correct; I don't get the parsing error 
>> anymore.
>>
>> Unfortunately, it does not give the result I expected.
>> What I'm trying to get is the minimum price for each of the products, 
>> i.e. an aggregation computed for each product retrieved by the 
>> query.
>>
>> Do you have any advice on how to achieve this?
>>
>
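
For the minimum-price-per-product question above, the direction the thread
points to would look roughly like the sketch below: a terms aggregation per
product with a min sub-aggregation, plus an optional top_hits
sub-aggregation (ES 1.3+) to return the cheapest document itself. The index
and field names (product_id, price) are illustrative, not taken from the
gists:

curl -XGET 'localhost:9200/myindex/_search' -d '{
  "size": 0,
  "aggs": {
    "per_product": {
      "terms": { "field": "product_id" },
      "aggs": {
        "min_price": { "min": { "field": "price" } },
        "cheapest": {
          "top_hits": {
            "size": 1,
            "sort": [ { "price": { "order": "asc" } } ]
          }
        }
      }
    }
  }
}'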

Re: iptablex trojan experiences?

2014-06-04 Thread Patrick Proniewski
On 04 juin 2014, at 05:38, 'Adolfo Rodriguez' via elasticsearch wrote:

> here is some sample code showing how to exploit the system for versions <1.2.0, 
> with port 9200 exposed to the internet and the setting script.disable_dynamic=false, 
> which is the default in those versions 
> 
> http://bouk.co/blog/elasticsearch-rce/#how_to_secure_against_this_vulnerability


I've had a great deal of fun reading this. And I'm really concerned that in 
2014 people are still developing products like ES with absolutely no security 
features.
This blogger should have added a word of warning about running ES as 
root/admin; I'm pretty sure most developers are running ES under their admin 
account, or even as root. Use a dedicated user account for the ES process, 
with very limited permissions and powers. Always think about privilege 
separation before you install new software.
ES should really be quarantined. On FreeBSD, one can use a jail (very easy 
nowadays with ZFS and ezjail). I'm pretty sure similar things exist for Linux.
If you have the guts, go with SELinux. It requires some work, but it's rewarding 
and it has some pretty damn cool things inside.

Patrick
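
A minimal hardening sketch along these lines, for anyone following the
thread (these are real elasticsearch.yml settings; the values shown are just
one reasonable choice, and before 1.2.0 the scripting default is unsafe):

# elasticsearch.yml
# Bind only to the local interface instead of exposing port 9200 publicly.
network.host: 127.0.0.1
# Disable dynamic scripting (only the default from 1.2.0 onwards).
script.disable_dynamic: true

And, as noted above, run the process under a dedicated unprivileged user
rather than root.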

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8C53A03A-BBB9-4450-86CF-562BC1E45CD1%40patpro.net.
For more options, visit https://groups.google.com/d/optout.


Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
Sorry, the plugin is outdated; a better starting point is

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-similarity.html

Jörg
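
For reference, a minimal sketch of what the similarity module on that page
looks like in practice (index, similarity and field names here are
illustrative): a custom similarity is declared in the index settings and
then referenced from the field mapping.

curl -XPUT 'localhost:9200/myindex' -d '{
  "settings": {
    "similarity": {
      "my_bm25": { "type": "BM25", "k1": 1.2, "b": 0.75 }
    }
  },
  "mappings": {
    "mytype": {
      "properties": {
        "title": { "type": "string", "similarity": "my_bm25" }
      }
    }
  }
}'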


On Wed, Jun 4, 2014 at 10:07 AM, joergpra...@gmail.com <
joergpra...@gmail.com> wrote:

> You need resources on all nodes that hold shards; you cannot do it with
> just one instance, because an ES index is distributed. Rescoring would be very
> expensive if you did it on an extra, central instance with an additional
> scatter/gather phase. It is also very expensive in scripting.
>
> A better method is a similarity plugin like
> https://github.com/tlrx/elasticsearch-custom-similarity-provider
>
> Not sure what your code looks like, though; maybe you can share it with the
> community?
>
> Jörg
>
>
>
> On Wed, Jun 4, 2014 at 2:55 AM, virgil  wrote:
>
>> The problem is that only one copy of the HashMap is needed to customize the
>> score of all documents in the cluster. But as we have to install the plugin
>> on all nodes, the actual memory used is multiplied by the number of nodes in
>> the cluster. I'm trying to figure out a way to save that memory. I tried a
>> non-data node, but it doesn't seem to work.
>>
>>
>>
>> --
>> View this message in context:
>> http://elasticsearch-users.115913.n3.nabble.com/ANN-Elasticsearch-Simple-Action-Plugin-tp4056971p4057015.html
>> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/1401843345821-4057015.post%40n3.nabble.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHZTAZrAdtQAnvj_7UtO%3DaAVtN3qt337PTzDjnbCmtPaA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
You need resources on all nodes that hold shards; you cannot do it with
just one instance, because an ES index is distributed. Rescoring would be very
expensive if you did it on an extra, central instance with an additional
scatter/gather phase. It is also very expensive in scripting.

A better method is a similarity plugin like
https://github.com/tlrx/elasticsearch-custom-similarity-provider

Not sure what your code looks like, though; maybe you can share it with the
community?

Jörg



On Wed, Jun 4, 2014 at 2:55 AM, virgil  wrote:

> The problem is that only one copy of the HashMap is needed to customize the
> score of all documents in the cluster. But as we have to install the plugin
> on all nodes, the actual memory used is multiplied by the number of nodes in
> the cluster. I'm trying to figure out a way to save that memory. I tried a
> non-data node, but it doesn't seem to work.
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/ANN-Elasticsearch-Simple-Action-Plugin-tp4056971p4057015.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1401843345821-4057015.post%40n3.nabble.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoH%3D228Y2PvB265Hs4NX1O_Ac4QBuWXJGcCqKaXFc3a56A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.