Re: Help with ES 1.x percolator query plz

2014-05-06 Thread JGL
Can anybody help plz?

On Tuesday, May 6, 2014 11:53:32 AM UTC+12, JGL wrote:
>
>
> Can anybody help plz?
>
> On Monday, May 5, 2014 10:24:09 AM UTC+12, JGL wrote:
>>
>>
>> Hi Martjin,
>>
>> The percolator query in the first post above is what we registered with the 
>> percolator, and it kind of works: it consolidates all the IDs into one query 
>> string for a match query, which does not seem like a very elegant solution 
>> to us. 
>>
>> {
>>   "_index" : "my_idx",
>>   "_type" : ".percolator",
>>   "_id" : "my_query_id",
>>   "_score" : 1.0,
>>   "_source" : {
>>     "query" : {
>>       "match" : {
>>         "id" : {
>>           "query" : "id1 id2 id3",
>>           "type" : "boolean"
>>         }
>>       }
>>     }
>>   }
>> }
>>
>>
>> Another issue is that the above solution is not quite accurate when the 
>> IDs are UUIDs. For example, if the query we register is the following:
>>
>> {
>>   "_index" : "my_idx",
>>   "_type" : ".percolator",
>>   "_id" : "my_query_id",
>>   "_score" : 1.0,
>>   "_source" : {
>>     "query" : {
>>       "match" : {
>>         "id" : {
>>           "query" : "1aa808dc-48f0-4de3-8978-a0293d54b852 6b256fd1-cd04-4e3c-8f38-aaa87ac2220d 1234fd1a-cd04-4e3c-8f38-aaa87142380d",
>>           "type" : "boolean"
>>         }
>>       }
>>     }
>>   }
>> }
>>
>>
>> then the percolator returns the above query as a match if the document we 
>> percolate is {"doc" : {"id" : "1aa808dc-48f0-4de3-8978-00293d54b852"}}, 
>> though we expect a no-match response here, as the ID in the document does 
>> not match any ID in the query string. 
>>
>> Such false positives, according to our experiments, happen when the doc 
>> UUID is almost identical to one of the IDs in the query except for the 
>> last part of the ID. Is there an explanation for this behavior of 
>> Elasticsearch?
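A possible explanation (an assumption, since the mapping isn't shown in the thread): if the id field uses the default standard analyzer, each UUID is split into tokens at the hyphens, and a boolean match query matches when any token matches. The two UUIDs above share every segment except the last, hence the false positive. A rough Python simulation of that behavior:

```python
# Rough simulation of an analyzed "id" field: hyphens act as token
# separators (assumption: the default standard analyzer is in effect).
def tokenize(text):
    return {token for chunk in text.split() for token in chunk.split("-")}

registered = tokenize("1aa808dc-48f0-4de3-8978-a0293d54b852 "
                      "6b256fd1-cd04-4e3c-8f38-aaa87ac2220d")
document = tokenize("1aa808dc-48f0-4de3-8978-00293d54b852")

# A boolean match query is an OR over the analyzed tokens, so one shared
# token is enough for the percolator to report a match.
shared = registered & document
print(sorted(shared))  # ['1aa808dc', '48f0', '4de3', '8978']
```

A not_analyzed mapping for the id field (or an ID scheme without hyphens) would avoid the token overlap.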
>>
>> Our other question is whether there is any way to put the UUID list, as an 
>> actual list, into a query that works with the percolator, like we can do 
>> with inQuery or inFilter. We tried registering an inQuery and a query 
>> wrapping an inFilter; neither of them works with the percolator. It seems 
>> the percolator only works with the match query, in which we cannot pass 
>> the UUID list as a list.
>>
>> For example the following two queries we tried are not working with 
>> percolator:
>>
>> {
>>   "_index" : "my_idx",
>>   "_type" : ".percolator",
>>   "_id" : "inQuery",
>>   "_score" : 1.0,
>>   "_source" : {
>>     "query" : {
>>       "terms" : {
>>         "id" : [ "1aa808dc-48f0-4de3-8978-a0293d54b852", "6b256fd1-cd04-4e3c-8f38-aaa87ac2220d" ]
>>       }
>>     }
>>   }
>> },
>>
>>
>> {
>>   "_index" : "my_idx",
>>   "_type" : ".percolator",
>>   "_id" : "inFilterQ",
>>   "_score" : 1.0,
>>   "_source" : {
>>     "query" : {
>>       "filtered" : {
>>         "query" : { "match_all" : {} },
>>         "filter" : {
>>           "terms" : {
>>             "id" : [ "1aa808dc-48f0-4de3-8978-a0293d50b852", "6b256fd1-cd04-4e3c-8f38-aaa87ac2220d" ]
>>           }
>>         }
>>       }
>>     }
>>   }
>> }
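One thing worth checking here (an assumption, not verified against the poster's cluster): a terms query is not analyzed, so it looks for each whole UUID as a single indexed term. If the id field is analyzed, only the hyphen-separated sub-tokens are indexed and the full UUID can never match. Mapping the field as not_analyzed would keep each UUID as one term, for example:

```json
{
  "mappings" : {
    "my_type" : {
      "properties" : {
        "id" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}
```

The type name "my_type" is illustrative; with this mapping a terms query on id should match exact UUIDs.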
>>
>> Thanks for your help!
>>
>> Jason
>>
>>
>> On Friday, May 2, 2014 7:34:47 PM UTC+12, Martijn v Groningen wrote:
>>>
>>> Hi,
>>>
>>> Can you share the stored percolator queries and the percolate request 
>>> that you were initially trying with, but didn't work?
>>>
>>> Martijn
>>>
>>>
>>> On 2 May 2014 11:14, JGL  wrote:
>>>
 Can anybody help plz?

 -- 
 You received this message because you are subscribed to the Google 
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send 
 an email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/4ee60836-1922-43e0-8d9b-64ef9bb0b00a%40googlegroups.com
 .

 For more options, visit https://groups.google.com/d/optout.

>>>
>>>
>>>
>>> -- 
>>> Met vriendelijke groet,
>>>
>>> Martijn van Groningen 
>>>
>>



Re: Unable to create mapping and settings using Java API

2014-05-06 Thread Amit Soni
Actually I am using the Java API with a node client (I do not have the HTTP
port open). I finally got it working, but it took a fair amount of research.
Following is the code snippet:

CreateIndexRequest indexCreateRequest = new CreateIndexRequest(indexName);
indexCreateRequest.source(mappingSource); // the entire JSON script (settings + mappings)
// Execute the index creation command
CreateIndexResponse indexCreateResponse =
        client.admin().indices().create(indexCreateRequest).actionGet();
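For reference, a minimal sketch of what such a combined JSON source can look like (the type and field names here are illustrative):

```json
{
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 1
  },
  "mappings" : {
    "my_type" : {
      "properties" : {
        "title" : { "type" : "string" }
      }
    }
  }
}
```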

-Amit.


On Fri, May 2, 2014 at 3:46 AM, Michael McCandless <
michael.mccandl...@elasticsearch.com> wrote:

> Hmm, I'm able to create an index and its mappings/settings with a single
> JSON request to http://localhost:9200/.
>
> What settings are you trying to set?
>
> Mike
>
> http://blog.mikemccandless.com
>
>
> On Thu, May 1, 2014 at 5:10 PM, Amit Soni  wrote:
>
>> hello everyone - I have settings and mapping defined in a single JSON
>> document and I have been trying to find a way to create index using that
>> JSON document. I tried different code snippets but have not found one which
>> allows me to create settings as well as mapping using one JSON document.
>>
>> Any help on this will be great!
>>
>> -Amit.
>>


Re: Kibana index does not contain a timestamp aka why does kibana search the kibana-int index?

2014-05-06 Thread Bharvi Dixit
Hi,
In Kibana's configuration settings, which appear in the top-right corner, 
you can provide the name of a specific index instead of _all under the Index 
tab. Under the Timepicker tab, you can provide any date field of the 
logstash index instead of the default @timestamp value. Note that the date 
field's name should not contain an @ symbol; it must be like tweetCreatedAt, 
not @tweetCreatedAt.


Regards,
Bharvi Dixit

On Tuesday, 6 May 2014 19:03:04 UTC+5:30, Jorn Eilander wrote:
>
> LS,
>
> I've been using an ELK-stack for development purposes for a few weeks now, 
> and my logs were filled with errors. 
>
> The errors, it seemed, were traceable to the fact that Kibana was(/is) 
> querying all the shards on the node for the @timestamp field, which isn't 
> present in the kibana-int index (where Kibana stores its reports).
>
> So when my users generate a report, Kibana searches in _all for the data, 
> which contains shards/indices not related to logstash... Is there any way I 
> can limit them to the indices/shards related to Logstash? I dislike errors 
> in my logging ;)
>



Re: ANN: new elasticsearch discovery plugin - eskka

2014-05-06 Thread shikhar
Just released 0.1.1 

This version is working well in my manual testing. Automated testing is on
the roadmap...


On Mon, May 5, 2014 at 10:49 AM, shikhar  wrote:

> See README 
>
> I'd love to have feedback on this first release!
>



Re: [Hadoop] Writing directly to shards in EsOutputFormat and shard awareness

2014-05-06 Thread Ashwin Jayaprakash
Yeah - I figured as much.

Thanks.



On Tuesday, May 6, 2014 12:07:45 AM UTC-7, Costin Leau wrote:
>
> To reiterate on my previous email: by talking directly to primaries, 
> assuming a uniform distribution of documents, we have at least 1/X of the 
> total documents sent (where X is the number of shards) without proxying, 
> while the rest will be proxied to the respective shards (why we don't 
> compute the target shard directly is part of the previous email). 
> Talking to a non-primary node would guarantee proxying for all of the 
> documents being sent. 
>
>



Re: Problem of sync elasticsearch data after failover

2014-05-06 Thread victor fence
 > Having 3 nodes makes sure you can easily maintain a majority quorum.
> Once you get to larger sizes, it may/does make sense to have some data 
and master only nodes.

Thanks for your explanation, this is what I really need, thanks :)



Re: Multi DC cluster or separate cluster per DC?

2014-05-06 Thread Mark Walkom
Go with the latter method and have two clusters; ES can be very sensitive to
network latency, and you'll likely end up with more problems than it is
worth.
Given you already have the data source of truth being replicated, the
sanest option is to just read it locally.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 6 May 2014 23:51, Sebastian Łaskawiec wrote:

> Hi!
>
> I'd like to ask for advice about deployment in multi DC scenario.
>
> Currently we operate on 2 Data Centers in active/standby mode. In case of
> ES we'd like to have a different approach - we'd like to operate in
> active-active mode (we want to optimize our resources, especially for
> querying).
> Here are some details about target configuration:
>
>- 4 ES instances per DC. Full cluster will have 8 instances.
>- Up to 1 TB of data
>- Data pulled from database using JDBC River
>- Database is replicated asynchronously between DCs. Each DC will have
>its own database instance to pull data.
>- Average latency between DCs is a few milliseconds
>- We need to operate when passive DC is down
>
> We know that multi DC configuration might end with Split Brain issue. Here
> is how we want to prevent it:
>
>- Set node.master: true only in 4 nodes in active DC
>- Set node.master: false in passive DC
>- This way we'll be sure that new cluster will not be created in
>passive DC
>- Additionally we'd like to set discovery.zen.minimum_master_nodes: 3
>(to avoid Split Brain in active DC)
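The plan above could be sketched in elasticsearch.yml roughly as follows (a sketch of the poster's own plan, not a recommendation):

```yaml
# Nodes in the active DC (4 nodes):
node.master: true
discovery.zen.minimum_master_nodes: 3

# Nodes in the passive DC (4 nodes) would instead set:
# node.master: false
```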
>
> Additionally there is problem with switchover (passive DC becomes active
> and active becomes passive). In our system it takes about 20 minutes and
> this is the maximum length of our maintenance window. We were thinking of
> shutting down whole ES cluster and switch node.master setting in
> configuration files (as far as I know this settings can not be changed via
> REST api). Then we'd need to start whole cluster.
>
> So my question is: is it better to have one big ES cluster operating on
> both DCs or should we change our approach and create 2 separate clusters
> (and rely on database replication)? I'd be grateful for advice.
>
> Regards
> Sebastian
>


Re: How to get the json for a searchrequest?

2014-05-06 Thread android
Is there really no way to print a search request?

Because QueryBuilder does not know about indices, or the fields we want the
search request to return. And I'd like to have those fields in my JSON
representation.

Basically I'd like to create a JSON representation that I can use as the
source of a search request later on, like an editable template.
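One workaround for the "editable template" idea (a sketch outside the Java client itself; the query, fields and placeholder name are illustrative): keep the complete search source, including the fields list, as a JSON document with placeholders, and fill it in before passing it as the raw source of a search request:

```python
import json

# An editable search-source template; query, fields and size are
# illustrative placeholders, not tied to any particular index.
TEMPLATE = """
{
  "query":  { "match": { "title": "__TEXT__" } },
  "fields": [ "title", "timestamp" ],
  "size":   10
}
"""

def render(text):
    # Produce the JSON body that would be set as the search request source.
    return json.loads(TEMPLATE.replace("__TEXT__", text))

body = render("elasticsearch")
print(body["fields"])  # ['title', 'timestamp']
```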



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/How-to-get-the-json-for-a-searchrequest-tp4050876p4055468.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Problem of sync elasticsearch data after failover

2014-05-06 Thread victor fence


>
> > I think he tried to send more than one message as he did not see it in 
> the ML.
>
>  
You are right, and sorry for that :)
 
Can you give me a hint of my first question?

When the 2 servers dropped out (not shut down manually),
I don't know which dropped out earlier and which later, and this 
should be handled automatically - what can I do? Or am I wrong?

Thank you



Re: zen vs ec2 vs fd

2014-05-06 Thread Jason
To answer my own question #2:

discovery.zen.ping_timeout: timeout for master selection.

discovery.zen.fd.ping_timeout: timeout for fault detection.

So question #1 is now:

If discovery type is set to ec2, which is 
correct? discovery.ec2.fd.ping_timeout or discovery.zen.fd.ping_timeout?

Thanks,

Jason

On Tuesday, May 6, 2014 3:43:08 PM UTC-7, Jason wrote:
>
> Hi All,
>
> 1) If my discovery type is ec2, which of the following settings are 
> actually in effect? 
>
> discovery.zen.ping_timeout: 60s
> discovery.zen.fd.ping_timeout: 60s
> discovery.ec2.ping_timeout: 60s
> discovery.ec2.fd.ping_timeout: 60s
>
> 2) what's the difference between discovery.zen.ping_timeout and 
> discovery.zen.fd.ping_timeout? Is one just a typo that got propagated on 
> the web?
>
> Thanks in advance,
>
> Jason
>



zen vs ec2 vs fd

2014-05-06 Thread Jason
Hi All,

1) If my discovery type is ec2, which of the following settings are 
actually in effect? 

discovery.zen.ping_timeout: 60s
discovery.zen.fd.ping_timeout: 60s
discovery.ec2.ping_timeout: 60s
discovery.ec2.fd.ping_timeout: 60s

2) what's the difference between discovery.zen.ping_timeout and 
discovery.zen.fd.ping_timeout? Is one just a typo that got propagated on 
the web?

Thanks in advance,

Jason



Re: Snapshot & Restore Frequency

2014-05-06 Thread Andrew Selden
Alex,

Yes, snapshots are online, and yes, they store diffs, but only after the first 
snapshot, which by definition has to copy everything.

As to the question of how frequent to run them, that really depends on your 
tolerance for loss. The added operational costs are reads from disk and bytes 
sent over the wire. You can tune this behavior with max_snapshot_bytes_per_sec 
[1].
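For example, the throttle can be set when registering a snapshot repository (the repository name, location, and rate below are illustrative):

```
curl -XPUT 'localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup",
    "max_snapshot_bytes_per_sec": "20mb"
  }
}'
```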

Andrew

1. 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html#_snapshot

On May 6, 2014, at 3:25 PM, Alex Philipp  wrote:

> According to the docs, snapshot operations are online and only store diffs.  
> Is there any particular reason to not run them at a fairly high frequency?  
> E.g. every 15 minutes?
> 
> Alex
> 


Snapshot & Restore Frequency

2014-05-06 Thread Alex Philipp
According to the docs, snapshot operations are online and only store diffs. 
 Is there any particular reason to not run them at a fairly high frequency? 
 E.g. every 15 minutes?

Alex



Re: What OS memory does es use other than Java?

2014-05-06 Thread joergpra...@gmail.com
Yes, of course Elasticsearch is using off-heap memory. All the Lucene index
I/O is using direct buffers in native OS memory.

Errors in allocating direct buffers will result in Java errors. You mention
Linux memory errors but unfortunately you do not quote them, so I have to
guess.

You should have enabled memory mapped files by index store mmapfs (default
on RHEL) so all files that are read by ES are mapped into virtual address
space of the OS VM management.

And also bootstrap.mlockall = true, so you also need to set memlock to
unlimited in /etc/security/limits.conf, because RHEL/Centos memlockable
memory is limited to 25% of RAM by default. In that case, Java should throw
an IOException "Map failed".
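For reference, the corresponding /etc/security/limits.conf entries might look like this (assuming the ES process runs as a user named elasticsearch):

```
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
```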

Note, because of the memory page lock support of the host OS, you should
also check what kind of virtualization you have enabled for the guest, it
should be HW (full) virtualization, not paravirtualization.

If you still encounter issues from Linux OS errors it is most probably
because of VMware limitations, so you should disable the bootstrap.mlockall
setting.

As a side note, the recommended heap size is 50% of the RAM that is
available to the ES process. If you run a VM, you should assign at most 50%
of the configured guest OS memory to ES.

Jörg


On Tue, May 6, 2014 at 10:35 PM, Edward Sargisson  wrote:

> Hi all,
> We have a problem where our es nodes will fail with an out of memory error
> from Linux (note, not Java). Our es processes are configured with a fixed
> amount of heap (60% of total RAM - just as in in the elasticsearch chef
> cookbook).
>
> So, something is consuming all of the memory available to Linux.
>
> Is there any other memory that ES can use? Does it lock OS cache or buffer
> memory so that it can't be released? If it opens lots of files does it use
> up too much RAM? Is it doing off-heap allocation? (I'm pretty sure the
> answer is no to the last).
>
> We're struggling to find the exact memory resource being used up.
>
> For the record. this is ES 1.1.0 on CentOS 6.4 running in VMWare.
>
> Thanks!
> Edward
>


Data truncated when including a number followed by '.'

2014-05-06 Thread Mac Jouz

Hi,

I'm using ELK (ES v1.1.1) and I'm facing strange behaviour when trying to 
aggregate some kind of data that includes a number followed by a '.' (dot).

I built a panel with Kibana to show the top 10 Java methods used in my 
application, and I can see that the result provided by ES is truncated when 
there is a number followed by a '.' (dot).

For instance the ES result is 
 [...]
  {
"term" : "connector.abczcm9stub",
"count" : 1387
  }, {
"term" : "cachedmapper.com.test.jca.abc.copabczcm9",
"count" : 1387
  },
 [...]

whereas the x_java_method value expected in this case is 

 [...]
  {
"term" : 
"cachedmapper.com.test.jca.abc.copabczcm9.connector.abczcm9stub",
"count" : 1387
  },
 [...]

Of course the result is correct when there is no number followed by a '.'

 [...] 
 {
"term" : 
"cachedmapper.com.test.jca.abc.copabcxa0f.connector.abcxa0fstub",
"count" : 148
  }, 
  [...] 
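A likely cause (an assumption, based on the Unicode word-break rules that the standard tokenizer follows): a '.' between two letters, or between two digits, does not break a token, but a digit immediately before a '.' that is followed by a letter is a break point, so the term splits right after the trailing digit of copabczcm9. A rough Python approximation:

```python
import re

# Approximate the relevant word-break behavior: '.' survives between
# letter.letter, but digit '.' letter (or letter '.' digit) is a boundary.
# This is a simplification of the real tokenizer, for illustration only.
def tokenize(term):
    return re.split(r"(?<=\d)\.(?=[a-z])|(?<=[a-z])\.(?=\d)", term)

print(tokenize("cachedmapper.com.test.jca.abc.copabczcm9.connector.abczcm9stub"))
# -> ['cachedmapper.com.test.jca.abc.copabczcm9', 'connector.abczcm9stub']

# No digit sits directly before a dot here, so the term stays whole:
print(tokenize("cachedmapper.com.test.jca.abc.copabcxa0f.connector.abcxa0fstub"))
```

Mapping x_java_method as not_analyzed (or adding a not_analyzed sub-field) and faceting on that would keep the full method name as a single term.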
 
More details:
=> ES full request:
 
curl -XGET 
'http://localhost/elasticsearch/logstash-2014.04.29/_search?pretty' -d '{
  "facets": {
"terms": {
  "terms": {
"field": "x_java_method",
"size": 10,
"order": "count",
"exclude": []
  },
  "facet_filter": {
"fquery": {
  "query": {
"filtered": {
  "query": {
"bool": {
  "should": [
{
  "query_string": {
"query": "type : \"appli_type\""
  }
}
  ]
}
  },
  "filter": {
"bool": {
  "must": [
{
  "range": {
"@timestamp": {
  "from": 139872240,
  "to": "2014-04-29T22:00:00.000Z"
}
  }
},
{
  "fquery": {
"query": {
  "query_string": {
"query": "x_java_method:*Stub"
  }
},
"_cache": true
  }
}
  ]
}
  }
}
  }
}
  }
}
  },
  "size": 0
}'  

=> ES Result :

{
  "took" : 179,
  "timed_out" : false,
  "_shards" : {
"total" : 10,
"successful" : 10,
"failed" : 0
  },
  "hits" : {
"total" : 8742790,
"max_score" : 0.0,
"hits" : [ ]
  },
  "facets" : {
"terms" : {
  "_type" : "terms",
  "missing" : 0,
  "total" : 24278,
  "other" : 1323,
  "terms" : [ {
"term" : 
"cachedmapper.com.test.jca.abc.copabczcnc.connector.abczcncstub",
"count" : 14199
  }, {
"term" : 
"cachedmapper.com.test.jca.abc.copabczcnb.connector.abczcnbstub",
"count" : 4817
  }, {
    "term" : "connector.abczcm9stub",
    "count" : 1387
  }, {
    "term" : "cachedmapper.com.test.jca.abc.copabczcm9",
    "count" : 1387
  }, {
"term" : 
"cachedmapper.com.test.jca.abc.copabczcna.connector.abczcnastub",
"count" : 342
  }, {
"term" : 
"cachedmapper.com.test.jca.def.defzcnj.connector.defzcnjstub",
"count" : 246
  }, {
    "term" : "connector.ghica017stub",
    "count" : 174
  }, {
    "term" : "cachedmapper.com.test.jca.ghi.ghica017",
    "count" : 174
  }, {
"term" : 
"cachedmapper.com.test.jca.abc.copabcxa0f.connector.abcxa0fstub",
"count" : 148
  }, {
"term" : "connector.ghica014stub",
"count" : 81
  } ]
}
  }
}



What OS memory does es use other than Java?

2014-05-06 Thread Edward Sargisson
Hi all,
We have a problem where our es nodes will fail with an out of memory error 
from Linux (note, not Java). Our es processes are configured with a fixed 
amount of heap (60% of total RAM - just as in in the elasticsearch chef 
cookbook).

So, something is consuming all of the memory available to Linux.

Is there any other memory that ES can use? Does it lock OS cache or buffer 
memory so that it can't be released? If it opens lots of files does it use 
up too much RAM? Is it doing off-heap allocation? (I'm pretty sure the 
answer is no to the last).

We're struggling to find the exact memory resource being used up.

For the record. this is ES 1.1.0 on CentOS 6.4 running in VMWare.

Thanks!
Edward



Kibana Time Troubles

2014-05-06 Thread Tate Eskew
Hello,
Maybe someone can help me. My setup:
AWS Servers using rsyslog (UTC time) > Physical server in datacenter 
central syslog-ng server (CST). 
Logstash shipper is running on the central syslog-ng box (CST). It grabs 
the events coming in, mangles them, throws them into redis. Logstash 
indexer on another box grabs them out of redis, shoves them in 
elasticsearch.  

Everything has worked as expected for months now; the only problem I have is 
that the display in Kibana doesn't show the log events for 5 hours, because 
the Logstash shipper is CST (5 hours behind). Any idea on how to get 
them to display immediately? Logs display immediately if I send to the 
central log server from a server that is CST as well.

Is there any way to get Kibana to show the events as they come in 
correctly?  We have lots of physical machines in our datacenters and they 
are all set to CST, but all of our AWS instances are set to UTC.  As of 
right now, we don't want to change the central syslog server's timezone to 
UTC since it still resides in one of our data centers. 

Any ideas? Is this something we should try to fix at the Logstash config or 
is this a display fix for Kibana?

Here is a sample from an AWS box (UTC) that is picked up by the central log 
server (CST) - Displays 5 hours later/incorrectly

{
  "_index": "logstash-2014.05.06",
  "_type": "syslog",
  "_id": "mZvpk-_9T4WgA2zxlsxogA",
  "_score": null,
  "_source": {
"@version": "1",
"@timestamp": "2014-05-05T20:01:26.000-05:00",
"type": "syslog",
"syslog_pri": "163",
"syslog_program": "ubuntu",
"received_at": "2014-05-05 20:01:27 UTC",
"syslog_severity_code": 3,
"syslog_facility_code": 20,
"syslog_facility": "local4",
"syslog_severity": "error",
"@source_host": "p-aws-emmaplatformsingle01",
"@message": "trustinme",
"@host": "p-aws-emmaplatformsingle01"
  },
  "sort": [
1399338086000
  ]
}

Here is a sample from a physical machine in one of our data centers (CST) that 
is picked up by the central log server (CST) - Displays instantly/correctly

{
  "_index": "logstash-2014.05.06",
  "_type": "syslog",
  "_id": "SjWn9aJWRGKeshylyp1j2Q",
  "_score": null,
  "_source": {
"@version": "1",
"@timestamp": "2014-05-06T14:01:52.000-05:00",
"type": "syslog",
"syslog_pri": "13",
"syslog_program": "teskew",
"received_at": "2014-05-06 19:01:53 UTC",
"syslog_severity_code": 5,
"syslog_facility_code": 1,
"syslog_facility": "user-level",
"syslog_severity": "notice",
"@source_host": "p-bna-apix01",
"@message": "trustinme",
"@host": "p-bna-apix01"
  },
  "sort": [
1399402912000
  ]
}
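One approach at the Logstash layer (a sketch, assuming the shipper uses a date filter to parse the syslog timestamp; the field name and patterns here are illustrative): tell the date filter which timezone the source timestamps are in, so events from the UTC hosts are parsed as UTC rather than CST:

```
filter {
  date {
    match    => [ "timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    timezone => "Etc/UTC"  # interpret timestamps from the UTC (AWS) hosts as UTC
  }
}
```

This would need to be applied selectively (e.g. conditionally per source host), since the CST machines' timestamps are already handled correctly.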



Re: Debugging elasticsearch

2014-05-06 Thread HungryHorace

Hmm, the tutorial says it does partial-word autocomplete, which is what I was 
looking for and you say it won't do. I'm a little confused here. 

I'll check out the suggest API - I hadn't seen that before.




Re: SQL Server JDBC on Linux box

2014-05-06 Thread anthony
OK, I am back with good news: I got things to work on my Ubuntu laptop. 
My problem appears to have been related to the original river install. I 
was getting errors in the log that I did not notice, and then I started 
moving JAR files around and made a mess of things. I backed away, deleted 
my bad version of the JDBC plugin, reinstalled the correct version for 
the version of Elasticsearch I had on the box, then put the JAR files in 
the new river-jdbc directory. Voilà! It worked.
Yahoo!
Thanks,
~aw

On Saturday, May 3, 2014 5:47:08 AM UTC-7, Jörg Prante wrote:
>
> From the single line I can read (*Failed to load class with value [jdbc]]*), 
> you did not install the JDBC river correctly, it does not start at all.
>
> If you can give a list of commands you used to install, and the files plus 
> file attributes in all the directories in Elasticsearch plugins folder, and 
> a complete log file of the node startup sequence, it would be a lot easier 
> to help.
>
> Jörg
>
>
> On Fri, May 2, 2014 at 10:00 PM,  >wrote:
>
>> Hello - That was a no-go unfortunately...same error after I copied to 
>> that path.  I am using the same JSON I used to set up the river on 
>> Windows, where it worked, so I would think the river command is fine.  The 
>> error specifically references the jdbc driver.  I saw some postings about 
>> case sensitivity when referencing the driver, but that is not an issue in 
>> my command...the case is correct 
>>
>>



Re: Registering node event listeners

2014-05-06 Thread Ivan Brusic
Not possible I guess.


On Wed, Apr 30, 2014 at 9:17 AM, Ivan Brusic  wrote:

> NodesFaultDetection is what I am looking into using, but on the client
> side, not as a plugin. I use similar code to create an AnalysisService
> locally, but no luck when it comes to TransportService actions.
>
> --
> Ivan
>
>
> On Wed, Apr 30, 2014 at 8:56 AM, joergpra...@gmail.com <
> joergpra...@gmail.com> wrote:
>
>> Not sure what events you are after, but I guess you just want to push
>> notifications about node failures?
>>
>> You could write a plugin that simply registers a listener to Zen's
>> NodesFaultDetection.addListener() and trigger the action you want (send
>> email, whatever) in the onNodeFailure() method of the listener.
>>
>> Jörg
>>
>>
>>
>> On Wed, Apr 30, 2014 at 5:39 PM, Ivan Brusic  wrote:
>>
>>> I should add that I am still on version 0.90.2. Looking to finally
>>> transition to 1.1 relatively soon. Our search infrastructure has had 100%
>>> uptime in the past two years, but it comes at the expense of not upgrading
>>> often.
>>>
>>> --
>>> Ivan
>>>
>>>
>>> On Wed, Apr 30, 2014 at 6:56 AM, Ivan Brusic  wrote:
>>>
 Would the DiscoveryService solve my initial problem or only get around
 constructing a DiscoveryNodesProvider? DiscoveryService only uses
 the InitialStateDiscoveryListener, which doesn't publish interesting 
 events.

 I won't be near a computer in the next few days to test.

 --
 Ivan


 On Wed, Apr 30, 2014 at 4:40 AM, joergpra...@gmail.com <
 joergpra...@gmail.com> wrote:

> Have you looked at InternalNode.java?
>
> Form my understanding you could try to implement your own
> DiscoveryModule with DiscoveryService and start it like this
>
> DiscoveryService discoService =
> injector.getInstance(DiscoveryService.class).start();
>
> Jörg
>
>
>
> On Wed, Apr 30, 2014 at 12:17 AM, Ivan Brusic  wrote:
>
>> I am looking to transition a piece of my search infrastructure from
>> polling the cluster's health status to hopefully receiving notifications
>> whenever an event occurs. Using the TransportService, I registered 
>> various
>> relevant listeners, but none of them are triggered.
>>
>> Here is the gist of the code:
>>
>> https://gist.github.com/brusic/2dcced28e0ed753b6632
>>
>> Most of it I stole^H^H^H^H^Hborrowed from ZenDiscovery. I am assuming
>> something is not quite right with the TransportService. I tried using 
>> both
>> a node client and a master-less/data-less client. I also suspect that
>> the DiscoveryNodesProvider might not have been initialized correctly, 
>> but I
>> am primarily after the events from NodesFaultDetection, which does not 
>> use
>> the DiscoveryNodesProvider.
>>
>> I know I am missing something obvious, but I cannot quite spot it. Is
>> there perhaps a different route using the TransportClient?
>>
>> Cheers,
>>
>> Ivan
>>
>



Re: BackUp and restore

2014-05-06 Thread Andrew Selden
Hi,

The recommended way to do this is to use ES's built-in snapshot/restore 
feature [1]. You would probably want to store your backups in S3 rather than EBS.

Also note that EBS does not always provide an acceptable level of performance 
for low-latency queries. If you use EBS you most likely will want to use 
provisioned IOPS. 
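For orientation, the snapshot workflow referenced above boils down to a few REST calls: register a repository, take a snapshot, restore it. A minimal sketch of the repository-registration body follows; the repository name, bucket, and region are invented placeholders, and the "s3" repository type assumes the AWS cloud plugin is installed.

```python
import json

# Hypothetical body for:  PUT /_snapshot/my_backup
# ("my_backup", the bucket, and the region are placeholders; the "s3"
# type requires the elasticsearch-cloud-aws plugin).
repo = {
    "type": "s3",
    "settings": {"bucket": "my-es-backups", "region": "us-east-1"},
}

# Then, roughly:
#   PUT  /_snapshot/my_backup/snap_1?wait_for_completion=true   # take snapshot
#   POST /_snapshot/my_backup/snap_1/_restore                   # restore it
print(json.dumps(repo, sort_keys=True))
```

The point of a registered repository (versus raw EBS snapshots) is that Elasticsearch itself coordinates a consistent copy of the shard data.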

Andrew

1. 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html#_snapshot


On May 6, 2014, at 4:50 AM, Ankur Goel  wrote:

> Hi All,
> 
> We have been using an Elasticsearch cluster with 3 nodes running on AWS 
> machines, with EBS volumes for the work and data directories of 
> Elasticsearch. I was experimenting with backup using EBS snapshots; here is 
> what I did:
> 
> 1.) created a snapshot of one ebs volume (say, alpha)
> 2.) deleted the index 
> 3.) shutdown the cluster
> 4.) unmounted the ebs volume on alpha (say /mnt/data )
> 5.) created a new volume from snapshot and mounted on alpha in the same 
> location (/mnt/data)
> 6.) restarted elastic search on one node only 
> cluster went to red state , all shards in unassigned state
> 7.) restarted another node with blank data (/mnt/data ) directory
> cluster went to yellow state , all shards STILL in 
> UNASSIGNED STATE
> 8.) manually allocated shards to node
>  recovered no documents , few kbs of data got restored :(
> 
> can anyone please help me debug this , what am I doing wrong , is there a 
> better way to do this ??
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/6381f2e3-706d-4bc8-86f5-283396e37560%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



Re: Node client cannot connect to cluster

2014-05-06 Thread David Chia
This was a lifesaver. Thanks for pointing out that both UDP and TCP needed 
to be open.

On Saturday, October 19, 2013 2:32:12 PM UTC-7, Jörg Prante wrote:
>
> Also note that telnet is not enough to check firewall issues, you must 
> open both UDP and TCP protocols on the hosts/ports. Apparently your host on 
> 10.0.0.0/8 class A subnet can't reach hosts on 172.16.0.0/16 class B 
> subnet.
>
> Jörg
>



Re: Debugging elasticsearch

2014-05-06 Thread Michael Lussier
Assuming the settings and mappings on your cluster match the requests you 
posted, and you're expecting "name" to match on "arango": the reason "arango" 
isn't partially matching as you expect is that "arango" doesn't match any 
documents in _all. 

In the tutorial mentioned, queries are being used to highlight results as 
you search. As the example mentions in the tutorial, a query match for 
"disn 123 2013" will highlight matching results in the search box.
But using the same tutorial searching for "disnp 123 2013" or "disnp" will 
give no results. This tutorial works really well when you're looking to 
have a search only find exactly what you search for.

If you're looking to allow for spelling errors in your auto-complete, 
I suggest looking into the suggest API. 
I hope that makes sense; let me know if I can expand on my explanation.
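The expected behaviour can be modelled outside Elasticsearch. A small Python sketch of the nGram filter from the settings quoted below (min_gram 2, max_gram 20), assuming a document token "aragon": only contiguous substrings of the indexed token become searchable, so a transposed query token shares no complete gram.

```python
def ngrams(token, min_gram=2, max_gram=20):
    # Rough model of the nGram token filter: emit every contiguous
    # substring between min_gram and max_gram characters long.
    token = token.lower()
    return {token[i:i + n]
            for n in range(min_gram, min(max_gram, len(token)) + 1)
            for i in range(len(token) - n + 1)}

indexed = ngrams("aragon")   # what gets indexed for a token "aragon"
print("arago" in indexed)    # True: a partial prefix is a real gram
print("arango" in indexed)   # False: nGram matching is not fuzzy matching
```

Since the search side uses the plain whitespace analyzer, the query token as a whole must equal one of these grams; that is why "arago" hits and "arango" does not.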
 

On Tuesday, May 6, 2014 10:50:03 AM UTC-5, HungryHorace wrote:
>
> Hi Michael. 
>
> I'm on windows so I'm using sense rather than curl. here are my requests
>
> PUT /travelshopoffers/_settings
> {
>   "analysis": {
>  "filter": {
> "nGram_filter": {
>"type": "nGram",
>"min_gram": 2,
>"max_gram": 20,
>"token_chars": [
>   "letter",
>   "digit",
>   "punctuation",
>   "symbol"
>]
> }
>  },
>  "analyzer": {
> "nGram_analyzer": {
>"type": "custom",
>"tokenizer": "whitespace",
>"filter": [
>   "lowercase",
>   "asciifolding",
>   "nGram_filter"
>]
> },
> "whitespace_analyzer": {
>"type": "custom",
>"tokenizer": "whitespace",
>"filter": [
>   "lowercase",
>   "asciifolding"
>]
> }
>  }
>   }
> }
>
> PUT /travelshopoffers/geo/_mapping
>   {
> "mappings": {
> "geo": {
>  "_all": {
> "index_analyzer": "nGram_analyzer",
> "search_analyzer": "whitespace_analyzer"
>  },
>  "properties": {
> "name": {
>"type": "string",
>"include_in_all": true
> },
> "lat": {
>"type": "string",
>"index": "no"
> },
> "lng": {
>"type": "string",
>"index": "no"
> },
> "countryCode": {
>"type": "string",
>"index": "not_analyzed",
>"include_in_all": false
> },
> "countryName": {
>"type": "string",
>"index": "not_analyzed",
>"include_in_all": false
> }
>  }
>   }
>}
> }
>



elasticsearch-php and function_score

2014-05-06 Thread chris
Hi!

The query below works, but when I use the official PHP client I still get 
the exception: *malformed query, expected a START_OBJECT*


"query": {
  "function_score": {
 "query": {
"terms": {
   "categoryName": [
  "toto",
  "tutu"
   ],
   "minimum_match": 1
}
 },
 "functions": [
{
   "filter": {
  "terms": {
 "tags": [
"truc",
"bidule"
 ]
  }
   },
   "boost_factor": 2
}
 ]
  }
   }

In my script the query is built in parts; 
both $category and $boost are arrays of values

// function_score --> init main query  
$functionScoreQuery = array( 'query' => array(
'terms' => array(
'categoryName' => $category
 )
));  

// function_score functions --> init boost 
$functionScoreFunctions = array( 'functions' => array(
'filter' => array( 
'terms' => array(
'tags' => $boost
)
),
'boost_factor' => 2
));   
 
$defaultSubQuery['function_score'] =  $functionScoreQuery; 
if (($boost)) {
$defaultSubQuery['function_score'] += 
 $functionScoreFunctions;  
}   

a print_r() of the request shows:

Array
(
[index] => myIndex
[body] => Array
(
[query] => Array
(
[function_score] => Array
(
[query] => Array
(
[terms] => Array
(
[categoryName] => Array
(
[0] => toto
[1] => tutu
)

)

)

[functions] => Array
(
[filter] => Array
(
[terms] => Array
(
[tags] => Array
(
[0] => truc
[1] => bidule
)

)

)

[boost_factor] => 2
)

)

)

[from] => 0
[size] => 10
)

--> function_score: malformed query, expected a START_OBJECT while parsing 
functions but got a FIELD_NAME];
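The parse error points at the shape of "functions": Elasticsearch expects it to serialize as a JSON array of objects, but a PHP associative array with string keys encodes as a single JSON object, so the parser hits a FIELD_NAME where it expects a START_OBJECT. A sketch of the distinction in Python (dicts and lists mirror PHP associative and list arrays; the field values are the ones from the post):

```python
import json

# The PHP associative array from the post serializes "functions" as one
# JSON object -- the shape that triggers the parse error:
wrong = {"functions": {"filter": {"terms": {"tags": ["truc", "bidule"]}},
                       "boost_factor": 2}}

# function_score expects an *array* of function objects; in PHP this means
# wrapping the function in one more array( ... ) level:
right = {"functions": [{"filter": {"terms": {"tags": ["truc", "bidule"]}},
                        "boost_factor": 2}]}

print(json.dumps(wrong).startswith('{"functions": {'))  # True -> object
print(json.dumps(right).startswith('{"functions": ['))  # True -> array
```

In PHP terms, the sketch suggests `'functions' => array(array('filter' => ..., 'boost_factor' => 2))` rather than `'functions' => array('filter' => ..., 'boost_factor' => 2)`.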

I know the *function_score* has a special syntax combining arrays and 
objects 
(http://www.elasticsearch.org/guide/en/elasticsearch/client/php-api/current/_search_operations.html)

but I didn't find the correct syntax using the official PHP client... the 
examples are quite light on this case, so any help will be appreciated :-)

regards,
chris
 



Re: Debugging elasticsearch

2014-05-06 Thread HungryHorace
Hi Michael. 

I'm on windows so I'm using sense rather than curl. here are my requests

PUT /travelshopoffers/_settings
{
  "analysis": {
 "filter": {
"nGram_filter": {
   "type": "nGram",
   "min_gram": 2,
   "max_gram": 20,
   "token_chars": [
  "letter",
  "digit",
  "punctuation",
  "symbol"
   ]
}
 },
 "analyzer": {
"nGram_analyzer": {
   "type": "custom",
   "tokenizer": "whitespace",
   "filter": [
  "lowercase",
  "asciifolding",
  "nGram_filter"
   ]
},
"whitespace_analyzer": {
   "type": "custom",
   "tokenizer": "whitespace",
   "filter": [
  "lowercase",
  "asciifolding"
   ]
}
 }
  }
}

PUT /travelshopoffers/geo/_mapping
  {
"mappings": {
"geo": {
 "_all": {
"index_analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
 },
 "properties": {
"name": {
   "type": "string",
   "include_in_all": true
},
"lat": {
   "type": "string",
   "index": "no"
},
"lng": {
   "type": "string",
   "index": "no"
},
"countryCode": {
   "type": "string",
   "index": "not_analyzed",
   "include_in_all": false
},
"countryName": {
   "type": "string",
   "index": "not_analyzed",
   "include_in_all": false
}
 }
  }
   }
}



filtering avec cardinality

2014-05-06 Thread Christophe Journel
Hello, 
I am using ES 1.1.0 and I don't know how to filter results after the 
cardinality aggregation.

Is there a way to use something like 'min_value' to filter the cardinality 
aggregation?

Here is an example:

 "aggs": {
     "agg_login": {
         "terms": {
             "field": "login",
             "size": 1000,
             "min_doc_count": 5
         },
         "aggregations": {
             "agg_distinct_ip": {
                 "cardinality": {
                     "field": "ip_client",
                     "min_value": 5
                 }
             }
         }
     }
 }


Note: the "min_value" option does not actually exist, but that's what I need :)

and I would like the result to be: logins whose count of distinct IP 
addresses is greater than 10
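Until something like that exists server-side, one workaround is to filter the terms buckets client-side on the cardinality value. A sketch follows; the response shape is abbreviated and the login names and counts are invented for illustration.

```python
# Hypothetical, abbreviated ES response for the aggregation above.
response = {
    "aggregations": {
        "agg_login": {
            "buckets": [
                {"key": "alice", "doc_count": 40,
                 "agg_distinct_ip": {"value": 12}},
                {"key": "bob", "doc_count": 7,
                 "agg_distinct_ip": {"value": 3}},
            ]
        }
    }
}

# Keep only logins whose distinct-IP count exceeds the threshold.
logins = [b["key"]
          for b in response["aggregations"]["agg_login"]["buckets"]
          if b["agg_distinct_ip"]["value"] > 10]
print(logins)  # ['alice']
```

The obvious caveat is that the terms "size" must be large enough that the interesting logins are actually returned before the client-side filter runs.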

Thanks, 

Christophe



Re: index binary files

2014-05-06 Thread David Pilato
Hard to tell without details of what you did so far and what did not work.

Maybe you could start by describing all the steps from the beginning (and 
provide versions for all the components you are using)?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 6 mai 2014 à 17:30:28, anass benjelloun (anass@gmail.com) a écrit:

Hello again,

thanks for your answer.
But my problem is how to specify a folder containing many documents so that 
all of those documents get indexed.
Also, I already tested indexing a PDF file by giving the base64 content, and 
that didn't work.

Best regards,
Anass BENJELLOUN



Re: index binary files

2014-05-06 Thread anass benjelloun
Hello again,

thanks for your answer.
But my problem is how to specify a folder containing many documents so that 
all of those documents get indexed.
Also, I already tested indexing a PDF file by giving the base64 content, and 
that didn't work.

Best regards,
Anass BENJELLOUN



integration of elasticsearch on tomcat6

2014-05-06 Thread anass benjelloun
Hello,

I'm searching for a solution to integrate elasticsearch on tomcat6.

thanks for help.

regards.



Re: Debugging elasticsearch

2014-05-06 Thread Michael Lussier
Can you provide your index settings and type mapping? 

On Monday, May 5, 2014 3:48:52 PM UTC-5, HungryHorace wrote:
>
> I've been following a tuturial here:
>
> http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
>
> but when I query for results I only get matches if I use the entire word. 
> Partial-word searches always give zero results. How do I debug this? 
>
> A query for 'aragon' produces matches but 'arango' produces zero 
> matches...? I'm using 'sense' to submit my queries
>
> POST /tso/geo/_search
> {
>"size": 10,
>"query": {
>   "match": {
>  "_all": {
> "query": "arago",
> "operator": "and"
>  }
>   }
>}
> }
>
> result. 
> {
>"took": 1,
>"timed_out": false,
>"_shards": {
>   "total": 5,
>   "successful": 5,
>   "failed": 0
>},
>"hits": {
>   "total": 0,
>   "max_score": null,
>   "hits": []
>}
> }
>



Problems with span_not query

2014-05-06 Thread Matthew Brown
Having some problems with queries containing span_not. I've simplified the 
query down to a test example; however, the query returns additional documents 
that I don't think should be returned.

https://gist.github.com/m-brown/59b9b5ad6f68a5d12d0a

In short I want to find the documents that contain 'foo' but not 'bar' from:
foo
foo bar
bar foo
foo foo bar
foo bar foo
bar foo foo

The below query returns two docs ('foo' and 'bar foo foo') rather than the 
one I was expecting:
{
  "query": {
"span_not": {
  "include": {
"span_term": {
  "field1": "foo"
}
  },
  "exclude": {
"span_near": {
  "in_order": false,
  "clauses": [
{
  "span_term": {
"field1": "bar"
  }
},
{
  "span_term": {
"field1": "foo"
  }
}
  ],
  "slop": 1000
}
  }
}
  }
}

Why does 'bar foo foo' match the query, and given that it does, why don't 
any of the others, given that in_order is false?

Tested on elasticsearch 1.0.1 and 1.1.1 on Ubuntu 12.04.



Multi DC cluster or separate cluster per DC?

2014-05-06 Thread Sebastian Łaskawiec
Hi!

I'd like to ask for advice about deployment in multi DC scenario.

Currently we operate on 2 Data Centers in active/standby mode. In case of ES 
we'd like to have a different approach - we'd like to operate in 
active-active mode (we want to optimize our resources, especially for 
querying). 
Here are some details about target configuration:

   - 4 ES instances per DC. Full cluster will have 8 instances.
   - Up to 1 TB of data
   - Data pulled from database using JDBC River
   - Database is replicated asynchronously between DCs. Each DC will have 
   its own database instance to pull data.
   - Average latency between DCs is about several miliseconds
   - We need to operate when passive DC is down

We know that a multi-DC configuration might end in a split-brain issue. Here 
is how we want to prevent it:

   - Set node.master: true only in 4 nodes in active DC
   - Set node.master: false in passive DC
   - This way we'll be sure that new cluster will not be created in passive 
   DC
   - Additionally we'd like to set discovery.zen.minimum_master_nodes: 3 
   (to avoid Split Brain in active DC)

Additionally there is a problem with switchover (the passive DC becomes active 
and the active one becomes passive). In our system it takes about 20 minutes, 
which is the maximum length of our maintenance window. We were thinking of 
shutting down the whole ES cluster and switching the node.master setting in 
the configuration files (as far as I know this setting cannot be changed via 
the REST API). Then we'd need to start the whole cluster.

So my question is: is it better to have one big ES cluster operating on 
both DCs or should we change our approach and create 2 separate clusters 
(and rely on database replication)? I'd be grateful for advice.

Regards
Sebastian



Percolator on nested object shard failure

2014-05-06 Thread razafinr
I am using elasticsearch 1.1.

Normally in this version, percolating nested documents should work. 

However, when I try it I get the following error:

failures: [

{
index: test
shard: 4
reason: BroadcastShardOperationFailedException[[test][4] ];
nested: PercolateException[failed to percolate]; nested:
ElasticsearchIllegalArgumentException[Nothing to percolate]; 
}

]

I have the following percolator registered (elasticsearch-head strips all the 
quotes when displaying it):

{
  "_index": "test",
  "_type": ".percolator",
  "_id": "27",
  "_version": 1,
  "_score": 1,
  "_source": {
    "query": {
      "filtered": {
        "query": {
          "match_all": {}
        },
        "filter": {
          "nested": {
            "filter": {
              "term": {
                "city": "london"
              }
            },
            "path": "location"
          }
        }
      }
    }
  }
}

And while trying to percolate this document I am getting the error:

{
  ...
  "location": {
"date": "2014-05-05T15:07:58",
"namedplaces": {
  "city": "london"
}
  }
}

Any idea why it doesn't work ? 

**EDIT :** 

In elasticsearch log I got more precision about the error:

[2014-05-06 13:33:48,972][DEBUG][action.percolate ] [Tomazooma]
[test][2], node[H42BBxajRs2w2NmllMnp7g], [P], s[STARTED]: Failed to execute
[org.elasticsearch.action.percolate.PercolateReque
st@7399452e]
org.elasticsearch.percolator.PercolateException: failed to percolate
at
org.elasticsearch.action.percolate.TransportPercolateAction.shardOperation(TransportPercolateAction.java:198)
at
org.elasticsearch.action.percolate.TransportPercolateAction.shardOperation(TransportPercolateAction.java:55)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$2.run(TransportBroadcastOperationAction.java:226)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException:
Nothing to percolate
at
org.elasticsearch.percolator.PercolatorService.percolate(PercolatorService.java:187)
at
org.elasticsearch.action.percolate.TransportPercolateAction.shardOperation(TransportPercolateAction.java:194)
... 5 more






Re: cluster reroute and potential data loss

2014-05-06 Thread Ankur Goel
Hi,

I am also facing the same issue - did you get it resolved?
I am hitting it while doing an EBS-snapshot-based recovery.

On Tuesday, 4 February 2014 03:46:04 UTC+5:30, Mark Conlin wrote:
>
>
> So during a cluster restart sometimes we get nodes that have unallocated 
> shards, both the primary and replica will be unallocated. 
> They stay stuck in this state, leaving the cluster red. 
>
> If I force allocation, with allow_primary="true", I get a new blank shard, 
> all docs lost. 
> If I force allocation, with allow_primary="false", I get an error:
>
> {
>"error": 
> "RemoteTransportException[[yournodename][inet[/10.1.1.1:9300]][cluster/reroute]];
>  
> nested: ElasticSearchIllegalArgumentException[[allocate] trying to allocate 
> a primary shard [yourindexname][4]], which is disabled]; ",
>"status": 400
> }
>
> Once the cluster gets to this state, am I just out of luck on recovering 
> the data in these shards?
>
> Mark
>
>
>
>
> On Mon, Feb 3, 2014 at 4:57 PM, Nikolas Everett 
> > wrote:
>
>> If all replicas of a particular shard are unallocated and you 
>> allow_primary allocate one then it'll allocate empty.  If a node that had 
>> some data for that shard comes back it won't be able to use that data 
>> because the shard has been allocated empty.
>>
>>
>> On Mon, Feb 3, 2014 at 4:42 PM, Mark Conlin 
>> > wrote:
>>
>>>
>>> I was reading some ES docs and stumbled upon this part of the Cluster 
>>> Reroute API:
>>>
>>> allocate:
>>>  "*Allocate an unassigned shard to a node.  It also accepts the 
>>> "allow primary" flag to explicitly specify that it is allowed to explicitly 
>>> allocate a primary shard (might result in data loss)."*
>>>
>>>
>>> Why might this result in data loss?
>>>
>>> If I use:
>>>
>>> POST /_cluster/reroute 
>>> {
>>> "commands" : [ {
>>> "cancel" :
>>> {
>>>   "index" : "myindex", "shard" : 4, "node": "somenode", 
>>> "allow_primary":"true"
>>> }
>>> }
>>> ]
>>> }
>>>
>>>
>>> To get a node that has unallocated shards back to green, how will I know 
>>> if data loss has occured?
>>> How/why is the data being lost?
>>>
>>> Thanks, 
>>> Mark
>>>
>>
>
>



Kibana index does not contain a timestamp aka why does kibana search the kibana-int index?

2014-05-06 Thread Jorn Eilander
LS,

I've been using an ELK-stack for development purposes for a few weeks now, 
and my logs were filled with errors. 

The errors, it seemed, were traceable to the fact that Kibana was (and is) 
querying all the shards on the node for the @timestamp field, which isn't 
present in the kibana-int index (where Kibana stores its reports).

So when my users generate a report, Kibana searches in _all for the data, 
which includes shards/indices not related to Logstash... Is there any way I 
can limit them to the indices/shards related to Logstash? I dislike errors 
in my logging ;)
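Assuming Kibana 3 (kibana-int is its storage index), the indices a dashboard queries can be narrowed in the dashboard's index settings rather than letting it fall back to _all. A sketch of the relevant fragment of the dashboard JSON, using the default Logstash daily index naming (the pattern and values below are illustrative, adjust to your naming scheme):

```json
{
  "index": {
    "default": "logstash-*",
    "pattern": "[logstash-]YYYY.MM.DD",
    "interval": "day"
  }
}
```

With a timestamped pattern, Kibana only queries the daily Logstash indices that fall inside the selected time range, so kibana-int and other unrelated indices are left alone.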



Re: /etc/init.d/elasticsearch fails at system boot

2014-05-06 Thread Nikolas Everett
I had trouble with this a while ago and in my case it turned out to be that
I didn't have enough ram to allocate and the JDK was crashing.  That
probably isn't what is up with yours but you can follow along with the
issue starting here:
https://github.com/elasticsearch/elasticsearch/issues/5234

These kinds of problems irk me because I don't really have a good way to
solve them.  Maybe pipe the stdout/stderr of the Elasticsearch process in
the init script to a temporary file, then try to start it.  The JDK has
a habit of printing errors to stderr because, when it fails, it doesn't
have anywhere else to send them.
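That suggestion can be sketched like this; the daemon line is illustrative, not the actual RPM init script:

```shell
# Capture whatever the JVM prints before dying at boot time.
OUT=$(mktemp)

# In /etc/init.d/elasticsearch, the launch line would gain a redirect, e.g.:
#   daemon --user elasticsearch $ES_HOME/bin/elasticsearch -d ... >> "$OUT" 2>&1
# Demonstrated here with a stand-in command that fails on stderr:
sh -c 'echo "simulated JVM startup error" >&2' >> "$OUT" 2>&1

CAPTURED=$(cat "$OUT")   # whatever was printed before the process died
echo "$CAPTURED"
rm -f "$OUT"
```

Once the node starts cleanly, remove the redirect again so the temp file doesn't grow unbounded.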

Path problems, maybe?

Nik


On Tue, May 6, 2014 at 9:12 AM, Zacharie Elcor  wrote:

> Hi,
>
> I installed elasticsearch 1.1.1 from rpm package on redhat 6.2
>
> Init script /etc/init.d/elasticsearch is installed and run at server boot
> but *no java process is started* or it dies silently. No log in
> /var/log/elasticsearch/* but the lock file is touched
> (/var/lock/subsys/elasticsearch)
>
> If I run (as root) "/etc/init.d/elasticsearch start", the node is started.
>
> I added a logger statement to /usr/share/elasticsearch/bin/elasticsearch
> to see what is actually run. Here is what is written to /var/log/message :
>
> exec "/usr/bin/java"  -Xms4g -Xmx4g -Xss256k -Djava.awt.headless=true -XX
> :+UseParNewGC -XX:+UseConcMarkSweepGC \
>  -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+HeapDumpOnOutOfMemoryError \
>  -Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -
> Des.path.home="/usr/share/elasticsearch" \
>  -cp
> ":/usr/share/elasticsearch/lib/elasticsearch-1.1.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*"
> -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=
> /var/log/elasticsearch \
>  -Des.default.path.data=/var/product/elasticsearch -Des.default.path.work=
> /tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch \
>  org.elasticsearch.bootstrap.Elasticsearch
>
> If I run manually this command line after switching to elasticsearch user,
> the node is started.
>
> In summary, everything works normally except that elasticsearch won't
> start automatically after a server reboot.
>
> Any clue? I'm stuck
>
> --
> Zac
>



Re: Elasticsearch out of memory issue

2014-05-06 Thread Adrien Grand
Hi,

Multi-level terms aggregations indeed do not perform very efficiently right
now; this is something that will be significantly improved in Elasticsearch
1.2.
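A note on the query quoted below: "size": 0 on a terms aggregation requests every distinct term, and with four nested levels the bucket count is the product of the cardinalities of all four fields. The stack trace in the report shows the allocation happening up front in the aggregator's constructor (BigArrays.newLongArray), which matches the observation that even an empty index triggers the OOM. One mitigation is to bound each level; a sketch with illustrative limits:

```json
{
  "aggregations": {
    "startTimeMinOfDayTerm": {
      "terms": { "field": "startTimeMinOfDay", "size": 100 },
      "aggregations": {
        "nameTerm": {
          "terms": { "field": "name", "size": 50 }
        }
      }
    }
  }
}
```

The concrete limits depend on the actual field cardinalities; the point is that each level stays bounded instead of unbounded.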


On Tue, May 6, 2014 at 3:05 PM, Rahul Bahirat wrote:

> Hi
> On Elasticsearch 1.1.1 we ran the aggregation query below. We noticed that
> if the index is empty, the ES node goes out of memory.
>
>
> {
>   "aggregations" : {
> "dataFilter" : {
>   "filter" : {
> "and" : {
>   "filters" : [ {
> "range" : {
>   "startTime" : {
> "from" : "now-1d",
> "to" : "now",
> "include_lower" : true,
> "include_upper" : true
>   }
> }
>   }, {
> "not" : {
>   "filter" : {
> "term" : {
>   "type" : "xyz"
> }
>   }
> }
>   } ]
> }
>   },
>   "aggregations" : {
> "startTimeMinOfDayTerm" : {
>   "terms" : {
> "field" : "startTimeMinOfDay",
> "size" : 0
>   },
>   "aggregations" : {
> "nameTerm" : {
>   "terms" : {
> "field" : "name",
> "size" : 0
>   },
>   "aggregations" : {
> "sessionTypeTerm" : {
>   "terms" : {
> "field" : "sessionType",
> "size" : 0
>   },
>   "aggregations" : {
> "typeTerm" : {
>   "terms" : {
> "field" : "type",
> "size" : 0
>   },
>   "aggregations" : {
> "sizeStats" : {
>   "stats" : {
> "field" : "size"
>   }
> },
> "totalTimeStats" : {
>   "stats" : {
> "field" : "totalTimeTaken"
>   }
> },
> "Stats" : {
>   "stats" : {
> "field" : "hits"
>   }
> }
>   }
> }
>   }
> }
>   }
> }
>   }
> }
>   }
> }
>   }
> }
>
>
> Stack trace from Elasticsearch:
>
> [2014-05-06 12:10:00,947][DEBUG][action.search.type   ] [Silver Sable]
> [oss_3][1], node[2j3OSX1gQSCudUYq8_uTOw], [P], s[STARTED]: Failed to
> execute [org.elasticsearch.action.search.SearchRequest@1ab39e8] lastShard
> [true]
> org.elasticsearch.ElasticsearchException: Java heap space
> at
> org.elasticsearch.ExceptionsHelper.convertToRuntime(ExceptionsHelper.java:37)
> at
> org.elasticsearch.search.SearchService.createContext(SearchService.java:531)
> at
> org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:480)
> at
> org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:252)
> at
> org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)
> at
> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
> at
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
> at
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
> at
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at
> org.elasticsearch.common.util.BigArrays.newLongArray(BigArrays.java:446)
> at
> org.elasticsearch.search.aggregations.bucket.BucketsAggregator.<init>(BucketsAggregator.java:46)
> at
> org.elasticsearch.search.aggregations.bucket.terms.StringTermsAggregator.<init>(StringTermsAggregator.java:65)
> at
> org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.create(TermsAggregatorFactory.java:139)
> at
> org.elasticsearch.search.aggregations.support.ValueSourceAggregatorFactory.create(ValueSourceAggregatorFactory.java:58)
> at
> org.elasticsearch.search.aggregations.AggregatorFactories.createAndRegisterContextAware(AggregatorFactories.java:53)
> at
> org.elasticsearch

/etc/init.d/elasticsearch fails at system boot

2014-05-06 Thread Zacharie Elcor
Hi,

I installed elasticsearch 1.1.1 from rpm package on redhat 6.2

Init script /etc/init.d/elasticsearch is installed and run at server boot 
but *no java process is started* or it dies silently. No log in 
/var/log/elasticsearch/* but the lock file is touched 
(/var/lock/subsys/elasticsearch)

If I run (as root) "/etc/init.d/elasticsearch start", the node is started.

I added a logger statement to /usr/share/elasticsearch/bin/elasticsearch to 
see what is actually run. Here is what is written to /var/log/message :

exec "/usr/bin/java"  -Xms4g -Xmx4g -Xss256k -Djava.awt.headless=true -XX:+
UseParNewGC -XX:+UseConcMarkSweepGC \
 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -
XX:+HeapDumpOnOutOfMemoryError \
 -Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.
path.home="/usr/share/elasticsearch" \
 -cp 
":/usr/share/elasticsearch/lib/elasticsearch-1.1.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*"
 
-Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/
log/elasticsearch \
 -Des.default.path.data=/var/product/elasticsearch -Des.default.path.work=
/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch \
 org.elasticsearch.bootstrap.Elasticsearch

If I run manually this command line after switching to elasticsearch user, 
the node is started.

In summary, everything works normally except that elasticsearch won't start 
automatically after a server reboot.

Any clue? I'm stuck

--
Zac



Elasticsearch out of memory issue

2014-05-06 Thread Rahul Bahirat
Hi
On Elasticsearch 1.1.1 we ran the aggregation query below. We noticed that 
if the index is empty, the ES node goes out of memory.


{
  "aggregations" : {
"dataFilter" : {
  "filter" : {
"and" : {
  "filters" : [ {
"range" : {
  "startTime" : {
"from" : "now-1d",
"to" : "now",
"include_lower" : true,
"include_upper" : true
  }
}
  }, {
"not" : {
  "filter" : {
"term" : {
  "type" : "xyz"
}
  }
}
  } ]
}
  },
  "aggregations" : {
"startTimeMinOfDayTerm" : {
  "terms" : {
"field" : "startTimeMinOfDay",
"size" : 0
  },
  "aggregations" : {
"nameTerm" : {
  "terms" : {
"field" : "name",
"size" : 0
  },
  "aggregations" : {
"sessionTypeTerm" : {
  "terms" : {
"field" : "sessionType",
"size" : 0
  },
  "aggregations" : {
"typeTerm" : {
  "terms" : {
"field" : "type",
"size" : 0
  },
  "aggregations" : {
"sizeStats" : {
  "stats" : {
"field" : "size"
  }
},
"totalTimeStats" : {
  "stats" : {
"field" : "totalTimeTaken"
  }
},
"Stats" : {
  "stats" : {
"field" : "hits"
  }
}
  }
}
  }
}
  }
}
  }
}
  }
}
  }
}


Stack trace from Elasticsearch:

[2014-05-06 12:10:00,947][DEBUG][action.search.type   ] [Silver Sable] 
[oss_3][1], node[2j3OSX1gQSCudUYq8_uTOw], [P], s[STARTED]: Failed to 
execute [org.elasticsearch.action.search.SearchRequest@1ab39e8] lastShard 
[true]
org.elasticsearch.ElasticsearchException: Java heap space
at 
org.elasticsearch.ExceptionsHelper.convertToRuntime(ExceptionsHelper.java:37)
at 
org.elasticsearch.search.SearchService.createContext(SearchService.java:531)
at 
org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:480)
at 
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:252)
at 
org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)
at 
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.OutOfMemoryError: Java heap space
at 
org.elasticsearch.common.util.BigArrays.newLongArray(BigArrays.java:446)
at 
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.<init>(BucketsAggregator.java:46)
at 
org.elasticsearch.search.aggregations.bucket.terms.StringTermsAggregator.<init>(StringTermsAggregator.java:65)
at 
org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.create(TermsAggregatorFactory.java:139)
at 
org.elasticsearch.search.aggregations.support.ValueSourceAggregatorFactory.create(ValueSourceAggregatorFactory.java:58)
at 
org.elasticsearch.search.aggregations.AggregatorFactories.createAndRegisterContextAware(AggregatorFactories.java:53)
at 
org.elasticsearch.search.aggregations.AggregatorFactories.access$100(AggregatorFactories.java:38)
at 
org.elasticsearch.search.aggregations.AggregatorFactories$1.<init>(AggregatorFactories.java:86)
at 
org.elasticsearch.search.aggregations.AggregatorFactories.createSubAggregators(AggregatorFactories.java:74)
at 
org.elasticsearch.search.aggregations.Aggregator.<init>(Aggregator.java:86)
at 
org.elasticsearch.search.aggregations.bucket.Buck

Insert Record from AJAX - Not working

2014-05-06 Thread Raghavendar T S
Hi,
I am able to search in Elasticsearch with AJAX, but I am not able to insert 
a record with AJAX. Can someone help me?

var attr = {
    "username": "username",
    "password": "password"
};

$.ajax({
    url: 'http://localhost:9200/objects/saved_queries1/1/',
    type: 'POST',
    crossDomain: true,
    dataType: 'json',
    data: JSON.stringify(attr),
    success: function(response) {
        console.log("SUCCESS");
    },
    error: function(jqXHR, textStatus, errorThrown) {
        // Log the response body too: Elasticsearch puts the reason for
        // the failure there, which narrows down the problem.
        console.log("ERROR", textStatus, jqXHR.responseText);
    }
});



Re: Unresponsive cluster after too large of a query (OutOfMemoryError: Java heap space)?

2014-05-06 Thread joergpra...@gmail.com
ES has a lot of failsafe mechanisms against "OutOfMemoryError" built in:

- thread pools are strict, they do not grow endlessly
- field cache usage is limited
- a field circuit breaker helps to terminate queries early before too much
memory is consumed
- closing unused indices frees heap resources that are no longer required
- balancing shards over nodes for equalizing resource usage over the nodes
- catching "Throwable" in several critical modules to allow spontaneous
recovery from temporary JVM OOMs (e.g. if GC is too slow)

Nevertheless you can override defaults and get into the "red area" where an
ES node is no longer able to react properly over the API, also because of

- misconfigurations
- "bad behaving" queries which exploit CPU usage or exceed available heap
in unpredictable ways
- unexpected, huge query loads, large result sets
- sudden peaks of resource usage, e.g. while merging large segments, or
bulk indexing
- distorted document/term distribution over shards that knock out equal
shard balancing
- etc.

Unresponsive nodes are taken out of the cluster after a few seconds, so
this is not really a problem, unless you have no replica, or the cluster
can't keep up with recovery from such events.

There is no known mechanism to protect you automatically from crossing the
line to the "red area" when a JVM can not recover from OOM and gets
unresponsive. This is not specific to ES but to all JVM applications.

Best practice is "know your data, know your nodes". Exercise your ES
cluster before putting real data on it to get an idea of the maximum
capacity of a node or the whole cluster and the best configuration options,
and put a proxy before ES to allow only "well behaving" actions.
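For instance, the fielddata limits mentioned above are tunable in elasticsearch.yml; a sketch with illustrative values (in 1.x the fielddata circuit breaker defaults to 60% of the heap):

```yaml
# elasticsearch.yml -- illustrative values, not recommendations
indices.fielddata.breaker.limit: 70%   # trip queries before fielddata exhausts the heap
indices.fielddata.cache.size: 40%      # evict entries once the fielddata cache exceeds this bound
```

Exercising the cluster under realistic load is what tells you where these limits should actually sit.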

Jörg



On Tue, May 6, 2014 at 2:06 AM, Nate Fox  wrote:

> Is there any way to prevent ES from blowing up just by selecting too much
> data? This is my biggest concern.
> Is it because the bootstrap.mlockall is on, so we give ES/JVM a specified
> amount of memory and thats all that node will receive? If we turned that
> off and had gobs more swap available for ES, would it not blow up, but just
> be real slow?
>
>
>
>
> On Mon, May 5, 2014 at 4:12 PM, Mark Walkom wrote:
>
>> Then you need more nodes, more heap on existing nodes or less data.
>> You've reached the limit of what your current cluster can handle, that is
>> why this is happening.
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 6 May 2014 09:11, Nate Fox  wrote:
>>
>>>  I have 11 nodes. 3 are dedicated masters and the other 8 are data
>>> nodes.
>>> On May 5, 2014 4:03 PM, "joergpra...@gmail.com" 
>>> wrote:
>>>
 You have only two nodes it seems. Adding nodes may help.

 Beside data nodes that do the heavy work, set up 3 master eligible
 nodes (data-less nodes, with reasonable smaller heap size for cluster state
 and mappings). Set the other data nodes to non-eligible for becoming 
 master.

 Jörg


 On Mon, May 5, 2014 at 9:34 PM, Nate Fox  wrote:

> We're using ES 1.1.0 for central logging storage/searching. When we
> use Kibana and search a month's worth of data, our cluster becomes
> unresponsive. By unresponsive I mean that many nodes will respond
> immediately to a 'curl localhost:9200' but a couple will not. This leads 
> to
> any cluster metrics not being available when querying the master and we're
> unable to set any cluster-level settings.
>
> We're getting these types of errors in the logs:
> [2014-05-05 19:10:50,763][WARN ][transport.netty  ]
> [Leap-Frog] exception caught on transport layer [[id: 0x4b074069, /
> 10.6.10.211:57563 => /10.6.10.148:9300]], closing connection
> java.lang.OutOfMemoryError: Java heap space
>
> The cluster seems to never recover either - and that is my biggest
> concern. So my questions are:
> 1. Is it normal for the entire cluster to just close up shop because a
> couple nodes are unresponsive? I thought the field data circuit breaker
> would fix this, but maybe this is a different problem.
> 2. How to best get ES to recover from this scenario? I dont really
> want to restart just the two nodes, as we have >1Tb of data on each node,
> but issuing a disable_allocation fails because it cannot write to all 
> nodes
> in the cluster
>

Re: boost query in elasticsearch

2014-05-06 Thread Hannes Korte

Hi Alexandre,

did you have a look at the cross_fields type of the multi match query?

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-cross-fields

Using an edge_ngram analyzer this might already solve your problem even 
without the need for fine-tuning a boost value.
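A minimal sketch of such a query, using the field names from the question below (the boost and operator are illustrative starting points, not tuned values):

```json
{
  "query": {
    "multi_match": {
      "query": "toto t",
      "type": "cross_fields",
      "operator": "and",
      "fields": ["fname^2", "sname"]
    }
  }
}
```

cross_fields treats the listed fields as one blended field when matching terms, which fits name parts split over fname and sname.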


Best regards,
Hannes


On 06.05.2014 13:57, Alexandre Pinsard wrote:



Hello everybody,
I would like to use boost in the following case:

I have a personal index that contains the following properties:

- sname
- fname
- login

When I search on fname = 'toto' and sname = 't', I would like my results
to be sorted by relevance. For example:

- Toto tata
- Toto tutu
- Toto tt
- Toto Tutfut
- etc...

But I do not know how to deal with boost. Can you help me please ?





Percolator on nested object shard failure

2014-05-06 Thread Razafindramaka Rado
Hi!

I am using elasticsearch 1.1. 

Normally in this version, percolator on nested documents should work. 

However, when I try to do this I get the following error: 


failures: [ 

{ 
index: test 
shard: 4 
reason: BroadcastShardOperationFailedException[[test][4] ];nested
: PercolateException[failed to percolate]; nested: 
ElasticsearchIllegalArgumentException[Nothing to percolate]; 
} 

] 



I have the following percolator registered (reformatted here; 
elasticsearch-head had stripped all the quotes): 

  { 
    "_index": "test", 
    "_type": ".percolator", 
    "_id": "27", 
    "_version": 1, 
    "_score": 1, 
    "_source": { 
      "query": { 
        "filtered": { 
          "query": { 
            "match_all": {} 
          }, 
          "filter": { 
            "nested": { 
              "filter": { 
                "term": { 
                  "city": "london" 
                } 
              }, 
              "path": "location" 
            } 
          } 
        } 
      } 
    } 
  } 



And while trying to percolate this document I am getting the error: 


{ 
  ... 
  "location": { 
"date": "2014-05-05T15:07:58", 
"namedplaces": { 
  "city": "london" 
} 
  } 
} 



Any idea why it doesn't work? 

**EDIT :** 

The elasticsearch log gives more detail about the error: 

   
 [2014-05-06 13:33:48,972][DEBUG][action.percolate ] [Tomazooma] [
test][2], node[H42BBxajRs2w2NmllMnp7g], [P], s[STARTED]: Failed to execute [
org.elasticsearch.action.percolate.PercolateReque 
st@7399452e] 
org.elasticsearch.percolator.PercolateException: failed to percolate 
at org.elasticsearch.action.percolate.TransportPercolateAction.
shardOperation(TransportPercolateAction.java:198) 
at org.elasticsearch.action.percolate.TransportPercolateAction.
shardOperation(TransportPercolateAction.java:55) 
at org.elasticsearch.action.support.broadcast.
TransportBroadcastOperationAction$AsyncBroadcastAction$2.run(
TransportBroadcastOperationAction.java:226) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:744) 
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: 
Nothing to percolate 
at org.elasticsearch.percolator.PercolatorService.percolate(
PercolatorService.java:187) 
at org.elasticsearch.action.percolate.TransportPercolateAction.
shardOperation(TransportPercolateAction.java:194) 
... 5 more 
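For reference, "Nothing to percolate" is the error PercolatorService raises when the request body carries no document; in ES 1.x the percolate request is expected to wrap the document in a top-level "doc" key. A sketch using the document from the post:

```json
{
  "doc": {
    "location": {
      "date": "2014-05-05T15:07:58",
      "namedplaces": {
        "city": "london"
      }
    }
  }
}
```

If the document shown earlier was sent without that wrapper, adding it would be the first thing to try.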



Re: Does aggregations size property work?

2014-05-06 Thread Oswaldo Dantas
Solved.

Over the irc channel the user @honzakral showed me a working example he had 
and I noticed that for aggregations, the size property cannot be quoted.

Along the lines of the previous examples, this works:

{
  "size": "5",
  "aggs": {
"aggtest": {
  "terms": {
"field": "somefield",
"size": 5
  }
}
  }
}

On Tuesday, May 6, 2014 1:34:10 PM UTC+2, Oswaldo Dantas wrote:
>
> Hello all,
>
> I've tested with versions 1.0.0, 1.1.0 and 1.1.1, and the size property 
> doesn't seem to work.
>
> The following works using facets:
> {
>   "size": "0",
>   "facets": {
> "facettest": {
>   "terms": {
> "field": "somefield",
> "size": "5"
>   }
> }
>   }
> }
>
> The response contains 5 terms as expected, but changing to aggregations:
>
> {
>   "size": "0",
>   "aggs": {
> "aggtest": {
>   "terms": {
> "field": "somefield",
> "size": "5"
>   }
> }
>   }
> }
>
> Returns a 400 error message ending with "Parse Failure [Unknown key for a 
> VALUE_STRING in [aggtest]: [size].]]; }]".
>
> Is there any special requirement or configuration for this to work? The 
> documentation talks about this property since some time in 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/search-aggregations-bucket-terms-aggregation.html#_size_amp_shard_size
> but doesn't actually show a usage example.
>
> Cheers,
> Oswaldo
>



Re: Does aggregations size property work?

2014-05-06 Thread Oswaldo Dantas
Solved.

Over the irc channel the user @honzakral showed me a working example he had 
and I noticed that for aggregations, the size property cannot be quoted.

Along the lines of the previous examples, this works:

{
  "size": "1",
  "aggs": {
"tags": {
  "terms": {
"field": "usrIdHost",
"size": 1
  }
}
  }
}

Opened a ticket so this inconsistency can be checked: 
https://github.com/elasticsearch/elasticsearch/issues/6061

On Tuesday, May 6, 2014 1:34:10 PM UTC+2, Oswaldo Dantas wrote:
>
> Hello all,
>
> I've tested with versions 1.0.0, 1.1.0 and 1.1.1, and the size property 
> doesn't seem to work.
>
> The following works using facets:
> {
>   "size": "0",
>   "facets": {
> "facettest": {
>   "terms": {
> "field": "somefield",
> "size": "5"
>   }
> }
>   }
> }
>
> The response contains 5 terms as expected, but changing to aggregations:
>
> {
>   "size": "0",
>   "aggs": {
> "aggtest": {
>   "terms": {
> "field": "somefield",
> "size": "5"
>   }
> }
>   }
> }
>
> Returns a 400 error message ending with "Parse Failure [Unknown key for a 
> VALUE_STRING in [aggtest]: [size].]]; }]".
>
> Is there any special requirement or configuration for this to work? The 
> documentation talks about this property since some time in 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/search-aggregations-bucket-terms-aggregation.html#_size_amp_shard_size
> but doesn't actually show a usage example.
>
> Cheers,
> Oswaldo
>



ES backup using EBS

2014-05-06 Thread Ankur Goel
Hi All,

We have been running a 3-node Elasticsearch cluster on AWS machines, using 
EBS volumes for Elasticsearch's work and data directories. I was 
experimenting with backups using EBS snapshots; here is what I did:

1.) created a snapshot of one EBS volume (say, alpha)
2.) deleted the index
3.) shut down the cluster
4.) unmounted the EBS volume on alpha (say /mnt/data)
5.) created a new volume from the snapshot and mounted it on alpha in the 
same location (/mnt/data)
6.) restarted Elasticsearch on one node only
    -> cluster went to red state, all shards unassigned
7.) restarted another node with a blank data (/mnt/data) directory
    -> cluster red state, all shards STILL UNASSIGNED
8.) manually allocated shards to the node
    -> recovered no documents; only a few KBs of data got restored :(

Can anyone please help me debug this? What am I doing wrong? Is there a 
better way to do this?
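As an alternative to volume snapshots: since 1.0, Elasticsearch ships a snapshot/restore API that coordinates a consistent backup across the whole cluster. A sketch (repository name and path are placeholders; for a multi-node cluster the fs location must be a shared filesystem visible to every node, and on AWS the cloud-aws plugin adds an s3 repository type):

```
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup"
  }
}

PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true

POST /_snapshot/my_backup/snapshot_1/_restore
```

This sidesteps the problem above, where a snapshot of a single node's volume does not capture a coherent view of the whole index.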



boost query in elasticsearch

2014-05-06 Thread Alexandre Pinsard


Hello everybody,
I would like to use boost in the following case:

I have a personal index that contains the following properties:

   - sname
   - fname
   - login

When I search on fname = 'toto' and sname = 't', I would like my results 
to be sorted by relevance. For example:

   - Toto tata
   - Toto tutu
   - Toto tt
   - Toto Tutfut
   - etc...

But I do not know how to deal with boost. Can you help me please ?



BackUp and restore

2014-05-06 Thread Ankur Goel
Hi All,

We have been running a 3-node Elasticsearch cluster on AWS machines, using 
EBS volumes for Elasticsearch's work and data directories. I was 
experimenting with backups using EBS snapshots; here is what I did:

1.) created a snapshot of one EBS volume (say, alpha)
2.) deleted the index
3.) shut down the cluster
4.) unmounted the EBS volume on alpha (say /mnt/data)
5.) created a new volume from the snapshot and mounted it on alpha in the 
same location (/mnt/data)
6.) restarted Elasticsearch on one node only
    -> cluster went to red state, all shards unassigned
7.) restarted another node with a blank data (/mnt/data) directory
    -> cluster went to yellow state, all shards STILL UNASSIGNED
8.) manually allocated shards to the node
    -> recovered no documents; only a few KBs of data got restored :(

Can anyone please help me debug this? What am I doing wrong? Is there a 
better way to do this?



Does aggregations size property work?

2014-05-06 Thread Oswaldo Dantas
Hello all,

I've tested with versions 1.0.0, 1.1.0 and 1.1.1, and the size property 
doesn't seem to work.

The following works using facets:
{
  "size": "0",
  "facets": {
"facettest": {
  "terms": {
"field": "somefield",
"size": "5"
  }
}
  }
}

The response contains 5 terms as expected, but changing to aggregations:

{
  "size": "0",
  "aggs": {
"aggtest": {
  "terms": {
"field": "somefield",
"size": "5"
  }
}
  }
}

Returns a 400 error message ending with "Parse Failure [Unknown key for a 
VALUE_STRING in [aggtest]: [size].]]; }]".

Is there any special requirement or configuration for this to work? The 
documentation has mentioned this property for some time at 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/search-aggregations-bucket-terms-aggregation.html#_size_amp_shard_size 
but doesn't actually show a usage example.
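One guess, based on the error complaining about a VALUE_STRING: the 1.x aggregation parser may accept `size` only as a JSON number, not a quoted string (the older facets parser was more lenient about this). Worth trying:

```json
{
  "size": 0,
  "aggs": {
    "aggtest": {
      "terms": {
        "field": "somefield",
        "size": 5
      }
    }
  }
}
```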

Cheers,
Oswaldo



Re: More like this scoring algorithm unclear

2014-05-06 Thread Alex Ksikes
Hi Maarten,

Your 'like_text' is analyzed in the same way your 'product_id' field is 
analyzed, unless specified otherwise by 'analyzer'. I would recommend setting 
'percent_terms_to_match' to 0. However, if you are only searching over 
product ids then a simple boolean query would do. If not, then I would 
create a boolean query where each clause is a 'more like this field' for 
each field of the queried document. This is actually what the mlt API does.
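A sketch of that boolean-of-MLT-clauses shape in 1.x query DSL (the like_text values are illustrative, taken from the example below; field names follow the thread):

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "more_like_this_field": {
            "PRODUCT_ID": {
              "like_text": "104004855475 1001004002067765 100200494210",
              "min_term_freq": 1,
              "min_doc_freq": 1,
              "percent_terms_to_match": 0
            }
          }
        }
      ]
    }
  }
}
```

With only one field this collapses to a single `more_like_this_field` clause; with several fields you add one clause per field.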

Cheers,

Alex

On Wednesday, January 8, 2014 7:20:05 PM UTC+1, Maarten Roosendaal wrote:
>
> The scoring algorithm is still vague, but I got the query to act like the 
> API, although the results are different, so I'm still doing it wrong. 
> Here's an example:
> {
>   "explain": true,
>   "query": {
> "more_like_this": {
>   "fields": [
> "PRODUCT_ID"
>   ],
>   "like_text": "104004855475 1001004002067765 100200494210 
> 1002004004499883",
>   "min_term_freq": 1,
>   "min_doc_freq": 1,
>   "max_query_terms": 1,
>   "percent_terms_to_match": 0.5
> }
>   },
>   "from": 0,
>   "size": 50,
>   "sort": [],
>   "facets": {}
> }
>
> the like_text contains product_id's from a wishlist for which I want to 
> find similar lists
>
> Op woensdag 8 januari 2014 16:50:53 UTC+1 schreef Maarten Roosendaal:
>>
>> Hi,
>>
>> Thanks, i'm not quite sure how to do that. I'm using:
>> http://localhost:9200/lists/list/[id of 
>> list]/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1
>>
>> the body does not seem to be respected (I'm using the elasticsearch head 
>> plugin) if I add:
>> {
>>   "explain": true
>> }
>>
>> i've been trying to rewrite the mlt api as an mlt query but no luck so 
>> far. Any suggestions?
>>
>> Thanks,
>> Maarten
>>
>> Op woensdag 8 januari 2014 16:14:25 UTC+1 schreef Justin Treher:
>>>
>>> Hey Maarten,
>>>
>>> I would use the "explain":true option to see just why your documents are 
>>> being scored higher than others. MoreLikeThis uses the same fulltext 
>>> scoring as far as I know, so term position would affect score. 
>>>
>>>
>>> http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html
>>>
>>> Justin
>>>
>>> On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote:

 Hi,

 I have a question about why the 'more like this' algorithm scores 
 documents higher than others, while they are (at first glance) the same.

 What i've done is index wishlist-documents which contain 1 property: 
 product_id, this property contains an array of product_id's (e.g. [1234, 
 , , ]). What i'm trying to do is find similar wishlists for a 
 given wishlist with id x. The MLT API seems to work, it returns other 
 documents which contain at least 1 of the product_id's from the original 
 list.

 But what i see is that, for example, i get 10 hits where the first 6 hits 
 contain the same (and only 1) product_id; this product_id is present in the 
 original wishlist. What i would expect is that the score of the first 6 is 
 the same. However, what i see is that only the first 2 have the same score, 
 the next 2 a lower score and the next 2 even lower. Why is this?

 Also, i'm trying to write the MLT API as an MLT query, but somehow it 
 doesn't work. I would expect that i need to take the entire content of the 
 original product_id property and feed it as input for the 'like_text'. The 
 documentation is not very clear and doesn't provide examples so i'm a 
 little lost.

 Hope someone can give some pointers.

 Thanks,
 Maarten

>>>



Re: Looking for ideas how to sort by a field that looks like 400234.12.4

2014-05-06 Thread David Pilato
I meant sort script: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html#_script_based_sorting

although score script should help as well.
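A rough, untested sketch of such a sort script, assuming the reference lives in a not_analyzed string field called `ref` and always has exactly three numeric segments (field name, weights, and the MVEL-style script are all assumptions):

```json
{
  "sort": {
    "_script": {
      "type": "number",
      "order": "asc",
      "script": "p = doc['ref'].value.split('\\.'); Long.parseLong(p[0]) * 1000000 + Long.parseLong(p[1]) * 1000 + Long.parseLong(p[2])"
    }
  }
}
```

The multipliers just need to be larger than the biggest possible value of the segment to their right; with this, 3045.5.12 sorts after 3045.5.2 and before 3045.15.2, as desired, without touching the source document.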

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 6 mai 2014 à 13:08:05, David Pilato (da...@pilato.fr) a écrit:

may be using a script score function? 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#_script_score

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 6 mai 2014 à 11:46:51, mooky (nick.minute...@gmail.com) a écrit:

I have a reference field that is pseudo numeric. Sorting alphabetically 
produces the wrong results.

e.g. the following values should be sorted like this:
3045.5.2
3045.5.12
3045.15.2

I can't change the reference format - to pad it with zeros.
But I did think I could create a field that was padded with zeros - and sort 
using that.
I was hoping I could keep it out of the source document - but I haven't figured 
out how I can do some scripted formatting yet - using perhaps a multi-field.

I would greatly appreciate any cunning tricks that could be used to get this 
done.




Re: Looking for ideas how to sort by a field that looks like 400234.12.4

2014-05-06 Thread David Pilato
may be using a script score function? 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#_script_score

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 6 mai 2014 à 11:46:51, mooky (nick.minute...@gmail.com) a écrit:

I have a reference field that is pseudo numeric. Sorting alphabetically 
produces the wrong results.

e.g. the following values should be sorted like this:
3045.5.2
3045.5.12
3045.15.2

I can't change the reference format - to pad it with zeros.
But I did think I could create a field that was padded with zeros - and sort 
using that.
I was hoping I could keep it out of the source document - but I haven't figured 
out how I can do some scripted formatting yet - using perhaps a multi-field.

I would greatly appreciate any cunning tricks that could be used to get this 
done.




Re: Index Creation Failure

2014-05-06 Thread nicktgr15
Thanks for the response Mark. Our 6 data nodes are m1.large EC2 instances. 


We could probably add more data-nodes and see if there is any improvement, 
however it's interesting that we create this high number of new indices 
only at midnight. 
Generally, from the performance/load stats we've seen so far I don't think 
we need more data-nodes during the day. It feels like increasing the number 
of data-nodes just to address this issue would be a waste of resources.

I'm also adding some log lines related to the error messages we see on the 
flume elasticsearch-sink side.

05 May 2014 00:00:12,850 INFO  [elasticsearch[Aldebron][generic][T#1]] 
(org.elasticsearch.common.logging.log4j.Log4jESLogger.internalInfo:119)  - 
[Aldebron] failed to get node info for 
[#transport#-1][ip-10-0-235-53.eu-west-1.compute.internal][inet[/10.0.238.47:9300]],
 
disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[/10.0.238.47:9300]][cluster/nodes/info] request_id [107512] timed out after [5000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(
TransportService.java:356)
at java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)


On Tuesday, May 6, 2014 11:07:59 AM UTC+1, Mark Walkom wrote:
>
> That's a massive amount of indexes to create at the one time, and to have 
> on your cluster, so it's no surprise it's timing out.
>
> How big are your nodes? Can you add a few more or collapse your index 
> count?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 6 May 2014 19:48, nicktgr15 > wrote:
>
>> Hello,
>>
>> We are having an issue with elasticsearch index creation. In our cluster 
>> we create new indices everyday at midnight and at the moment we create 
>> about 150 new indices every time. 
>> Lately we have started getting log lines like the following during index 
>> creation:
>>
>> [2014-05-05 00:00:35,596][DEBUG][action.admin.indices.create] [Amber Hunt
>> ] [indexname-2014-05-05] failed to create
>> org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException:failed
>>  to process cluster 
>> event (create-index [indexname-2014-05-05], cause [auto(bulk api)])within 
>> 30s
>> at org.elasticsearch.cluster.service.InternalClusterService$2$1.
>> run(InternalClusterService.java:248)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(
>> ThreadPoolExecutor.java:1145)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
>> ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
>>
>> It's not very clear what this error means and how we could stop seeing 
>> this in our logs. Is there a config parameter for the timeout value of bulk 
>> requests? (if that's the problem)
>>
>> Our cluster at the moment has the following stats and we are using 
>> elasticsearch 1.1.1:
>>
>> 12  Nodes (3 master, 6 data, 3 search)
>> 13,140 Total Shards
>> 13,140 Successful Shards
>> 2,196 Indices
>> 434,248,844 Documents
>> 194.8GB Size
>>
>> We have noticed that around the same time we see the above "failed to 
>> create" message, flume elasticsearh-sink (used on the client side) stops 
>> working, so we are trying to understand if there is any correlation between 
>> these two events (index creation failure, flume elasticsearch-sink failure).
>>
>> Any help/suggestions would be appreciated!
>>
>> Regards,
>> Nick
>>
>>
>
>



Re: Interesting Terms for MoreLikeThis Query in ElasticSearch

2014-05-06 Thread Alex Ksikes
You could always use explain to find out the best matching terms of any 
query. In order to get all the interesting terms, you could run a query 
where the top result document has matched itself.

Also the new significant terms might be of interest to you:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html

On Thursday, January 30, 2014 9:59:02 PM UTC+1, api...@clearedgeit.com 
wrote:
>
> I have been trying to figure out how to get interesting terms using the 
> MLT query.  Does ElasticSearch have this functionality similar to solr or 
> if not, is there a work around?
>



Re: Elastic Search MLT API, how to use fields with weights.

2014-05-06 Thread Alex Ksikes
I'd like to add that the mlt API is equivalent to a boolean query in the 
query DSL made of multiple 'more like this field' clauses, where each clause 
is set to the content of the corresponding field of the queried document.

On Thursday, February 20, 2014 4:20:36 PM UTC+1, Binh Ly wrote:
>
> I do not believe you can boost individual fields/terms separately in a MLT 
> query. Your best bet is to probably run a bool query of multiple MLT 
> queries each with a different field and boost, but you'll need to first 
> extract the MLT text before you can do this.
>
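A hedged sketch of that bool-of-MLT-clauses approach with per-field boosts, in 1.x `more_like_this_field` syntax (the field names `title` and `body`, the boosts, and the like_text placeholders are all hypothetical; the like_text must be extracted from the queried document first, as Binh notes):

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "more_like_this_field": {
            "title": {
              "like_text": "text extracted from the queried doc's title",
              "boost": 3.0
            }
          }
        },
        {
          "more_like_this_field": {
            "body": {
              "like_text": "text extracted from the queried doc's body",
              "boost": 1.0
            }
          }
        }
      ]
    }
  }
}
```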



Re: Significant Term aggregation

2014-05-06 Thread Mark Harwood
Hi Ramdev,


>>Is tehre other functionality (experimental or otherwise) within ES that 
can help me do this ?

I'd recommend splitting HTML files that are clearly referencing multiple 
diverse news stories into multiple ES documents based on title headings or 
whatever indicates the start/end of each news item.

For boilerplate-removal I have previously used this analyzer on an earlier 
incarnation of the significant_terms algo: 
 https://issues.apache.org/jira/browse/LUCENE-725

Cheers
Mark



Re: MoreLikeThis ignores queries?

2014-05-06 Thread Alex Ksikes
Hello Alexey,

You should use the query DSL and not the more like this API. You can create 
a boolean query where one clause is your more like this query and the other 
one is your ignore category query (better use a filter here if you can). 

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html

However, the more like this query of the DSL only takes a like_text 
parameter; you cannot pass the id of the document. This will change in a 
subsequent version of ES. For now, to simulate this functionality, you can 
use multiple mlt queries, each with a like_text set to the value of a field 
of the queried document, inside a boolean query. Let me know if this helps.
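Concretely, something like this (the field names and like_text are placeholders; the `categories`/`movies` term comes from your ignore query below):

```json
{
  "query": {
    "filtered": {
      "query": {
        "more_like_this": {
          "fields": ["title", "body"],
          "like_text": "text of the queried document"
        }
      },
      "filter": {
        "not": { "term": { "categories": "movies" } }
      }
    }
  }
}
```

Using a filter for the category exclusion, as above, also keeps it out of scoring and lets it be cached.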

Alex

On Wednesday, March 19, 2014 5:01:06 AM UTC+1, Alexey Bagryancev wrote:
>
> Anyone can help me? It really does not work...
>
> On Wednesday, March 19, 2014 at 2:05:49 AM UTC+7, Alexey Bagryancev wrote:
>>
>> Hi,
>>
>> I am trying to filter moreLikeThis results by adding additional query - 
>> but it seems to ignore it at all.
>>
>> I tried to run my ignoreQuery separately and it works fine, but how to 
>> make it work with moreLikeThis? Please help me.
>>
>> $ignoreQuery = $this->IgnoreCategoryQuery('movies');
>>
>>
>>
>> $this->resultsSet = $this->index->moreLikeThis(
>>new \Elastica\Document($id), 
>>array_merge($this->mlt_fields, array('search_size' => $this->
>> size, 'search_from' => $this->from)), 
>>$ignoreQuery);
>>
>>
>>
>> My IgnoreCategory function:
>>
>> public function IgnoreCategoryQuery($category = 'main') 
>> { 
>>  $categoriesTermQuery = new \Elastica\Query\Term();
>>  $categoriesTermQuery->setTerm('categories', $category);
>>  
>>  $categoriesBoolQuery = new \Elastica\Query\Bool(); 
>>  $categoriesBoolQuery->addMustNot($categoriesTermQuery);
>>  
>>  return $categoriesBoolQuery;
>> }
>>
>>
>>



Re: Full cluster restart command

2014-05-06 Thread Mark Walkom
Check out
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades
While you're not upgrading anything at this time, setting
cluster.routing.allocation.enable can dramatically reduce recovery time.
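A sketch of that sequence for a full-cluster restart, combining it with the shutdown call from the question (note that `cluster.routing.allocation.enable` needs ES 1.0+; on older versions the equivalent is `cluster.routing.allocation.disable_allocation`):

```shell
# 1. Disable shard allocation so the cluster doesn't start
#    shuffling shards while nodes drop out.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'

# 2. Shut down all nodes, then start them back up.
curl -XPOST 'localhost:9200/_cluster/nodes/_all/_shutdown'

# 3. Once all nodes have rejoined, re-enable allocation.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'
```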

Otherwise, it depends on your cluster setup. How many nodes, any
master/data/search only?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 6 May 2014 19:25, Ben Langers  wrote:

> Hello
>
> I can bring down all my nodes with the following command:
> $ curl -XPOST 'http://localhost:9200/_cluster/nodes/_all/_shutdown'
>
> However, what's the right way to do a full cluster restart?
>
> Regards.
>



Re: Index Creation Failure

2014-05-06 Thread Mark Walkom
That's a massive amount of indexes to create at the one time, and to have
on your cluster, so it's no surprise it's timing out.

How big are your nodes? Can you add a few more or collapse your index count?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 6 May 2014 19:48, nicktgr15  wrote:

> Hello,
>
> We are having an issue with elasticsearch index creation. In our cluster
> we create new indices everyday at midnight and at the moment we create
> about 150 new indices every time.
> Lately we have started getting log lines like the following during index
> creation:
>
> [2014-05-05 00:00:35,596][DEBUG][action.admin.indices.create] [Amber Hunt]
> [indexname-2014-05-05] failed to create
> org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException:failed 
> to process cluster
> event (create-index [indexname-2014-05-05], cause [auto(bulk api)])within
> 30s
> at org.elasticsearch.cluster.service.InternalClusterService$2$1.
> run(InternalClusterService.java:248)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
> It's not very clear what this error means and how we could stop seeing
> this in our logs. Is there a config parameter for the timeout value of bulk
> requests? (if that's the problem)
>
> Our cluster at the moment has the following stats and we are using
> elasticsearch 1.1.1:
>
> 12  Nodes (3 master, 6 data, 3 search)
> 13,140 Total Shards
> 13,140 Successful Shards
> 2,196 Indices
> 434,248,844 Documents
> 194.8GB Size
>
> We have noticed that around the same time we see the above "failed to
> create" message, flume elasticsearh-sink (used on the client side) stops
> working, so we are trying to understand if there is any correlation between
> these two events (index creation failure, flume elasticsearch-sink failure).
>
> Any help/suggestions would be appreciated!
>
> Regards,
> Nick
>



Re: more like this on numbers

2014-05-06 Thread Alex Ksikes
Hi Valentin,

As you know, you can only perform mlt on fields which are analyzed. 
However, you can convert your other fields (numbers, ...) to text using a 
multi-field of type string at indexing time.

Cheers,

Alex
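A mapping sketch of that multi-field idea in 1.x `fields` syntax (the field names `price` and `as_text` are placeholders):

```json
{
  "properties": {
    "price": {
      "type": "long",
      "fields": {
        "as_text": { "type": "string" }
      }
    }
  }
}
```

The numeric field stays usable for range queries and sorting, while mlt can be pointed at `price.as_text`.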

On Thursday, March 27, 2014 4:31:58 PM UTC+1, Valentin wrote:
>
> Hi,
>
> as far as I understand it the more like this query allows to find 
> documents where the same tokens are used. I wonder if there is a 
> possibility to find documents where a particular field is compared based on 
> its value (number).
>
> Regards
> Valentin
>
> PS: elasticsearch rocks!
>



Official .NET client

2014-05-06 Thread Loïc Wenkin
Hello everybody,

I watched a video 2 or 3 months ago (Facebook and Elasticsearch), and in 
this video it was said that an official .NET client was planned. Do you have 
any news about it? Is there a roadmap (or at least an idea of a release date 
(2014, 2015, ...)) for this client?
Currently, I am using PlainElastic.Net, which is a great client (I like the 
idea of working with strings directly accessible to the user, allowing us to 
easily debug queries), but some features are missing (I am thinking of 
aggregations, for example, or a kind of integrated failover system).

Any news about it would be appreciated :)

Regards,
Loïc



Index Creation Failure

2014-05-06 Thread nicktgr15
Hello,

We are having an issue with elasticsearch index creation. In our cluster we 
create new indices every day at midnight, and at the moment we create about 
150 new indices every time. 
Lately we have started getting log lines like the following during index 
creation:

[2014-05-05 00:00:35,596][DEBUG][action.admin.indices.create] [Amber Hunt] [indexname-2014-05-05] failed to create
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (create-index [indexname-2014-05-05], cause [auto(bulk api)]) within 30s
at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(
InternalClusterService.java:248)
at java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

It's not very clear what this error means and how we could stop seeing this 
in our logs. Is there a config parameter for the timeout value of bulk 
requests? (if that's the problem)
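One workaround, assuming the spike really comes from the bulk API auto-creating ~150 indices at once (the index name pattern below is hypothetical, and `date -d` is GNU date): pre-create the next day's indices ahead of the midnight traffic, so the bulk requests never queue create-index cluster events.

```shell
# Pre-create tomorrow's index before the midnight bulk traffic arrives;
# repeat (or loop) for each of the daily index name prefixes.
curl -XPUT "localhost:9200/indexname-$(date -d tomorrow +%Y-%m-%d)"
```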

Our cluster at the moment has the following stats and we are using 
elasticsearch 1.1.1:

12  Nodes (3 master, 6 data, 3 search)
13,140 Total Shards
13,140 Successful Shards
2,196 Indices
434,248,844 Documents
194.8GB Size

We have noticed that around the same time we see the above "failed to 
create" message, flume elasticsearh-sink (used on the client side) stops 
working, so we are trying to understand if there is any correlation between 
these two events (index creation failure, flume elasticsearch-sink failure).

Any help/suggestions would be appreciated!

Regards,
Nick



Looking for ideas how to sort by a field that looks like 400234.12.4

2014-05-06 Thread mooky
I have a reference field that is pseudo numeric. Sorting alphabetically 
produces the wrong results.

e.g. the following values should be sorted like this:
3045.5.2
3045.5.12
3045.15.2

I can't change the reference format - to pad it with zeros.
But I did think I could create a field that was padded with zeros - and 
sort using that.
I was hoping I could keep it out of the source document - but I haven't 
figured out how I can do some scripted formatting yet - using perhaps a 
multi-field.

I would greatly appreciate any cunning tricks that could be used to get 
this done.




Re: Aggregation bug? Or user error?

2014-05-06 Thread mooky
I am using elastic 1.1.1.
The index isn't huge (600m), but it contains financially sensitive data; it 
will be too problematic legally to allow it offsite. I can try to anonymise 
the data and see if the issue can be reproduced that way; I might learn 
something about what is causing it.





On Friday, 2 May 2014 14:34:21 UTC+1, Adrien Grand wrote:
>
> What version of Elasticsearch are you using? If it is small enough, I 
> would also be interested if you could share your index so that I can try to 
> reproduce the issue locally.
>
>
> On Fri, May 2, 2014 at 12:07 PM, mooky 
> > wrote:
>
>>  
>> I haven't been able to figure out what is required to recreate it.
>> I am doing a number of identical aggregations (just different values for 
>> intentMarketCode and intentDate). Three aggregations give correct numbers; 
>> one doesn't. I haven't figured out why.
>>  
>>
>>  
>>
>> On Wednesday, 30 April 2014 14:13:00 UTC+1, Adrien Grand wrote:
>>
>>> This looks wrong indeed. By any chance, would you have a curl recreation 
>>> of this issue?
>>>
>>>
>>> On Tue, Apr 29, 2014 at 7:35 PM, mooky  wrote:
>>>
 It looks like a bug to me - but if its user error, then obviously I can 
 fix it a lot quicker :)
  

 On Tuesday, 29 April 2014 13:04:53 UTC+1, mooky wrote:

>  I am seeing some very odd aggregation results - where the sum of the 
> sub-aggregations is more than the parent bucket.
>
> Results:
> "CSSX" : {
>   "doc_count" : *24*,
>   "intentDate" : {
> "buckets" : [ {
>   "key" : "Overdue",
>   "to" : 1.3981248E12,
>   "to_as_string" : "2014-04-22",
>   "doc_count" : *1*,
>   "ME" : {
> "doc_count" : *0*
>   },
>   "NOT_ME" : {
> "doc_count" : *24*
>   }
> }, {
>   "key" : "May",
>   "from" : 1.3981248E12,
>   "from_as_string" : "2014-04-22",
>   "to" : 1.4006304E12,
>   "to_as_string" : "2014-05-21",
>   "doc_count" : *23*,
>   "ME" : {
> "doc_count" : 0
>   },
>   "NOT_ME" : {
> "doc_count" : *24*
>   }
> }, {
>   "key" : "June",
>   "from" : 1.4006304E12,
>   "from_as_string" : "2014-05-21",
>   "to" : 1.4033088E12,
>   "to_as_string" : "2014-06-21",
>   "doc_count" : *0*,
>   "ME" : {
> "doc_count" : *0*
>   },
>   "NOT_ME" : {
> "doc_count" : *24*
>   }
> } ]
>   }
> },
>
>
> I wouldn't have thought that to be possible at all.
> Here is the request that generated the dodgy results.
>
>
> "CSSX" : {
>   "filter" : {
> "and" : {
>   "filters" : [ {
> "type" : {
>   "value" : "inventory"
> }
>   }, {
> "term" : {
>   "isAllocated" : false
> }
>   }, {
> "term" : {
>   "intentMarketCode" : "CSSX"
> }
>   }, {
> "terms" : {
>   "groupCompanyId" : [ "0D13EF2D0E114D43BFE362F5024D8873", 
> "0D593DE0CFBE49BEA3BF5AD7CD965782", "1E9C36CC45C64FCAACDEE0AF4FB91FBA"
> , "33A946DC2B0E494EB371993D345F52E4", "6471AA50DFCF4192B8DD1C2E72A032
> C7", "9FB2FFDC0FF0797FE04014AC6F0616B6", "
> 9FB2FFDC0FF1797FE04014AC6F0616B6", "9FB2FFDC0FF2797FE04014AC6F0616B6", 
> "9FB2FFDC0FF3797FE04014AC6F0616B6", "9FB2FFDC0FF5797FE04014AC6F0616B6"
> , "9FB2FFDC0FF6797FE04014AC6F0616B6", "AFE0FED33F06AFB6E04015AC5E060A
> A3" ]
> }
>   }, {
> "not" : {
>   "filter" : {
> "terms" : {
>   "status" : [ "Cancelled", "Completed" ]
> }
>   }
> }
>   } ]
> }
>   },
>   "aggregations" : {
> "intentDate" : {
>   "date_range" : {
> "field" : "intentDate",
> "ranges" : [ {
>   "key" : "Overdue",
>   "to" : "2014-04-22"
> }, {
>   "key" : "May",
>   "from" : "2014-04-22",
>   "to" : "2014-05-21"
> }, {
>   "key" : "June",
>   "from" : "2014-05-21",
>   "to" : "2014-06-21"
> } ]
>   },
>   "aggregations" : {
> "ME" : {
>   "filter" : {
> "term" : {
>
>   "trafficOperatorSid" : "S-1-5-21-20xx"

Re: Need help on similarity ranking approach

2014-05-06 Thread Alex Ksikes
Hello,

What you want to know is the score of the document matching itself via more 
like this. The mlt API excludes the queried document itself; however, it is 
equivalent to running a bool query with a more_like_this_field clause for each 
field of the queried document. That query returns the document that matched 
itself as the top result, so you can compute the percentage similarity of the 
remaining matched documents relative to that top score.

Alex
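
A minimal sketch of that approach against ES 1.x (index, type, and field names 
here are assumptions, and it needs a running cluster on localhost:9200): run 
the equivalent bool query yourself, take the self-match as the 100% baseline, 
and divide the other scores by it.

```sh
# Hypothetical names: index "my_idx", type "doc", fields "title" and "body".
curl -XGET 'http://localhost:9200/my_idx/doc/_search' -d '{
  "query": {
    "bool": {
      "should": [
        { "more_like_this_field": { "title": {
            "like_text": "title text of the queried document",
            "min_term_freq": 1, "min_doc_freq": 1 } } },
        { "more_like_this_field": { "body": {
            "like_text": "body text of the queried document",
            "min_term_freq": 1, "min_doc_freq": 1 } } }
      ]
    }
  }
}'
```

The queried document should come back as the top hit; the similarity of any 
other hit can then be read as its _score divided by that top score (max_score).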

On Friday, May 2, 2014 3:22:34 PM UTC+2, Rgs wrote:
>
> Thanks Binh Ly and Ivan Brusic for your replies. 
>
> I need to find the similarity in percentage of a document against other 
> documents and this will be considered for grouping the documents. 
>
> is it possible to get the similarity percentage using a more like this 
> query, or is there any other way to calculate the percentage of similarity 
> from the query result? 
>
> Eg:  document1 is 90% similar to document2. 
>   document1 is 45% similar to document3 
>   etc.. 
>
> Thanks 
>
>
>
> -- 
> View this message in context: 
> http://elasticsearch-users.115913.n3.nabble.com/Need-help-on-similarity-ranking-approach-tp4054847p4055227.html
>  
> Sent from the ElasticSearch Users mailing list archive at Nabble.com. 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/05db016b-1c2e-497c-9275-37dcccedfae3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Full cluster restart command

2014-05-06 Thread Ben Langers
Hello

I can bring down all my nodes with the following command:
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_all/_shutdown'

However, what's the right way to do a full cluster restart?

Regards.
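
One way to sketch it on 1.x (assuming a node reachable on localhost:9200 and 
that elasticsearch runs as a service on each machine): disable shard 
allocation first so the cluster does not try to rebalance while nodes drop 
out, then shut down, restart, and re-enable.

```sh
# 1. Disable shard allocation cluster-wide:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'
# 2. Shut down every node:
curl -XPOST 'http://localhost:9200/_cluster/nodes/_all/_shutdown'
# 3. Start each node again, e.g.:
#    sudo service elasticsearch start
# 4. When all nodes have rejoined, re-enable allocation:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'
```

Since transient settings do not survive a full cluster restart, allocation 
should come back enabled by default anyway; step 4 is shown for completeness.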



Re: MoreLikeThis can't identify that 2 documents with exactly same attachments are duplicates

2014-05-06 Thread Alex Ksikes
Hi Zoran,

If you are looking for exact duplicates then hashing the file content, and 
doing a search for that hash would do the job. If you are looking for near 
duplicates, then I would recommend extracting whatever text you have in 
your html, pdf, doc, indexing that and running more like this with 
like_text set to that content. Additionally, you can perform an mlt search on 
more fields, including the meta-data fields extracted with the attachment 
plugin. Hope this helps.

Alex
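
For the exact-duplicate case, a sketch (index, type, and the `contentHash` 
field name are assumptions; this presumes you store a hash of the raw file in 
a not_analyzed field at index time):

```sh
# Hash the raw file before indexing and store the digest with the document:
HASH=$(sha1sum report.pdf | awk '{print $1}')
# Finding duplicates is then an exact term lookup rather than a fuzzy mlt search:
curl -XGET 'http://localhost:9200/documents/document/_search' -d '{
  "query": { "term": { "contentHash": "'"$HASH"'" } }
}'
```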

On Monday, May 5, 2014 8:08:30 PM UTC+2, Zoran Jeremic wrote:
>
> Hi Alex,
>
> Thank you for your explanation. It makes sense now. However, I'm not sure 
> I understood your proposal. 
>
> So I would adjust the mlt_fields accordingly, and possibly extract the 
> relevant portions of texts manually
> What do you mean by adjusting mlt_fields? The only shared field that is 
> guaranteed to be same is file. Different users could add different titles 
> to documents, but attach same or almost the same documents. If I compare 
> documents based on the other fields, it doesn't mean that it will match, 
> even though attached files are exactly the same.
> I'm also not sure what you meant by extracting the relevant portions of 
> text manually. How would I do that, and what would I do with the result?
>
> Thanks,
> Zoran
>  
>
> On Monday, 5 May 2014 01:23:49 UTC-7, Alex Ksikes wrote:
>>
>> Hi Zoran,
>>
>> Using the attachment type, you can text search over the attached document 
>> meta-data, but not its actual content, as it is base 64 encoded. So I would 
>> adjust the mlt_fields accordingly, and possibly extract the relevant 
>> portions of texts manually. Also set percent_terms_to_match = 0, to ensure 
>> that all boolean clauses match. Let me know how this works out for you.
>>
>> Cheers,
>>
>> Alex
>>
>> On Monday, May 5, 2014 5:50:07 AM UTC+2, Zoran Jeremic wrote:
>>>
>>> Hi guys,
>>>
>>> I have a document that stores the content of an HTML file, PDF, DOC or other 
>>> textual document in one of its fields as a byte array, using the attachment 
>>> plugin. The mapping is as follows:
>>>
>>> {
>>>   "document": {
>>>     "properties": {
>>>       "title": { "type": "string", "store": true },
>>>       "description": { "type": "string", "store": "yes" },
>>>       "contentType": { "type": "string", "store": "yes" },
>>>       "url": { "store": "yes", "type": "string" },
>>>       "visibility": { "store": "yes", "type": "string" },
>>>       "ownerId": { "type": "long", "store": "yes" },
>>>       "relatedToType": { "type": "string", "store": "yes" },
>>>       "relatedToId": { "type": "long", "store": "yes" },
>>>       "file": {
>>>         "path": "full",
>>>         "type": "attachment",
>>>         "fields": {
>>>           "author": { "type": "string" },
>>>           "title": { "store": true, "type": "string" },
>>>           "keywords": { "type": "string" },
>>>           "file": { "store": true, "term_vector": "with_positions_offsets", "type": "string" },
>>>           "name": { "type": "string" },
>>>           "content_length": { "type": "integer" },
>>>           "date": { "format": "dateOptionalTime", "type": "date" },
>>>           "content_type": { "type": "string" }
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>> And the code I'm using to store the document is:
>>>
>>> VisibilityType.PUBLIC
>>>
>>> These files seems to be stored fine and I can search content. However, I 
>>> need to identify if there are duplicates of web pages or files stored in 
>>> ES, so I don't return the same documents to the user as search or 
>>> recommendation result. My expectation was that I could use MoreLikeThis 
>>> after the document was indexed to identify if there are duplicates of that 
>>> document and accordingly to mark it as duplicate. However, results look 
>>> weird for me, or I don't understand very well how MoreLikeThis works.
>>>
>>> For example, I indexed the web page http://en.wikipedia.org/wiki/Linguistics 
>>> 3 times, and all 3 documents in ES have exactly the same binary content 
>>> under file. Then for the following query:
>>>
>>> http://localhost:9200/documents/document/WpkcK-ZjSMi_l6iRq0Vuhg/_mlt?mlt_fields=file&min_doc_freq=1
>>> where ID is id of one of these documents I got these results:
>>> http://en.wikipedia.org/wiki/Linguistics with score 0.6633003
>>> http://en.wikipedia.org/wiki/Linguistics with score 0.6197818
>>> http://en.wikipedia.org/wiki/Computational_linguistics with score 
>>> 0.48509508
>>> ...
>>>
>>> For some other examples, scores for the same documents are much lower, 
>>> and sometimes (though not that often) I don't get duplicates on the first 
>>> positions. I would expect here to have score 1.0 or higher for documents 
>>> that are exactly the same, but it's not the case, and I can't figure out 
>>> how could I identify if there are duplicates in the Elasticsearch index.
>>>
>>> I 

Re: Nest Custom analyser

2014-05-06 Thread Marcio Rodrigues
Thanks, I know the other ways of doing it; I was more specifically 
interested in doing it on a field within a POCO via the NEST API.

On Monday, May 5, 2014 6:25:40 PM UTC+1, Itamar Syn-Hershko wrote:
>
> There are 2 ways to define a custom analyzer - one is via configuration 
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-analyzers.html,
>  and the other is via code.
>
> If your custom analyzer is written in code, it will be Java code that has 
> to be deployed to ES as a plugin, while your code is a client written in 
> .NET - so no, that is not possible.
>
> Otherwise, you can define the analyzer via index settings from client code 
> as well
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Author of RavenDB in Action 
>
>
> On Mon, May 5, 2014 at 8:19 PM, Marcio Rodrigues 
> 
> > wrote:
>
>> Ah ok, so if I understand correctly, there is no way to define a custom 
>> analyser in code and use that to index a POCO?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/69b6c529-3e55-472e-92e0-79c4c2f01ce4%40googlegroups.com
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>



RE: External datasets in ES

2014-05-06 Thread Michael . OBrien
I was wondering if you thought tribe nodes and a separate cluster for the 
open-geo data and another for the logstash would be a way to go ?
http://www.elasticsearch.org/blog/tribe-node/

From my reading of the post above, it would add a small bit more to the 
configuration but would allow separation of data updates - or have I 
misunderstood the concept of tribes?
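
For reference, a hypothetical tribe-node configuration along those lines 
(the cluster names are assumptions) - the tribe node joins both clusters as a 
client and can search across them from one place:

```yaml
tribe:
  geo:
    cluster.name: open-geo-cluster   # the cluster holding the open-geo data
  logs:
    cluster.name: logstash-cluster   # the cluster receiving logstash events
```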
From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] On 
Behalf Of Alexander Reelsen
Sent: 05 May 2014 12:17
To: elasticsearch@googlegroups.com
Subject: Re: External datasets in ES

Hey,

this would be a bit more tricky, as it requires you to merge two events (the 
external dataset and your live visitor stats) into a single event as a sort of 
preprocessing step. I think I would start with the geoip support from logstash 
and use your apache logs, which at least allows you to filter by city.

Need to think about this a bit more, how to merge this kind of information.


--Alex
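
One hedged sketch of such a preprocessing step, using the community 
`translate` filter plugin to look the population up per event (the dictionary 
path and field names are assumptions, and the geoip city field name varies by 
logstash version):

```
filter {
  geoip { source => "clientip" }
  translate {
    field           => "[geoip][city_name]"   # key to look up
    destination     => "population"           # field to write the value into
    dictionary_path => "/etc/logstash/city_population.yml"
  }
}
```

Regenerating the dictionary file when a new version of the dataset is released 
would avoid the duplicate-city problem, since the enrichment happens per event 
rather than by indexing the cities as separate documents.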

On Thu, May 1, 2014 at 3:19 PM, 
mailto:michael.obr...@ul.ie>> wrote:
From reading 
http://www.elasticsearch.org/blog/enriching-searches-open-geo-data/ I have a 
few questions I hope the community might be able to answer.

The post uses an open dataset in a static csv to map German cities meeting 
certain conditions in Kibana as an example

I was wondering if its possible to take that idea and

  1.  Combine a static CSV dataset with other ES data - so, sticking with the 
cities example, I would be able to live-map the visitors to my German website 
from cities with populations > 100k, from the same ES cluster and ideally the 
same Kibana interface
  2.  If it is possible, how do I then update the population details when a 
newer version of the dataset is available, without ending up with 2 of every 
German city with possibly conflicting population values?
Any ideas?


shard routing allocation excludes

2014-05-06 Thread Michael Salmon
Recently I tried cluster.routing.allocation.exclude._ip, and it worked once 
I had actually set the IP address - I had presumed that ES would look up the 
address for me. I also tried setting the parameter to 2 names in an array, 
and while this was accepted it didn't seem to do anything.

In the end I decided to look at the source code, and I found that as well as 
_ip there are _host, _name and _id, which pleased me as I plan to have 
multiple nodes per host.

I wonder though if these are new or deprecated.

/Michael
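
For what it's worth, a sketch of the form that worked for me (the IPs are 
placeholders): the exclude settings appear to take a comma-separated string 
rather than a JSON array, which may explain why the array form was accepted 
but ignored.

```sh
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1,10.0.0.2"
  }
}'
```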



Re: Questioins about setting up eshadoop

2014-05-06 Thread Costin Leau
The NCDFE is an indication that the es-hadoop jar is not included in your jar. 
I recommend reading the official documentation [1], which explains each 
integration in detail, including Map/Reduce.

[1] 
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/reference.html

P.S. Make sure you look at the configuration section as well, since you are 
using the old format of "es.resource".

On 5/6/14 8:19 AM, Himanshu Gautam wrote:


I am getting similar error with :

ES : 1.1.1

Hadoop: Hadoop 1.0.2
JDK: 1.6.0_24
elasticsearch-hadoop-1.3.0.M3
/hadoopApps/wordcount$ hadoop jar WordCount.jar WordCount 
/user/himanshu/inputdir
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/elasticsearch/hadoop/mr/EsOutputFormat
 at WordCount.main(WordCount.java:72)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: 
org.elasticsearch.hadoop.mr.EsOutputFormat
 at java.net.UR



--
Costin



Re: Problem of sync elasticsearch data after failover

2014-05-06 Thread Mark Walkom
Ahh! Oh well :)

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 6 May 2014 16:26, David Pilato  wrote:

> Sorry Mark.
>
> I guess it's my fault. I was looking at Google groups SPAM list today and
> I did not see that this message was already sent.
>
> I think he tried to send more than one message as he did not see it in the
> ML.
>
>
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> Le 6 mai 2014 à 08:15, Mark Walkom  a écrit :
>
> Didn't you already ask this in this exact same thread?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 4 May 2014 11:25, fclddcn  wrote:
>
>> Hi everyone, I am new to elasticsearch. I've set up a small cluster (with
>> version 0.90.1).
>> Within the cluster I have 4 nodes total:
>>
>> loadblancer node 1: on app1 server
>> loadblancer node 2: on app2 server
>>  master node 1: stand alone server 1
>>  master node 2: stand alone server 2
>>
>> This architecture is come from the idea in this discussion:
>>
>> http://stackoverflow.com/questions/17596182/connecting-to-elasticsearch-cluster-in-nest
>> <
>> http://stackoverflow.com/questions/17596182/connecting-to-elasticsearch-cluster-in-nest
>> >
>> See Duc.Duong's reply
>>
>> each master node have config as:
>> node.master: true
>> node.data: true
>> discovery.zen.ping.multicast.enabled: false
>> discovery.zen.ping.unicast.hosts:
>> [":9300",":9300"]
>>
>> I can get cluster have green status now. For me, the most important thing
>> is
>> failover, and now this is basicly avaliable.
>> That is to say, I can shut down one master (using command "service
>> elasticsearch stop") and ES still work correctly, and when I start that
>> master again,
>> it joins the cluster as a new candidate master, the status goes green, and
>> data gets synced.
>>
>> But, in an extreme situation, if I do not start that master again and instead
>> shut down the remaining master after inserting some documents, then
>> start the 2 masters, the status goes green again, but the recently inserted
>> documents won't be synced.
>>
>> How can I get these documents synced?
>> I'll thanks for any ideas.
>>
>>
>>
>> --
>> View this message in context:
>> http://elasticsearch-users.115913.n3.nabble.com/Problem-of-sync-elasticsearch-data-after-failover-tp4055275.html
>> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>>
>



Re: [Hadoop] Writing directly to shards in EsOutputFormat and shard awareness

2014-05-06 Thread Costin Leau

To reiterate on my previous email: by talking directly to primaries, and 
assuming a uniform distribution of documents, at least 1/X of the total 
documents (where X is the number of shards) are sent without proxying, while 
the rest are proxied to the respective shards (why we don't compute the target 
shard directly is covered in the previous email).
Talking to a non-primary node would guarantee proxying for all of the documents 
being sent.

On 5/6/14 6:15 AM, Ashwin Jayaprakash wrote:

I'm not sure I understand this: "write request can be sent to any node, which 
in turn will do proxying; we can avoid this and only hit the primaries. This 
avoids the proxying, rerouting".

Even if you hit a "primary" node, ES will still have to re-route the document 
to the primary shard handling the hash of the doc, which could very well be on 
a different node. Isn't this true, like in any other distributed system? So how 
do you save that extra hop, as you claim it "avoids the proxying, rerouting"? 
What you described seems to directly conflict with what is described here, 
unless I'm mistaken:

  * 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/docs-index_.html#index-routing
  * 
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/distrib-write.html


Regards.


On Sunday, May 4, 2014 11:19:30 AM UTC-7, Costin Leau wrote:

On 1. Performance reasons
While a write request can be sent to any node, which in turn will do 
proxying, we can avoid this and only hit the
primaries. This avoids the proxying, rerouting. Note that each task that is 
writing is assigned a different primary
in a round-robin fashion - so effectively the write happens in parallel 
across the primaries for the target index.

2. What exactly are you trying to achieve? A map-reduce job ends up with 
multiple tasks hitting an ES cluster; if
you pick only one node, you're likely to overwhelm it while the rest of the 
cluster will be idle. If you spread the
load randomly, the nodes will re-route the calls the primaries first and 
then the replicas (depending on your settings).
By talking directly to the primaries, you reduce unnecessary IO and CPU 
(caused by the proxying and hashing) and you
get better through-put and negotiation between ES and Hadoop. If you 
consider again an imbalanced scenario: each
node that you add for proxying can become a liability: maybe the node is 
busy with other activities, maybe it goes
off-line, etc...

Note that es-hadoop does retries for both rejections (ES is busy) and 
network failures (whether ephemeral or
permanent, by switching to a different node).

As for computing the hash for each document and guessing what exact primary 
it will hit, there are certain
challenges with that:
a. the client has to replicate the logic in ES (which can change across 
versions, settings, etc..)
b. each hadoop task that writes ends up making multiple connections across 
ES primaries. While this _might_ work for
small indices with a small number of shards, for anything else this 
approach will be inefficient. You will end up
with significantly more connections that can (and will) fail.
c. the bulk itself will be divided into smaller bulks that reduce their 
efficiency especially since the load itself
it's not consistent.

es-hadoop is a client optimized for Hadoop environments. Emphasis 'client'. 
it does not and should not - try to
outsmart ES. We strive for excellent performance but without sacrificing 
reliability. Once data is inside ES, it
gets all its goodies, from replication, sharding, etc... -

If you're worried about performance, which we always try to improve, I'd be 
happy to look into using some concrete
benchmarks.

Hope this helps,

On Sun, May 4, 2014 at 8:41 PM, Ashwin Jayaprakash > wrote:

Hi, I have 2 related questions regarding routing write requests. Thanks 
in advance for answering!

*Question 1:*
I saw this line in the EsOutputFormat class and I was wondering why:

(https://github.com/elasticsearch/elasticsearch-hadoop/blob/master/mr/src/main/java/org/elasticsearch/hadoop/mr/EsOutputFormat.java#L221

)

(https://github.com/elasticsearch/elasticsearch-hadoop/blob/master/mr/src/main/java/org/elasticsearch/hadoop/rest/RestRepository.java#L239

)


Map targetShards = repository.getWriteTargetPrimaryShards();
repository.close();

List orderedShards = new ArrayList(targetShards.keySet());
// make sure the order is strict
Collections.sort(orderedShards);

// if there's no task info, just pick