Geo bounds aggregation as a sub aggregator?

2014-07-23 Thread svartalf
Is there any way to use geo_bounds as a sub aggregator?

I have a geohash_grid aggregation, and it would be very useful to get the 
bounds of each bucket of the data.
Right now my ES 1.3.0 reports "Could not find aggregator type [bounds] 
in [aggregations]".
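
Note that the aggregation type is spelled geo_bounds, not bounds, which may explain the error. A hedged sketch of the nested request (the location field name is an assumption):

```json
{
  "size": 0,
  "aggs": {
    "grid": {
      "geohash_grid": { "field": "location", "precision": 3 },
      "aggs": {
        "cell_bounds": {
          "geo_bounds": { "field": "location" }
        }
      }
    }
  }
}
```

If ES still rejects geo_bounds as a sub-aggregation, double-check that the node is actually running 1.3.0, since geo_bounds first shipped in that release.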

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ef8b87ad-5c5e-4d50-b87d-9ac9d6659048%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES 1.3.0 and 1.2.3 released

2014-07-23 Thread Lukáš Vlček
Hi,

Bigdesk (2.x) uses JSONP. Elasticsearch is dropping JSONP support starting
with the 1.3 release. You can still enable it in Elasticsearch, but I do not
recommend it if Bigdesk would be the only reason. From my point of view,
JSONP is a hackish anti-pattern and should be avoided.

There is this ticket for Bigdesk
https://github.com/lukas-vlcek/bigdesk/issues/55
I am still stuck with an older version of Elasticsearch, which means I do not
feel the heat of the dropped JSONP support, but feel free to jump into this
ticket and have your say in a comment so that I can see how urgent it is for you.

On a general note, I consider the Bigdesk 2.x series to be more in maintenance
mode; something new (Bigdesk 3.x) should hopefully be out by the end of this
year. This means a new release of Bigdesk 2.x will come out only if the
community expresses a need for it. :-)

Regards,
Lukas


On Thu, Jul 24, 2014 at 5:49 AM, vjbangis  wrote:

> is the bigdesk plugin still supported in the new release?
>



Best practices for dealing with a large number of small activity stream events

2014-07-23 Thread ppearcy
Hi all,
  I'm looking at using elasticsearch for a use case that I'd love some 
feedback on regarding best practices. 

A little background... I've been digging into various approaches to 
allowing interactive drill-down slicing and dicing of activity stream data 
(actor / verb / target) for realtime end-user analytics. 
This is high-dimensional data with too many potential views to 
effectively precompute rollups. Other systems that try to tackle 
this same problem, and that I have played around with, are Druid, OpenTSDB, 
Blueflood, and InfluxDB. At the end of the day they all either use an inverted 
index or have (or plan to have) elasticsearch integrations, so I 
figure why not stick with ES.

There are three areas I am trying to optimize:
- Minimize the index footprint on disk.
- Minimize the RAM footprint
- Maximize the speed

I believe the key tradeoff I need to make with my dataset is going to be 
doc_values, and whether or not I try to utilize the heap or the page cache.

All my fields are straight exact-match not_analyzed fields, and there are 
~15 of them. "not_analyzed" appears to have all the extras that can cause 
bloat disabled (norms, frequencies, etc.). I am not indexing _source. Here is 
my index template:
https://gist.github.com/ppearcy/fc5202a1664dbc90cbc2

With some test data, I'm getting pretty solid results. Average messages are 
~360 bytes, and I am getting:
- 60 bytes per document without doc_values
- 80 bytes per document with doc_values

On a test index with ~160 million docs without doc_values, I have it at 9.6GB 
of data, with the file breakdown like so:
3.8G Jul 23 09:40 _mwf.fdt
3.9G Jul 23 10:32 _mwf_es090_0.tim
1.8G Jul 23 10:32 _mwf_es090_0.doc

Does anybody know how I can slim things down any further, or have general 
advice for dealing with large numbers of small documents?
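
Two more mapping-level switches may shave bytes; these are sketched as assumptions, since the linked gist is not reproduced in this thread. Disabling the _all field stops every field's terms being indexed a second time, and _source can be confirmed off in the same place (the template pattern events-* and type name event are placeholders):

```json
{
  "template": "events-*",
  "mappings": {
    "event": {
      "_all": { "enabled": false },
      "_source": { "enabled": false }
    }
  }
}
```

The .fdt file in the breakdown above is stored-field data, so if it is still 3.8G with _source disabled, it is worth checking which fields have "store": true.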

Thanks!
Paul



Re: prevent 'match_phrase' from evaluating score

2014-07-23 Thread 陳智清
In fact, not only 'match_phrase': we want to eliminate scoring for all 
queries.



prevent 'match_phrase' from evaluating score

2014-07-23 Thread 陳智清
Hello,

Is it possible to have a 'match_phrase' query that does not do any scoring 
during search?

I know that using a 'query filter', or wrapping a query in a 'constant_score' 
query, returns documents with a constant score. But as far as I know, what 
they do is cut off the evaluated score, i.e. the score corresponding to the 
'match_phrase' query is still evaluated.

In our application we only care whether a phrase exists in a document; we 
don't care about the score of each document. For the best performance, we 
would like to eliminate the whole score-evaluation process during the 
'match_phrase' query. 

Is there any query type or configuration that achieves this? 

Thank you.
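
For what it's worth, the closest the 1.x DSL seems to get is pushing the phrase into the filter context via a query filter inside a filtered query (the body field name is made up):

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "query": {
          "match_phrase": { "body": "quick brown fox" }
        }
      }
    }
  }
}
```

This skips score ranking at the search level (every hit gets a constant score), but as the question notes, the underlying Lucene phrase matching itself still has to read position data; that work cannot be avoided while still matching phrases.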



Re: [ERROR][bootstrap] {1.2.2}: Initialization Failed ... - NullPointerException[null]

2014-07-23 Thread vjbangis
Finally! The node can now be added to/removed from the cluster with only four lines:

cluster.name: Cluster
node.name: "[node?]"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node?"]


My next question is: where can I dump the data sources (using S3 or EBS), 
and why?
And how can the data sources then be accessed by ES?
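
If "dump the data sources" means backing indices up to S3, one option (an assumption about the goal; the bucket and repository names are placeholders, and the cloud-aws plugin must be installed) is the snapshot/restore API:

```
PUT _snapshot/my_s3_repo
{
  "type": "s3",
  "settings": { "bucket": "my-es-snapshots" }
}

PUT _snapshot/my_s3_repo/snapshot_1
```

ES then reads the data back via POST _snapshot/my_s3_repo/snapshot_1/_restore; it does not query S3 or EBS directly at search time.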



Re: ES 1.3.0 and 1.2.3 released

2014-07-23 Thread vjbangis
is the bigdesk plugin still supported in the new release?



Re: one index has different _type and different _type have same field with different type will disturb?

2014-07-23 Thread xu piao
In order to solve this problem, I have tried many ways of searching.

These ways are OK:

{
  "bool" : {
"must" : {
  "query_string" : {
"query" : "\"采莲新村\"",
"fields" : [ "name^200", "tags^5", "intra^1", "region^2", 
"address^2" ]
  }
},
"minimum_should_match" : "1"
  }
}

{
  "bool" : {
"must" : {
  "query_string" : {
"query" : "北京西路",
"fields" : [ "name^200", "tags^5", "intra^1", "region^2", 
"address^2" ],
"default_operator" : "and"
  }
},
"minimum_should_match" : "1"
  }
}

{
  "bool" : {
"must" : {
  "query_string" : {
"query" : "北京西路",
"fields" : [ "name^200", "tags^5", "intra^1", "region^2", 
"address^2" ]
  }
},
"minimum_should_match" : "1"
  }
}

These throw an exception:

{
  "bool" : {
"must" : {
  "query_string" : {
"query" : "\"北京西路\"",
"fields" : [ "name^200", "tags^5", "intra^1", "region^2", 
"address^2" ],
"default_operator" : "and"
  }
},
"minimum_should_match" : "1"
  }
}
,
{
  "bool" : {
"must" : {
  "query_string" : {
"query" : "\"北京西路\"",
"fields" : [ "name^200", "tags^5", "intra^1", "region^2", 
"address^2" ]
  }
},
"minimum_should_match" : "1"
  }
}

Why??
On Wednesday, July 23, 2014 at 5:53:37 PM UTC+8, xu piao wrote:
>
> I have one index with different _types, like
> http://localhost:9200/matrix/group and http://localhost:9200/matrix/user.
>
> The two _types both have a 'region' field, but with a different field type.
> group :
>
>- region: {
>   - type: string
>   - store: true
>   - analyzer: ik
>}
>
> }
>
> user: 
>
>- region: {
>   - type: "long"
>}
>
>  
> When the 'matrix' index only had the 'group' type,
> a search using 'query_string' with multi_field worked fine:
>
> {
>   "bool" : {
> "must" : {
>   "query_string" : {
> "query" : "北京西城",
> "fields" : [ "name^20", "tags^5", "intra^1", "region^2", 
> "address^2" ]
>   }
> },
> "minimum_should_match" : "1"
>   }
> }
>
> But when I index the 'user' type and do the same query, it throws an exception:
>
> org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to 
> execute phase [query], all shards failed; shardFailures 
> {[_vgvj_11QpKOuZNo90nD_A][matrix][0]: RemoteTransportException[[Blizzard 
> II][inet[/10.0.8.235:19300]][search/phase/query]]; nested: 
> QueryPhaseExecutionException[[matrix][0]: query[filtered((region:"北京 
> 西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to 
> execute main query]]; nested: IllegalStateException[field "region" was 
> indexed without position data; cannot run PhraseQuery (term=北京)]; 
> }{[_vgvj_11QpKOuZNo90nD_A][matrix][4]: RemoteTransportException[[Blizzard 
> II][inet[/10.0.8.235:19300]][search/phase/query]]; nested: 
> QueryPhaseExecutionException[[matrix][4]: query[filtered((region:"北京 
> 西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to 
> execute main query]]; nested: IllegalStateException[field "region" was 
> indexed without position data; cannot run PhraseQuery (term=北京)]; 
> }{[_vgvj_11QpKOuZNo90nD_A][matrix][3]: RemoteTransportException[[Blizzard 
> II][inet[/10.0.8.235:19300]][search/phase/query]]; nested: 
> QueryPhaseExecutionException[[matrix][3]: query[filtered((region:"北京 
> 西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to 
> execute main query]]; nested: IllegalStateException[field "region" was 
> indexed without position data; cannot run PhraseQuery (term=北京)]; 
> }{[3XK6HZ_WSbG4E7Tot3mMMw][matrix][2]: 
> QueryPhaseExecutionException[[matrix][2]: query[filtered((region:"北京 
> 西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to 
> execute main query]]; nested: IllegalStateException[field "region" was 
> indexed without position data; cannot run PhraseQuery (term=北京)]; 
> }{[3XK6HZ_WSbG4E7Tot3mMMw][matrix][1]: 
> QueryPhaseExecutionException[[matrix][1]: query[filtered((region:"北京 
> 西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to 
> execute main query]]; nested: IllegalStateException[field "region" was 
> indexed without position data; cannot run PhraseQuery (term=北京)]; }
> at 
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:276)
> at 
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onFailure(TransportSearchTypeAction.java:224)
> at 
> org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:205)
> at 
> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
> at 
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
> at 
> org.elasticsearch.action.search.type.TransportSearchTyp
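
The truncated stack trace above reduces to one message: IllegalStateException: field "region" was indexed without position data; cannot run PhraseQuery. The quoted phrase "北京西路" becomes a PhraseQuery, and because both _types share the region field name inside the same Lucene index, the query also touches the user mapping's numeric region, which has no positions. A hedged workaround is to keep region out of the fields list for phrase searches, or to restrict the phrase search to the type whose region is analyzed:

```
GET /matrix/group/_search
{
  "query": {
    "query_string": {
      "query": "\"北京西路\"",
      "fields": [ "name^200", "tags^5", "intra^1", "address^2" ]
    }
  }
}
```

The longer-term fix is to avoid giving the same field name different types across _types in one index, e.g. by renaming one of the region fields.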

Structure of Kibanas "Micro Analysis" query.

2014-07-23 Thread Finn Poitier

Kindly enough, the queries being used in Kibana are mostly shown via the 
information icon "i". But unfortunately there's no query shown for the 
"micro analyses" of specific fields, and I wonder about their actual 
structure.

Let me explain:

For example, the micro analysis for the field user.screen_name shows me the 
10 users with the highest doc_count, and I would query them this way:

{
  "query": { "match_all": {} },
  "aggs": {
"user_list": {
  "terms": {
"field": "user.screen_name",
"size": 10
  }
}

  },
  "size": 0
}

But this obviously only works because the screen_name is a single string 
without any more terms in the field. For another field, such as user.name, 
which probably consists of more than one term (first name, last name), 
the query above wouldn't work anymore.

Somewhere else I saw the suggestion to reindex the documents and set the 
mapping to "user.name": "not_analyzed", which has the effect that the 
entire field gets handled as one single term.

But this is not what I want, and since Kibana successfully queries its micro 
analysis on fields which indeed *are* mapped with an analyzer, I'd like to 
find out what that query looks like.

Thanks for your help on this! :)
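
One hedged way to get whole-value term counts on an analyzed field without losing the analyzed version: reindex with a multi-field mapping that adds an untouched sub-field (the raw name is a convention, not a requirement), then aggregate on that:

```json
{
  "user.name": {
    "type": "string",
    "fields": {
      "raw": { "type": "string", "index": "not_analyzed" }
    }
  }
}
```

A terms aggregation on "field": "user.name.raw" then counts whole names. Kibana 3's micro analysis itself appears to simply run a terms facet on the field as indexed, which is why analyzed fields show individual tokens (first names, last names) rather than whole values.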






Re: Add / Remove nodes in cluster, good practice question

2014-07-23 Thread Nikolas Everett
1. Fix the unicast discovery list to contain the two nodes you'll end up
with. Make sure the cluster name matches.
2. Start the new node.
3. Watch it join the cluster. If it doesn't then check step 1 and repeat 2
until it works.
4. Use the allocation filtering API to ban all shards from the node you
will be removing.
5. Wait until there are no shards on that node. You can check using the
_cat/shards API.
6. Shut down the node.

You might want to think about getting three nodes and configuring minimum
master nodes but I'm sure you've heard that before.
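
Step 4 above can be sketched with the allocation-filtering settings API (the IP below is a placeholder for the node being retired):

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1"
  }
}
```

Step 5 is then a matter of polling GET _cat/shards until no shard lists that node, after which it can be shut down safely.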

Nik
On Jul 23, 2014 10:21 AM, "Pierre-Vincent Ledoux" 
wrote:

> Hi,
>
> We have a little cluster of 2 nodes, hosting 4 indexes of about 1.5M
> documents each, replicated on both nodes.
>
> Those 2 nodes are on VPS that are stored on the same physical host. As it
> represents a single point of failure, we have decided to start a new VPS on
> a different host.
>
> What is the correct procedure to add the new node to the cluster, get the
> indexes replicated in it, and then removed one of the older node?
>
> We don't use multicast, so I imagine that I can add the node to the
> unicast list in the config file, but how can I be sure that it will not
> break my whole cluster when I restart elasticsearch?
>
> Those nodes are online in production so it's a bit touchy for us to take
> any risk with it.
>
>
> Cheers,
>
> Pv
>
>



Resuming Interrupted Reindexing Process

2014-07-23 Thread Sarang Zargar
We are reindexing an index with following attributes:

   - Size: ~3TB 
   - Ever increasing (new records are pumped in at high volume during peak 
   hours)

Is there a clean way to resume an interrupted reindexing process? 
Does Elasticsearch provide a way to persist scrolls? If we could use the 
same scroll, we could essentially resume where we left off. 
What are the impacts of keeping scrolls alive for very long durations? 
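
A scroll in 1.x is held in memory on the data nodes, so there does not appear to be a supported way to persist one across interruptions. For context, a sketch of the scroll loop (index name and size are placeholders):

```
POST /old_index/_search?scroll=5m
{ "query": { "match_all": {} }, "size": 1000 }

POST /_search/scroll?scroll=5m
<scroll_id returned by the previous call>
```

Keeping a scroll alive pins the Lucene segments it started from, so on an ever-growing 3TB index a very long-lived scroll can hold deleted or merged segments on disk. A more resumable pattern is checkpointing on a monotonic field (e.g. an indexed timestamp) with range queries, so an interrupted pass can restart from the last checkpoint instead of from a scroll id.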



ES 1.3.0 and 1.2.3 released

2014-07-23 Thread Mark Walkom
http://www.elasticsearch.org/blog/elasticsearch-1-3-0-released/

Today, we are happy to announce the release of *Elasticsearch 1.3.0*, based
> on *Lucene 4.9*, along with a bugfix release of *Elasticsearch 1.2.3*.
> You can download them and read the full changes list here:
>
> - Latest stable release: Elasticsearch 1.3.0.
> - Bug fixes for 1.2.x: Elasticsearch 1.2.3.
>
> Elasticsearch 1.3.0 is the latest stable release. It is chock full of new
> features, performance and stability improvements, and bugfixes. We
> recommend upgrading, especially for users with high indexing or aggregation
> loads. The full change log is available in the Elasticsearch 1.3.0
> release notes, but we will highlight the most important changes below:


Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com



Re: When to use multiple clusters

2014-07-23 Thread Alex Kehayias
Thanks Mark! We're deploying on EC2 (always a good time). It seems like the 
mixture of different indices with different usage profiles is leading 
to some performance issues that a dedicated cluster would be more 
appropriate for.


On Wednesday, July 23, 2014 7:04:34 PM UTC-4, Mark Walkom wrote:
>
> Depends what your hardware profiles are like, and a bunch of other things 
> related to you and your environment.
> eg If you have high end servers then it makes sense to put your heavy 
> read/write indexes into a cluster on those, then leave the rest for more 
> average machines.
>
> We have multiple clusters based on use. One for an application text based 
> search, one for application logging, one for system logging and we're going 
> to spin up another one for a new project we're starting. This might sound 
> like a waste of resources, and it probably is to a degree, but we have the 
> infrastructure for it and it makes things easier to manage.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 24 July 2014 00:34, Alex Kehayias > 
> wrote:
>
>> I have several large indices (100M docs) on the same cluster. Is there 
>> any advice of when it is appropriate to separate into multiple clusters vs 
>> one large one? Each index has a slightly different usage profile (read vs 
>> write heavy, update vs insert). How many indices would you recommend for a 
>> single cluster? Is it ok to have many large indices on the same cluster? 
>>
>> Thanks!
>>
>
>



Re: When to use multiple clusters

2014-07-23 Thread Mark Walkom
Depends what your hardware profiles are like, and a bunch of other things
related to you and your environment.
eg If you have high end servers then it makes sense to put your heavy
read/write indexes into a cluster on those, then leave the rest for more
average machines.

We have multiple clusters based on use. One for an application text based
search, one for application logging, one for system logging and we're going
to spin up another one for a new project we're starting. This might sound
like a waste of resources, and it probably is to a degree, but we have the
infrastructure for it and it makes things easier to manage.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 24 July 2014 00:34, Alex Kehayias  wrote:

> I have several large indices (100M docs) on the same cluster. Is there any
> advice of when it is appropriate to separate into multiple clusters vs one
> large one? Each index has a slightly different usage profile (read vs write
> heavy, update vs insert). How many indices would you recommend for a single
> cluster? Is it ok to have many large indices on the same cluster?
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/24ebf4dc-f281-4574-8cbb-cb049c4fac71%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624YqsEB2BC1zexHZOPp56%3DGJCaWcp72OjtNMYqZxqXXVxg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Add / Remove nodes in cluster, good practice question

2014-07-23 Thread Mark Walkom
If this is production you really want an odd number of nodes to reduce
the potential for split-brain issues.

However in your case, just add the new node to the cluster, let it
replicate across, then shutdown the node you no longer want. Any impact
will be minimal.
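
For the odd-node-count suggestion, the quorum setting can be sketched as a dynamic cluster setting (2 is the correct quorum for 3 master-eligible nodes):

```
PUT _cluster/settings
{
  "persistent": {
    "discovery.zen.minimum_master_nodes": 2
  }
}
```

With only two nodes there is no value that both prevents split brain and survives a node loss, which is why an odd node count keeps coming up.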

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 24 July 2014 00:21, Pierre-Vincent Ledoux  wrote:

> Hi,
>
> We have a little cluster of 2 nodes, hosting 4 indexes of about 1.5M
> documents each, replicated on both nodes.
>
> Those 2 nodes are on VPS that are stored on the same physical host. As it
> represents a single point of failure, we have decided to start a new VPS on
> a different host.
>
> What is the correct procedure to add the new node to the cluster, get the
> indexes replicated in it, and then removed one of the older node?
>
> We don't use multicast, so I imagine that I can add the node to the
> unicast list in the config file, but how can I be sure that it will not
> break my whole cluster when I restart elasticsearch?
>
> Those nodes are online in production so it's a bit touchy for us to take
> any risk with it.
>
>
> Cheers,
>
> Pv
>
>



[ANN] Elasticsearch JavaScript language plugin 2.3.0 released

2014-07-23 Thread Elasticsearch Team

Heya,


We are pleased to announce the release of the Elasticsearch JavaScript language 
plugin, version 2.3.0.

The JavaScript language plugin allows using JavaScript as the language of 
scripts to execute.

https://github.com/elasticsearch/elasticsearch-lang-javascript/

Release Notes - elasticsearch-lang-javascript - Version 2.3.0



Update:
 * [22] - Update to Lucene 4.9.0 
(https://github.com/elasticsearch/elasticsearch-lang-javascript/issues/22)
 * [21] - Update to elasticsearch 1.3.0 
(https://github.com/elasticsearch/elasticsearch-lang-javascript/issues/21)




Issues, pull requests, and feature requests are warmly welcome on the 
elasticsearch-lang-javascript project repository: 
https://github.com/elasticsearch/elasticsearch-lang-javascript/
For questions or comments around this plugin, feel free to use elasticsearch 
mailing list: https://groups.google.com/forum/#!forum/elasticsearch

Enjoy,

-The Elasticsearch team



Re: Creating Dynamic Dashboard with Kibana

2014-07-23 Thread Andre Encarnacao
I am actually wondering the same thing. Is there a way to query ES from 
within a Kibana dashboard script in order to make decisions about how to 
lay out the dashboard? I tried using the elasticsearch-js API (copied and 
pasted the entire API into the dashboard script), but there's no way to use 
the asynchronous ES data calls from within the synchronous dashboard 
script. Any way around this?

Thanks,
Andre


On Thursday, June 26, 2014 12:27:16 PM UTC-6, Yongtao You wrote:
>
> Hi,
>
> I would like to create a "dynamic" dashboard similar to 
> ".../kibana/src/index.html#/dashboard/script/logstash.js". By dynamic I 
> mean I want to generate the dashboard based on a query result. For example, 
> I would like to query a field and find out how many unique values are there 
> in that field, and then draw a time-based line for each of those unique 
> values on my dashboard. What's the recommended way of query ES from within 
> a dashboard script? Any examples?
>
> Thanks.
> Yongtao
>



Re: Array type limitations?

2014-07-23 Thread michael
Thanks for the response, Jörg.

When I filter by follower_ids, I actually use Elasticsearch's terms lookup 
feature, so I never run into the 1024 max clause limit.

That said, because I append 5k IDs to that field at a time, you are correct 
-- appending 5k IDs to an array with millions of elements can be costly 
(>5s).

Judging by the link you shared, do you suggest I use a nested object of 
arrays to store the IDs? Something like filter_ids: [ { id: 1 }, { id: 2 }, 
{ id: 3 }, ... ]? If so, would appending 5k items to a nested array containing 
millions of objects be any less costly than appending to a plain array?
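
For readers landing here, the terms lookup mentioned above can be sketched like this (the index/type/id/path values are invented for illustration):

```json
{
  "filter": {
    "terms": {
      "follower_ids": {
        "index": "users",
        "type": "user",
        "id": "123",
        "path": "follower_ids"
      }
    }
  }
}
```

This fetches the follower_ids array from the users/user/123 document and filters on those values server-side, which is how the 1024-clause limit is avoided.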

On Wednesday, July 23, 2014 6:21:12 PM UTC-4, Jörg Prante wrote:
>
> If you want to use more than 1024 terms, you will hit the Lucene max 
> clause limit.
>
> Managing an array is not a good idea with 10M+ items. You'd have to iterate 
> it yourself for appending/modifying, which will take a very long time 
> (and space).
>
> Maybe you find this interesting for your model design 
>
> http://de.slideshare.net/martijnvg/document-relations
>
> Jörg 
>
>
> On Wed, Jul 23, 2014 at 6:36 PM, > 
> wrote:
>
>> Hey guys.
>>
>> I'm curious to know what are the limitations of an array type field? I'm 
>> using ES to store an array of social-network follower IDs for each of my 
>> users, and this can sometimes get big (10M+ items). Is this "okay" with 
>> arrays? Or should I be using something else like a nested type? My mapping 
>> is as follows:
>>
>> "follower_ids": {
>>   "type": "string",
>>   "index_name": "follower_id",
>>   "norms": {
>> "enabled": false
>>   },
>>   "index": "no",
>>   "index_options": "docs"
>> }
>>
>> Worth mentioning that I'm also using a "terms" path filter on this array 
>> field.
>>
>> Your input/feedback is much appreciated!
>>
>
>



JDK version 1.7.0_65?

2014-07-23 Thread Greg Brown
The current ES docs reference Java 7 
u60: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_installation.html#_installation

Lucene seems to just say u55 or 
higher: http://lucene.apache.org/core/4_9_0/SYSTEM_REQUIREMENTS.html

Has anyone used/verified ES with OpenJDK 7u65?

The push for upgrading is coming from this security 
update: https://lists.debian.org/debian-security-announce/2014/msg00169.html

Thanks
-Greg



Re: Array type limitations?

2014-07-23 Thread joergpra...@gmail.com
If you want to use more than 1024 terms, you will hit the Lucene max clause
limit.

Managing an array is not a good idea with 10M+ items. You'd have to iterate
over it yourself for appending/modifying, which will take a very long time (and
space).
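
For the "terms" filter case specifically, a terms lookup filter avoids sending the full ID list with every request by reading it from an indexed document instead. A rough, untested sketch (the index/type/field names here are invented):

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "terms": {
          "follower_ids": {
            "index": "users",
            "type": "user",
            "id": "12345",
            "path": "follower_ids"
          }
        }
      }
    }
  }
}
```

The lookup document's follower_ids array is fetched node-side, so the huge term list never travels over the wire with the search request.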

Maybe you find this interesting for your model design

http://de.slideshare.net/martijnvg/document-relations

Jörg


On Wed, Jul 23, 2014 at 6:36 PM,  wrote:

> Hey guys.
>
> I'm curious to know what are the limitations of an array type field? I'm
> using ES to store an array of social-network follower IDs for each of my
> users, and this can sometimes get big (10M+ items). Is this "okay" with
> arrays? Or should I be using something else like a nested type? My mapping
> is as follows:
>
> "follower_ids": {
>   "type": "string",
>   "index_name": "follower_id",
>   "norms": {
> "enabled": false
>   },
>   "index": "no",
>   "index_options": "docs"
> }
>
> Worth mentioning that I'm also using a "terms" path filter on this array
> field.
>
> Your input/feedback is much appreciated!
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/8a5c2f5b-1292-409f-9196-be8f0a00282f%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



[ANN] Elasticsearch Python language plugin 2.3.0 released

2014-07-23 Thread Elasticsearch Team

Heya,


We are pleased to announce the release of the Elasticsearch Python language 
plugin, version 2.3.0.

The Python language plugin allows Python to be used as the language for scripts to 
execute.

https://github.com/elasticsearch/elasticsearch-lang-python/

Release Notes - elasticsearch-lang-python - Version 2.3.0



Update:
 * [15] - Update to Lucene 4.9.0 
(https://github.com/elasticsearch/elasticsearch-lang-python/issues/15)
 * [13] - Update to elasticsearch 1.3.0 
(https://github.com/elasticsearch/elasticsearch-lang-python/issues/13)




Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-lang-python project repository: 
https://github.com/elasticsearch/elasticsearch-lang-python/
For questions or comments around this plugin, feel free to use elasticsearch 
mailing list: https://groups.google.com/forum/#!forum/elasticsearch

Enjoy,

-The Elasticsearch team



Filtr range on id field faster?

2014-07-23 Thread sada
Hi!
I'm new to ES.
I have a quick question.
Does a range filter on the _id field work faster than one on another field mapped to
"integer"?



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Filtr-range-on-id-field-faster-tp4060481.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



[ANN] Elasticsearch Phonetic Analysis plugin 2.3.0 released

2014-07-23 Thread Elasticsearch Team

Heya,


We are pleased to announce the release of the Elasticsearch Phonetic Analysis 
plugin, version 2.3.0.

The Phonetic Analysis plugin integrates phonetic token filter analysis with 
elasticsearch.

https://github.com/elasticsearch/elasticsearch-analysis-phonetic/

Release Notes - elasticsearch-analysis-phonetic - Version 2.3.0



Update:
 * [29] - Update to Lucene 4.9.0 
(https://github.com/elasticsearch/elasticsearch-analysis-phonetic/issues/29)
 * [26] - Update to elasticsearch 1.3.0 
(https://github.com/elasticsearch/elasticsearch-analysis-phonetic/issues/26)




Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-analysis-phonetic project repository: 
https://github.com/elasticsearch/elasticsearch-analysis-phonetic/
For questions or comments around this plugin, feel free to use elasticsearch 
mailing list: https://groups.google.com/forum/#!forum/elasticsearch

Enjoy,

-The Elasticsearch team



[ANN] Elasticsearch Smart Chinese Analysis plugin 2.3.0 released

2014-07-23 Thread Elasticsearch Team

Heya,


We are pleased to announce the release of the Elasticsearch Smart Chinese 
Analysis plugin, version 2.3.0.

The Smart Chinese Analysis plugin integrates the Lucene Smart Chinese analysis module 
into elasticsearch.

https://github.com/elasticsearch/elasticsearch-analysis-smartcn/

Release Notes - elasticsearch-analysis-smartcn - Version 2.3.0



Update:
 * [24] - Remove deprecated smartcn_word and smartcn_sentence 
(https://github.com/elasticsearch/elasticsearch-analysis-smartcn/issues/24)
 * [23] - Update to Lucene 4.9.0 
(https://github.com/elasticsearch/elasticsearch-analysis-smartcn/issues/23)
 * [21] - Update to elasticsearch 1.3.0 
(https://github.com/elasticsearch/elasticsearch-analysis-smartcn/issues/21)




Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-analysis-smartcn project repository: 
https://github.com/elasticsearch/elasticsearch-analysis-smartcn/
For questions or comments around this plugin, feel free to use elasticsearch 
mailing list: https://groups.google.com/forum/#!forum/elasticsearch

Enjoy,

-The Elasticsearch team



[ANN] Elasticsearch Japanese (kuromoji) Analysis plugin 2.3.0 released

2014-07-23 Thread Elasticsearch Team

Heya,


We are pleased to announce the release of the Elasticsearch Japanese (kuromoji) 
Analysis plugin, version 2.3.0.

The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis 
module into elasticsearch.

https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/

Release Notes - elasticsearch-analysis-kuromoji - Version 2.3.0



Update:
 * [37] - Update to Lucene 4.9.0 
(https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/issues/37)
 * [35] - Upgrade to elasticsearch 1.3.0 
(https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/issues/35)




Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-analysis-kuromoji project repository: 
https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/
For questions or comments around this plugin, feel free to use elasticsearch 
mailing list: https://groups.google.com/forum/#!forum/elasticsearch

Enjoy,

-The Elasticsearch team



Re: Can I filter with exact phrases?

2014-07-23 Thread IronMike
Brilliant, I will give it a try!

On Wednesday, July 23, 2014 5:08:31 PM UTC-4, Ivan Brusic wrote:
>
> You can wrap any query with a query filter:
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-filter.html
>
> -- 
> Ivan
>
>
> On Wed, Jul 23, 2014 at 1:52 PM, IronMike  > wrote:
>
>>
>> How can I exclude exact phrases with a filter?
>>
>> Let's say I want to search for "heaven is blue" but exclude the exact phrase 
>> "nature books". I understand I can use a bool query with must & must_not, 
>> but is there a way to filter exact phrases like the example below, but 
>> instead of terms, I use match_phrase?
>>
>> "query": {
>> "query_string": {
>> "query": "heaven is blue",
>> "fields": ["content"],
>> "default_operator": "AND"
>> }
>> },
>> "filter" : {
>> "not" : {
>> "terms": { "content":  ["book","nature"] }
>> }
>> }
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/55ec0d35-27b1-447e-9136-d86695e789c0%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>



[ANN] Elasticsearch Stempel (Polish) Analysis plugin 2.3.0 released

2014-07-23 Thread Elasticsearch Team

Heya,


We are pleased to announce the release of the Elasticsearch Stempel (Polish) 
Analysis plugin, version 2.3.0.

The Stempel (Polish) Analysis plugin integrates the Lucene stempel (Polish) 
analysis module into elasticsearch.

https://github.com/elasticsearch/elasticsearch-analysis-stempel/

Release Notes - elasticsearch-analysis-stempel - Version 2.3.0



Update:
 * [28] - Update to Lucene 4.9.0 
(https://github.com/elasticsearch/elasticsearch-analysis-stempel/issues/28)
 * [27] - Update to elasticsearch 1.3.0 
(https://github.com/elasticsearch/elasticsearch-analysis-stempel/issues/27)




Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-analysis-stempel project repository: 
https://github.com/elasticsearch/elasticsearch-analysis-stempel/
For questions or comments around this plugin, feel free to use elasticsearch 
mailing list: https://groups.google.com/forum/#!forum/elasticsearch

Enjoy,

-The Elasticsearch team



[ANN] Elasticsearch ICU Analysis plugin 2.3.0 released

2014-07-23 Thread Elasticsearch Team

Heya,


We are pleased to announce the release of the Elasticsearch ICU Analysis 
plugin, version 2.3.0.

The ICU Analysis plugin integrates the Lucene ICU module into elasticsearch, adding 
ICU-related analysis components.

https://github.com/elasticsearch/elasticsearch-analysis-icu/

Release Notes - elasticsearch-analysis-icu - Version 2.3.0



Update:
 * [33] - Update to Lucene 4.9.0 
(https://github.com/elasticsearch/elasticsearch-analysis-icu/issues/33)
 * [32] - Update to elasticsearch 1.3.0 
(https://github.com/elasticsearch/elasticsearch-analysis-icu/issues/32)




Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-analysis-icu project repository: 
https://github.com/elasticsearch/elasticsearch-analysis-icu/
For questions or comments around this plugin, feel free to use elasticsearch 
mailing list: https://groups.google.com/forum/#!forum/elasticsearch

Enjoy,

-The Elasticsearch team



Re: Sorting on a custom script field in Java

2014-07-23 Thread M_20
I found the error!

Enable dynamic scripting in the config file:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html
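
Concretely, the setting behind the ScriptException quoted below is this line in elasticsearch.yml:

```yaml
# elasticsearch.yml -- dynamic scripting is disabled by default as of ES 1.2;
# re-enable with care, since dynamic scripts can run arbitrary code sent in requests.
script.disable_dynamic: false
```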





On Wednesday, July 23, 2014 3:05:11 PM UTC-5, M_20 wrote:
>
> Hi Igor,
>
> I wrote a code to sort on custom script( based on ES documentation 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html
>  
> ), but it doesn't work.
> Here are the code and the error message. Any suggestion?
>
> // journalscorenormal --> the field's name
>
> String script = "doc['journalscorenormal'].value";
> ScriptSortBuilder sortSc = SortBuilders.scriptSort(script,"float");
> SearchResponse response = client
> .prepareSearch(index)
> .setTypes(MyVars.index.getType())
> .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
> .setQuery(
> QueryBuilders.fuzzyLikeThisQuery(field)
> .likeText(searchTerm)).setFrom(0)
> .setSize(60)
> .addSort(sortSc)
> .setExplain(true).execute().actionGet();
>
>
>
>
> Failed to parse source 
> [{"from":0,"size":60,"query":{"flt":{"fields":["sentence"],"like_text":"disease"}},"explain":true,"sort":[{"_script":{"script":"doc['journalscorenormal'].value","type":"float"}}]}]];
> nested: ScriptException[dynamic scripting disabled]; }
> at 
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:233)
> at 
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$1.onFailure(TransportSearchTypeAction.java:179)
> at 
> org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:523)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
>
>
>
>
> On Tuesday, February 26, 2013 4:30:46 PM UTC-6, echin1999 wrote:
>>
>> Hi,
>>
>>
>> I was wondering if someone could point me to some sample Java code that 
>> specifically shows how to sort on a custom script field.  
>>
>>
>> I see that the SearchRequestBuilder has a method called: 
>> addScriptField(name, 
>> script, params); 
>>
>>
>> but i'm not sure how to define the "script" argument.   I assume the 
>> "name" is the reference to the script field that i would pass into the 
>> construction of a FieldSortBuilder()? 
>>
>>
>> I want to be able to sort my results based on whether or not a property 
>> (or multiple properties) of type long[] contains a long value specified at 
>> search time (ie. documents where the criteria holds true will be returned 
>> ahead of those documents where the criteria is false).  
>>
>>
>>
>> I've attached my sample code which does basic sorting on an existing 
>> field - that seems to work fine.   I  now just want to enhance it to be 
>> able to sort on a custom script field coded as i describe above.  Any 
>> pointers are greatly appreciated!
>>
>>
>> thanks
>>
>> Ed
>>
>



Re: Can I filter with exact phrases?

2014-07-23 Thread Ivan Brusic
You can wrap any query with a query filter:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-filter.html
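
For instance, the not/terms filter from the question could become a query filter wrapping match_phrase; an untested sketch:

```json
"filter": {
  "not": {
    "query": {
      "match_phrase": { "content": "nature books" }
    }
  }
}
```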

-- 
Ivan


On Wed, Jul 23, 2014 at 1:52 PM, IronMike  wrote:

>
> How can I exclude exact phrases with a filter?
>
> Let's say I want to search for "heaven is blue" but exclude the exact phrase
> "nature books". I understand I can use a bool query with must & must_not,
> but is there a way to filter exact phrases like the example below, but
> instead of terms, I use match_phrase?
>
> "query": {
> "query_string": {
> "query": "heaven is blue",
> "fields": ["content"],
> "default_operator": "AND"
> }
> },
> "filter" : {
> "not" : {
> "terms": { "content":  ["book","nature"] }
> }
> }
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/55ec0d35-27b1-447e-9136-d86695e789c0%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



Can I filter with exact phrases?

2014-07-23 Thread IronMike

How can I exclude exact phrases with a filter?

Let's say I want to search for "heaven is blue" but exclude the exact phrase 
"nature books". I understand I can use a bool query with must & must_not, 
but is there a way to filter exact phrases like the example below, but 
instead of terms, I use match_phrase?

"query": {
"query_string": {
"query": "heaven is blue",
"fields": ["content"],
"default_operator": "AND"
}
},
"filter" : {
"not" : {
"terms": { "content":  ["book","nature"] }
}
}



Re: Sorting on a custom script field in Java

2014-07-23 Thread M_20
Hi Igor,

I wrote a code to sort on custom script( based on ES documentation 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html
 
), but it doesn't work.
Here are the code and the error message. Any suggestion?

// journalscorenormal --> the field's name

String script = "doc['journalscorenormal'].value";
ScriptSortBuilder sortSc = SortBuilders.scriptSort(script,"float");
SearchResponse response = client
.prepareSearch(index)
.setTypes(MyVars.index.getType())
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(
QueryBuilders.fuzzyLikeThisQuery(field)
.likeText(searchTerm)).setFrom(0)
.setSize(60)
.addSort(sortSc)
.setExplain(true).execute().actionGet();




Failed to parse source 
[{"from":0,"size":60,"query":{"flt":{"fields":["sentence"],"like_text":"disease"}},"explain":true,"sort":[{"_script":{"script":"doc['journalscorenormal'].value","type":"float"}}]}]];
nested: ScriptException[dynamic scripting disabled]; }
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:233)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$1.onFailure(TransportSearchTypeAction.java:179)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:523)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)





On Tuesday, February 26, 2013 4:30:46 PM UTC-6, echin1999 wrote:
>
> Hi,
>
>
> I was wondering if someone could point me to some sample Java code that 
> specifically shows how to sort on a custom script field.  
>
>
> I see that the SearchRequestBuilder has a method called: addScriptField(name, 
> script, params); 
>
>
> but i'm not sure how to define the "script" argument.   I assume the 
> "name" is the reference to the script field that i would pass into the 
> construction of a FieldSortBuilder()? 
>
>
> I want to be able to sort my results based on whether or not a property 
> (or multiple properties) of type long[] contains a long value specified at 
> search time (ie. documents where the criteria holds true will be returned 
> ahead of those documents where the criteria is false).  
>
>
>
> I've attached my sample code which does basic sorting on an existing field 
> - that seems to work fine.   I  now just want to enhance it to be able to 
> sort on a custom script field coded as i describe above.  Any pointers are 
> greatly appreciated!
>
>
> thanks
>
> Ed
>



Re: long String retrieved as empty; short String retrieved fine; why ES?

2014-07-23 Thread Adrian
Correction: I get somewhere around 220 characters in that String NOT 40-50 
as I originally mentioned.


On Wednesday, 23 July 2014 15:36:10 UTC-4, Adrian wrote:
>
> I use this script to inspect data in various docs from ES.
>
> {
>   "query": {
> "match_all": {}
>   },
>   "sort": {
> "_script": {
>   "script": "if(doc['site'].values.contains(12)){return 
> 'foo'}else{return doc['dataX'].values }",
>   "type": "string",
>   "order": "desc"
> }
>   }
> }
>
> The only important part of this script is that it will always go in the 
> else clause and print  doc['dataX'].values
>
> I also have this document JSON:
> {
> "doc":{
>   "site" : "gencoupe.com",
>   "name" : "amount-active-users",
>   "daily" : {
> "dataX": "1000,490,390,600,300",
> "dataY": "1388538061, 1388624461, 1388710861, 1388797261, 133661",
> "startDate":1388538061,
> "endDate":133661
>   }
>
> }
> }
>
> I've noticed that if the string dataX is longer than 40-something 
> characters, the script I presented retrieves dataX as an empty array. If 
> the string is 40 chars or less, it works fine and retrieves the contents of 
> the String.
>
> Is there a limit on how many characters a String can store in ES? Why am I 
> seeing this inconsistent behaviour and how can I make it retrieve the 
> entire string, of potentially very large size?
>
>



long String retrieved as empty; short String retrieved fine; why ES?

2014-07-23 Thread Adrian
I use this script to inspect data in various docs from ES.

{
  "query": {
"match_all": {}
  },
  "sort": {
"_script": {
  "script": "if(doc['site'].values.contains(12)){return 
'foo'}else{return doc['dataX'].values }",
  "type": "string",
  "order": "desc"
}
  }
}

The only important part of this script is that it will always go in the 
else clause and print  doc['dataX'].values

I also have this document JSON:
{
"doc":{
  "site" : "gencoupe.com",
  "name" : "amount-active-users",
  "daily" : {
"dataX": "1000,490,390,600,300",
"dataY": "1388538061, 1388624461, 1388710861, 1388797261, 133661",
"startDate":1388538061,
"endDate":133661
  }

}
}

I've noticed that if the string dataX is longer than 40-something 
characters, the script I presented retrieves dataX as an empty array. If 
the string is 40 chars or less, it works fine and retrieves the contents of 
the String.

Is there a limit on how many characters a String can store in ES? Why am I seeing 
this inconsistent behaviour and how can I make it retrieve the entire 
string, of potentially very large size?
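
One thing worth checking (an assumption, since the mapping is not shown): if dataX is analyzed, doc['dataX'].values holds the indexed tokens rather than the original string, so longer comma-separated values can behave differently from short ones. Mapping the field as not_analyzed keeps each value as a single term; a sketch:

```json
"dataX": {
  "type": "string",
  "index": "not_analyzed"
}
```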



Re: Get top n parents based on sum of score of its children

2014-07-23 Thread Maxime Nay
Hi,

I came up with something but I am not sure this is optimal.
Can you let me know what I could improve?

{
  "query": {
  "bool": {
  "must": [
  {"has_child" : {
  "type" : "books",
  "score_mode": "sum",
  "query" : {
  "function_score": {
  "filter": {
  "and":[
  {"range" : { "publishedAt" : { "gte" : 
"2014-07-20T22:04:51.000Z" } } }
  ]
  },
  "script_score": { "script": "doc['rating'].value" 
}
  }
  }
  }},
  {"filtered" : {
  "boost": 0,
  "query": { "match_all": { }},
  "filter": {
  "and":[
  {"range" : { "fansCount" : { "gt" : 150 } } }
  ]
  }
  }}
  ]
  }   
  }
}

This will return the cumulated rating as the score.

Thanks!

On Tuesday, July 22, 2014 4:15:56 PM UTC-7, Maxime Nay wrote:
>
> Hi,
>
> Here is my problem:
>
> We have Authors (parents) and books (children). Authors can be regularly 
> updated (let's say the number of their fans can vary with the time). Books 
> are immutable, and each book has a rating.
> I want to be able to get the parents that have the highest cumulated 
> rating score for a particular period of time. 
> For example I have:
>
> John, who had 100 fans in January and 200 fans in July, wrote:
> - "book 1" with a rating of 12 in January
> - "book 3" with a rating of 19 in March
> - "book 6" with a rating of 9 in June
>
> Jane, who had 50 fans in January and 600 in July, wrote:
> - "book 2" with a rating of 13 in February
> - "book 4" with a rating of 11 in April
> - "book 5" with a rating of 10 in May
>
> In July, I would like to get back, for the time period February - May, the 
> following results (ordered by cumulated rating):
> Jane, 600 fans - 34 cumulated rating
> John, 200 fans - 19 cumulated rating
>
>
> At the same time, I also want to be able to filter based on the content of 
> the books and also based on characteristics of the authors.
>
> I tried to use a parent/child model, and use a query with has_child and 
> function_score, but couldn't find a way to do what I am looking for.
>
> Any idea?
>
> Thanks!
>



Re: SIREn plugin for nested documents

2014-07-23 Thread Brian
Thanks for the link. Unfortunately, Chrome on Mac OS (latest versions of 
each) causes this web page to blank and redisplay continually. Can't read 
it; hope you can.

In a previous life, I created a search engine that handled parent/child 
relationships with blindingly fast performance. One trick was that the 
index didn't just contain the document ID, but it contained the entire 
hierarchy of IDs. So, for example (and for brevity, the IDs are single letters):

Document ID and relationship   Fully qualified and indexed ID
----------------------------   ------------------------------
A                              A
  B                            A.B
    C                          A.B.C
  D                            A.D
    E                          A.D.E
    F                          A.D.F

So for example, it was nearly instantaneous to determine that, just by 
looking at and comparing the fully qualified IDs:

A and F are in the same parent-child hierarchy, with F being a child of D 
and a grandchild of A.

E and F are siblings under the same parent.

And so on.
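
These relationship checks reduce to string-prefix comparisons on the fully qualified IDs. A minimal illustration of the idea (a hedged sketch, not the original engine's code):

```python
def is_descendant(child_id: str, ancestor_id: str) -> bool:
    """True if child_id lies anywhere under ancestor_id in the materialized path."""
    return child_id.startswith(ancestor_id + ".")

def are_siblings(a: str, b: str) -> bool:
    """True if a and b are distinct IDs sharing the same immediate parent."""
    if a == b or "." not in a or "." not in b:
        return False
    return a.rsplit(".", 1)[0] == b.rsplit(".", 1)[0]

# Using the fully qualified IDs from the table above:
print(is_descendant("A.D.F", "A"))     # True: F is a grandchild of A
print(is_descendant("A.D.F", "A.D"))   # True: F is a child of D
print(are_siblings("A.D.E", "A.D.F"))  # True: E and F share parent D
```

No document fetch is needed: the comparison works directly on the IDs that come out of the inverted index, which is what makes the approach so fast.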

Not sure how this would mesh with Lucene though. But complex parent-child 
relationships could be intersected just by the fully qualified IDs that 
came out of the inverted index. Documents did not need to be fetched or 
cached to perform this operation, and the result was breathtakingly 
blindingly fast performance.

Just FYI. I can discuss off-line if anyone wishes.

Brian



Re: SIREn plugin for nested documents

2014-07-23 Thread joergpra...@gmail.com
I noticed Siren has an example of 1000 library catalog records from British
Library prepared in JSON

https://github.com/sindicetech/siren/blob/master/siren-elasticsearch-demo/src/example/datasets/bnb/

From what it seems, Siren can index a tree (semi-structured data), using
positional nodes, then you can express a tree node DSL query in JSON, and
the result is something like a list of found node ids.

Regarding the "inner hits" challenge, this seems to get very close, because
a JSON doc is always semi-structured. The question is how to embed Siren
documents into Elasticsearch documents (or vice versa), i.e. whether they can
co-exist and be queried by a single query, combining the power of both.

While this is interesting for nested hierarchical data models, I am
studying JSON-LD and graph search in ES, for being able to follow links
between docs (or even between ES docs and web resources, local or remote).

Jörg


On Wed, Jul 23, 2014 at 7:52 PM, Ivan Brusic  wrote:

> Has anyone else seen this plugin? http://siren.solutions/siren/overview/
>
> There was some discussion between one of the developers and Jorg a while
> back, so I guess this is the outcome. Have not tried it yet, but I will
> give it a shot this weekend. I am hoping that it can fix a longstanding
> issue in Elasticsearch (and my biggest roadblock):
> https://github.com/elasticsearch/elasticsearch/issues/3022
>
> Cheers,
>
> Ivan
>



How do I set up ElasticSearch nodes on EC2 so they have consistent DNS entries?

2014-07-23 Thread Ryan V


I have 3 ElasticSearch nodes in a cluster on AWS EC2. My client apps use 
connection pooling and have the public IP addresses for all 3 nodes in 
their config files.

The problem I have is that EC2 seems to occasionally reassign public IP 
addresses for these instances. They also change if I stop and restart an 
instance.

My app will actually stay online since the connection pool will round robin 
the three known IP addresses, but eventually, all three will change and the 
app will stop working.

So, how should I be setting up an ElasticSearch cluster on EC2 so that my 
clients can continue to connect even if the instances change IP addresses?

   1. I could use Elastic IPs, but these are limited to 5 per account and I 
   will eventually have many more than 5 nodes (different environments, dev, 
   staging, test, etc.)
   2. I could use Elastic Load Balancers, and put one node behind each ELB, 
   but that seems like a pretty hacky solution and an improper use of load 
   balancers.
   3. I could create my own DNS entries under my own domain and update the 
   DNS table whenever I notice an IP address has changed, but that seems 
   really error prone if no one is checking the IPs every day.

Is there an option I'm missing?



Re: IMAPRiver plugin attachment index issue

2014-07-23 Thread Gabriel Kapitany
Hey Dave,

Thanks a lot for all your help,
Gabriel

On Wednesday, July 23, 2014 12:31:31 PM UTC-4, Gabriel Kapitany wrote:
>
> Hi all,
>
> I have installed elasticsearch 1.2.1, and IMAPRiver 
> plugin elasticsearch-river-imap-0.0.7-b20 with 
> elasticsearch-mapper-attachments-2.2.0-SNAPSHOT.
>
> [2014-07-23 11:56:44,304][INFO ][node ] [Shiver Man] 
> version[1.2.1], pid[28748], build[6c95b75/2014-06-03T15:02:52Z]
> [2014-07-23 11:56:44,305][INFO ][node ] [Shiver Man] 
> initializing ...
> [2014-07-23 11:56:44,329][INFO ][plugins  ] [Shiver Man] 
> loaded [mapper-attachments, river-imap-0.0.7-b20-${build], sites []
> [2014-07-23 11:56:47,850][INFO ][node ] [Shiver Man] 
> initialized
> [2014-07-23 11:56:47,850][INFO ][node ] [Shiver Man] 
> starting ...
> [2014-07-23 11:56:48,052][INFO ][transport] [Shiver Man] 
> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
> 10.125.71.146:9300]}
> [2014-07-23 11:56:51,106][INFO ][cluster.service  ] [Shiver Man] 
> new_master [Shiver 
> Man][6vc5FbGdSlSxHjI6s7nYvw][cto-sdx02.idm.symcto.com][inet[/10.125.71.146:9300]],
>  
> reason: zen-disco-join (elected_as_master)
> [2014-07-23 11:56:51,148][INFO ][discovery] [Shiver Man] 
> elasticsearch/6vc5FbGdSlSxHjI6s7nYvw
> [2014-07-23 11:56:51,178][INFO ][http ] [Shiver Man] 
> bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
> 10.125.71.146:9200]}
> [2014-07-23 11:56:52,278][INFO ][gateway  ] [Shiver Man] 
> recovered [2] indices into cluster_state
> [2014-07-23 11:56:52,279][INFO ][node ] [Shiver Man] 
> started
> [2014-07-23 11:56:54,760][INFO 
> ][de.saly.elasticsearch.river.imap.IMAPRiver] IMAPRiver created, river 
> name: river-imap
> [2014-07-23 11:56:54,761][INFO 
> ][de.saly.elasticsearch.river.imap.IMAPRiver] Start IMAPRiver ...
> [2014-07-23 11:56:55,025][INFO ][org.quartz.impl.StdSchedulerFactory] 
> Using default implementation for ThreadExecutor
> [2014-07-23 11:56:55,029][INFO ][org.quartz.simpl.SimpleThreadPool] Job 
> execution threads will use class loader of thread: elasticsearch[Shiver 
> Man][generic][T#4]
> [2014-07-23 11:56:55,051][INFO ][org.quartz.core.SchedulerSignalerImpl] 
> Initialized Scheduler Signaller of type: class 
> org.quartz.core.SchedulerSignalerImpl
> [2014-07-23 11:56:55,052][INFO ][org.quartz.core.QuartzScheduler] Quartz 
> Scheduler v.2.2.1 created.
> [2014-07-23 11:56:55,054][INFO ][org.quartz.simpl.RAMJobStore] RAMJobStore 
> initialized.
> [2014-07-23 11:56:55,055][INFO ][org.quartz.core.QuartzScheduler] 
> Scheduler meta-data: Quartz Scheduler (v2.2.1) 'DefaultQuartzScheduler' 
> with instanceId 'NON_CLUSTERED'
>
>
> Index created:
>
> curl -XPUT 'http://localhost:9200/_river/river-imap/_meta' -d '{
>
>"type":"imap",
>"mail.store.protocol":"imap",
>"mail.imap.host":"cto-sdx01",
>"mail.imap.port":993,
>"mail.imap.ssl.enable":true,
>"mail.imap.connectionpoolsize":"3",
>"mail.debug":"false",
>"mail.imap.timeout":1,
>"user":"gkapitan",
>"password":"xxx",
>"schedule":null,
>"interval":"60s",
>"threads":5,
>"folderpattern":null,
>"bulk_size":100,
>"max_bulk_requests":"30",
>"bulk_flush_interval":"5s",
>"mail_index_name":"imapriverdata",
>"mail_type_name":"mail",
>"with_striptags_from_textcontent":true,
>"with_attachments":true,
>"with_text_content":true,
>"with_flag_sync":true,
>"index_settings" : null,
>"type_mapping" : null
>
> }'
>
> The documents are loaded, but the attachments are not indexed for any of: 
> pdf, doc, docx, csv, xls...
>
> Any idea of what I might have missed?
>
> Below is a sample response:
>
>
> {"took":8,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.35493678,"hits":[{"_index":"imapriverdata","_type":"mail","_id":"16::imap://gkapitan@cto-sdx01/mail/gkapitan/Sent","_score":0.35493678,"_source":{
>   "attachmentCount" : 1,
>   "attachments" : [ {
> "content" : "YWxwaGEsYmV0YSxnYW1tYQ0KVGV0YSxzaWdtYSxwaQ0K",
> "contentType" : "text/csv; charset=us-ascii",
> "size" : 33,
> "filename" : "Book1.csv",
> "name" : "Book1.csv"
>   } ],
>   "bcc" : null,
>   "cc" : null,
>   "contentType" : "multipart/mixed; boundary=wac7ysb48OaltWcw",
>   "flaghashcode" : 48,
>   "flags" : [ "Recent", "Seen" ],
>   "folderFullName" : "mail/gkapitan/Sent",
>   "folderUri" : "imap://gkapitan@cto-sdx01/mail/gkapitan/Sent",
>   "from" : {
> "email" : "gkapi...@cto-sdx01.idm.symcto.com",
> "personal" : "Gabriel Kapitany"
>   },
>   "headers" : [ {
> "name" : "Content-Disposition",
> "value" : "inline"
>   }, {
> "name" : "Subject",
> "value" : "csv"
>   }, {
> "name" : "To",
> "value" : "gkapi...@cto-sdx01.idm.symcto.com"
>   }, {
> "name" : "Date",
> "

SIREn plugin for nested documents

2014-07-23 Thread Ivan Brusic
Has anyone else seen this plugin? http://siren.solutions/siren/overview/

There was some discussion between one of the developers and Jorg a while
back, so I guess this is the outcome. Have not tried it yet, but I will
give it a shot this weekend. I am hoping that it can fix a longstanding
issue in Elasticsearch (and my biggest roadblock):
https://github.com/elasticsearch/elasticsearch/issues/3022

Cheers,

Ivan



Re: AWS t2.medium vs. Linode 4GB

2014-07-23 Thread arshpreet singh
On Wed, Jul 23, 2014 at 11:07 PM, Aivis Silins  wrote:
> Hello,
>
> Today I did performance tests (some kind of) between Linode 4GB and AWS
> t2.medium - just to see which one is faster in this situation.
> Test results are quite similar between the two - which is a bit
> interesting (or probably not - let me know what you think), because AWS
> t2.medium has 2 vCPUs, but Linode 4GB has 4 vCPUs and an SSD.

Can you mention the clock speed? As far as I know, the T2's is up to 3.3 GHz.


-- 

Thanks
Arshpreet singh
http://arshpreetsingh.wordpress.com/
I have no special talents. Only passionately curious



Re: one index has different _type and different _type have same field with different type will disturb?

2014-07-23 Thread xu piao
My ES log is:

org.elasticsearch.transport.RemoteTransportException: [Yuriko 
Oyama][inet[/10.0.8.102:19300]][search/phase/query]
Caused by: org.elasticsearch.search.query.QueryPhaseExecutionException: 
[matrix][1]: query[filtered(name:"北京 西路"^20.0 tags:"北京 西路"^5.0 intra:"北京 
西路" region:"北京 西路"^2.0 address:"北京 
西路"^2.0)->cache(_type:group)],from[0],size[20],sort[]:
 
Query Failed [Failed to execute main query]
at 
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:127)
at 
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:257)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:623)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:612)
at 
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.IllegalStateException: field "region" was indexed 
without position data; cannot run PhraseQuery (term=北京)
at 
org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:278)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:317)
at 
org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:526)
at 
org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:136)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
at 
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:173)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:581)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:533)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:510)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:345)
at 
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:116) 



I have another question: is one index like a database, or like a table?


在 2014年7月24日星期四UTC+8上午1时41分23秒,Ivan Brusic写道:
>
> In the end, all the documents end up in the same Lucene index, and while 
> Lucene is schema-less, all similarly named fields must be the same type. 
> Types are useful in Elasticsearch to separate different type 
> configurations, but will fail on similarly named fields.
>
> There is some work being done to remove the ambiguity, but for now it is 
> best to avoid reusing field names across types in the same index.
>
> I am assuming this issue is what is causing the problems in your other 
> thread.
>
> Cheers,
>
> Ivan
>
>
> On Wed, Jul 23, 2014 at 2:53 AM, xu piao > 
> wrote:
>
>> I have one index with different _types, e.g.
>> http://localhost:9200/matrix/group and http://localhost:9200/matrix/user
>>
>> Both _types have a 'region' field, but with a different field type.
>> group :
>>
>>- region: {
>>   - type: string
>>   - store: true
>>   - analyzer: ik
>>}
>>
>> }
>>
>> user: 
>>
>>- region: {
>>   - type: "long"
>>}
>>
>>  
>> When the 'matrix' index only has the 'group' type, 
>> a 'query_string' search across multiple fields works fine:
>>
>> {
>>   "bool" : {
>> "must" : {
>>   "query_string" : {
>> "query" : "北京西城",
>> "fields" : [ "name^20", "tags^5", "intra^1", "region^2", 
>> "address^2" ]
>>   }
>> },
>> "minimum_should_match" : "1"
>>   }
>> }
>>
>> but when I index the 'user' type and run the same query, it throws an exception:
>>
>> org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to 
>> execute phase [query], all shards failed; shardFailures 
>> {[_vgvj_11QpKOuZNo90nD_A][matrix][0]: RemoteTransportException[[Blizzard 
>> II][inet[/10.0.8.235:19300]][search/phase/query]]; nested: 
>> QueryPhaseExecutionException[[matrix][0]: query[filtered((region:"北京 
>> 西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to 
>> execute main query]]; nested: IllegalStateException[field "region" was 
>> indexed without position data; cannot run PhraseQuery (term=北京)]; 
>> }{[_vgvj_11QpKOuZNo90nD_A][matrix][4]: RemoteTransportException[[Blizzard 
>> II][inet[/10.0.8.235:19300]][search/phase/query]]; nested: 
>> QueryPhaseExecutionException[[matrix][4]: query[filtered((region:"北京 
>> 西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to 
>> execute main query]]; nested: IllegalStateException[field "region" was 
>> indexed without position data; cannot run PhraseQuery (term=北京)]; 
>> }{[_vgvj_11QpKOuZNo90nD_A][matrix][3]: RemoteTransportException[[Blizzard 
>> II][inet[/10.0

Re: IMAPRiver plugin attachment index issue

2014-07-23 Thread David Pilato
That's what I thought:
the attachments field does not have the attachment type.

          "attachments" : {
            "properties" : {
              "content" : {
                "type" : "string"
              },
              "contentType" : {
                "type" : "string"
              },
              "filename" : {
                "type" : "string"
              },
              "name" : {
                "type" : "string"
              },
              "size" : {
                "type" : "long"
              }
            }
          },

You need to fix that. I think you need to provide your own mapping.
Sadly it's not documented in the river project. Maybe you should open an issue 
and a PR to document how to deal with attachments?
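You can see what the plain string mapping actually indexes by decoding the sample "content" value from the earlier search response; it is raw base64, not extracted text:

```python
import base64

# "content" of Book1.csv, copied from the sample search response above
content = "YWxwaGEsYmV0YSxnYW1tYQ0KVGV0YSxzaWdtYSxwaQ0K"
decoded = base64.b64decode(content).decode("us-ascii")
print(repr(decoded))  # 'alpha,beta,gamma\r\nTeta,sigma,pi\r\n' (33 bytes, matching "size": 33)
```

With "content" mapped as "string", only that base64 blob is indexed as-is. Mapping attachments.content with the mapper-attachments "attachment" type is what triggers text extraction; for this river, the "type_mapping" setting from its config looks like the place to supply it, though that is an assumption to verify against the river's source.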



-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 23 juillet 2014 à 19:27:43, Gabriel Kapitany (gkapit...@gmail.com) a écrit:

Changing the query from _all to attachments doesn't change the result and the 
second query returns:

{
  "imapriverdata" : {
    "mappings" : {
      "imapriverstate" : {
        "properties" : {
          "errormsg" : {
            "type" : "string"
          },
          "exists" : {
            "type" : "boolean"
          },
          "folderUrl" : {
            "type" : "string"
          },
          "lastCount" : {
            "type" : "long"
          },
          "lastIndexed" : {
            "type" : "long"
          },
          "lastSchedule" : {
            "type" : "long"
          },
          "lastTook" : {
            "type" : "long"
          },
          "lastUid" : {
            "type" : "long"
          },
          "messageid" : {
            "type" : "string"
          },
          "uidValidity" : {
            "type" : "long"
          }
        }
      },
      "mail" : {
        "properties" : {
          "attachmentCount" : {
            "type" : "long"
          },
          "attachments" : {
            "properties" : {
              "content" : {
                "type" : "string"
              },
              "contentType" : {
                "type" : "string"
              },
              "filename" : {
                "type" : "string"
              },
              "name" : {
                "type" : "string"
              },
              "size" : {
                "type" : "long"
              }
            }
          },
          "contentType" : {
            "type" : "string"
          },
          "flaghashcode" : {
            "type" : "integer"
          },
          "flags" : {
            "type" : "string"
          },
          "folderFullName" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "folderUri" : {
            "type" : "string"
          },
          "from" : {
            "properties" : {
              "email" : {
                "type" : "string"
              },
              "personal" : {
                "type" : "string"
              }
            }
          },
          "headers" : {
            "properties" : {
              "name" : {
                "type" : "string"
              },
              "value" : {
                "type" : "string"
              }
            }
          },
          "mailboxType" : {
            "type" : "string"
          },
          "receivedDate" : {
            "type" : "date",
            "format" : "basic_date_time"
          },
          "sentDate" : {
            "type" : "date",
            "format" : "basic_date_time"
          },
          "size" : {
            "type" : "long"
          },
          "subject" : {
            "type" : "string"
          },
          "textContent" : {
            "type" : "string"
          },
          "to" : {
            "properties" : {
              "email" : {
                "type" : "string"
              },
              "personal" : {
                "type" : "string"
              }
            }
          },
          "uid" : {
            "type" : "long"
          }
        }
      }
    }
  }
}


On Wednesday, July 23, 2014 12:31:31 PM UTC-4, Gabriel Kapitany wrote:
Hi all,

I have installed elasticsearch 1.2.1, and IMAPRiver plugin 
elasticsearch-river-imap-0.0.7-b20 with 
elasticsearch-mapper-attachments-2.2.0-SNAPSHOT.

[2014-07-23 11:56:44,304][INFO ][node                     ] [Shiver Man] 
version[1.2.1], pid[28748], build[6c95b75/2014-06-03T15:02:52Z]
[2014-07-23 11:56:44,305][INFO ][node                     ] [Shiver Man] 
initializing ...
[2014-07-23 11:56:44,329][INFO ][plugins                  ] [Shiver Man] loaded 
[mapper-attachments, river-imap-0.0.7-b20-${build], sites []
[2014-07-23 11:56:47,850][INFO ][node                     ] [Shiver Man] 
initialized
[2014-07-23 11:56:47,850][INFO ][node                     ] [Shiver Man] 
starting ...
[2014-07-23 11:56:48,052][INFO ][transport                ] [Shiver Man] 
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address 
{inet[/10.125.71.14

Re: Sudden high "OS Load", then ES VM disappears

2014-07-23 Thread joergpra...@gmail.com
How do you estimate the memory to configure?

Here is my rough estimation:

Docker limit = 10G
Kernel, OS services etc. ~ 1G
OS filesystem cache for ES ~50% of 10G ~ 5G
ES JVM + direct buffer + heap = 10G - 1G  - 5G ~ 4G

So when you estimate ~1G for the ES JVM + direct buffers, you have 3G
(maybe 4G) left for ES_HEAP_SIZE. I recommend starting with 3G and increasing
slowly, but to no more than 5G.
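That back-of-the-envelope split can be written out as arithmetic. The defaults below just mirror the numbers in the estimate above; they are rules of thumb, not measured values:

```python
def suggest_heap_gb(container_limit_gb: float,
                    os_overhead_gb: float = 1.0,
                    fs_cache_fraction: float = 0.5,
                    jvm_offheap_gb: float = 1.0) -> float:
    """Container limit, minus kernel/OS services, minus the share reserved
    for the OS filesystem cache, minus JVM off-heap (direct buffers etc.)."""
    fs_cache_gb = container_limit_gb * fs_cache_fraction
    return container_limit_gb - os_overhead_gb - fs_cache_gb - jvm_offheap_gb

print(suggest_heap_gb(10.0))  # 3.0 -> start ES_HEAP_SIZE around 3G
```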

With mlockall=true, you force Linux to abort processes that are not mlocked
(see the OOM killer, which may be responsible for rendering the Docker LXC
unusable), so it is not a good idea to start with mlockall=true as long as
the overall memory allocation in the Docker LXC is only a rough guess. If you
find a safe configuration where everything is balanced out, then you can
enable mlockall=true.

Jörg



On Wed, Jul 23, 2014 at 6:56 PM, Daniel Schonfeld 
wrote:

> Here's TOP and df -h... that's as best as I can get for now from inside
> that container.
>
> https://gist.github.com/danielschonfeld/d75c43cce34a16a57926
>
>
> On Wednesday, July 23, 2014 12:49:32 PM UTC-4, Daniel Schonfeld wrote:
>>
>> I wanted to add another five cents worth to what Michael already
>> described.
>>
>> Before, when this happened, we used to run the docker container without a
>> memory limit and giving ES a HEAP_SIZE of 10GB (10240M to be exact).  When
>> it happened the first time, doing `free -m` revealed that the system was
>> left with barely a few hundreds of megabytes left.  Second time around we
>> figured we'd tweak it a bit.  Give the docker container 10G memory limit
>> and limit the ES_HEAP_SIZE to 7.5GB (7168M).  This time when it crashed and
>> is still hung as we speak (been trying to get a thread dump, but that's not
>> happening... jstack can't even connect to that process its so deeply hung)
>> doing 'top' revealed that the memory usage is at 9.9GB for that process.
>>  Basically the system seems bogged out of memory.
>>
>> Question is... how did it exceed the heap size given?  And more to the
>> point, just how much memory should we be allocating? If this is just a
>> problem of shortage of memory, spawning up a new machine with a bunch more
>> RAM is no problem.  But we just wish we weren't shooting in the dark.
>>  We've already tried increasing the SSD size to get more IOPS (GCE ties
>> size to IOPS bandwidth).
>>
>> Last but not least, is there something we should be doing to tweak the
>> lucene segment size?
>>
>> Thanks for all your thoughts!
>>
>> On Wednesday, July 23, 2014 12:15:51 PM UTC-4, mic...@modernmast.com
>> wrote:
>>>
>>> No, the VM does not respond to curl requests. The closest thing I found to
>>> that read-bytes metric in the API was the _cluster/stats endpoint -->
>>> https://gist.github.com/schonfeld/d45401e44f5961c38502
>>>
>>> Were you referring to a different endpoint?
>>>
>>> What're your thoughts re "angry hardware"? Insufficient resources? Are
>>> there any known issues with CoreOS + ES?
>>>
>>> On Wednesday, July 23, 2014 11:12:10 AM UTC-4, Nikolas Everett wrote:




 On Wed, Jul 23, 2014 at 10:19 AM,  wrote:

> Looking at the JVM GC graphs, I do see some increases there, but not
> sure those are enough to cause this storm?
>
>
> 
>

 That looks like it wasn't the problem.

>
> 
> The disk graphs in Marvel don't show anything out of the ordinary. I'm
> not sure how to check on those write_bytes and read_bytes... Where does ES
> report those? I'm using Google Compute Engine, and according to their
> minimal graphs, while there was a small spike in the disk I/Os, it wasn't
> anything insane.
>

 Elasticsearch reports them in the same API as the number of reads.  I
 imagine they are right next to each other in marvel but I'm not sure.

>
> Right after the spike happened, the indexing rate spiked to 6.6k /
> second. Notice that the first tag notes the VM that left the cluster, the
> second tag shows the cluster went back to "green". Considering this
> happened after the node "left", does this give us any clues as for the
> reason?
>
>
> 
>
> A few observations:
> * The es process is still running on the "dead machine". I can see it
> when I ssh into the VM (thru ps aux, and docker)
> * The VM doesn't show anything in the error logs, etc
> * Running a "ps aux" on that VM actually freezes (after showing the es
> process)
>

 That seems pretty nasty.  Does the node respond to stuff like `curl
 localhost:9200`?  Sounds like the hardware/docker is just angry.



Re: one index has different _type and different _type have same field with different type will disturb?

2014-07-23 Thread Ivan Brusic
In the end, all the documents end up in the same Lucene index, and while
Lucene is schema-less, all similarly named fields must be the same type.
Types are useful in Elasticsearch to separate different type
configurations, but will fail on similarly named fields.

There is some work being done to remove the ambiguity, but for now it is
best to avoid reusing field names across types in the same index.
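For illustration, this is the problematic shape from the report below, reduced to a single mapping (index, type, and field names are taken from that report; "ik" is whatever analyzer the reporter had installed):

```json
{
  "mappings": {
    "group": {
      "properties": {
        "region": { "type": "string", "analyzer": "ik" }
      }
    },
    "user": {
      "properties": {
        "region": { "type": "long" }
      }
    }
  }
}
```

Both types land in the same Lucene index, so "region" ends up with one effective field definition. The long variant is indexed without position data, which is exactly why the phrase query on region fails with "cannot run PhraseQuery".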

I am assuming this issue is what is causing the problems in your other
thread.

Cheers,

Ivan


On Wed, Jul 23, 2014 at 2:53 AM, xu piao  wrote:

> I have one index with different _types, e.g.
> http://localhost:9200/matrix/group and http://localhost:9200/matrix/user
>
> Both _types have a 'region' field, but with a different field type.
> group :
>
>- region: {
>   - type: string
>   - store: true
>   - analyzer: ik
>}
>
> }
>
> user:
>
>- region: {
>   - type: "long"
>}
>
>
> When the 'matrix' index only has the 'group' type,
> a 'query_string' search across multiple fields works fine:
>
> {
>   "bool" : {
> "must" : {
>   "query_string" : {
> "query" : "北京西城",
> "fields" : [ "name^20", "tags^5", "intra^1", "region^2",
> "address^2" ]
>   }
> },
> "minimum_should_match" : "1"
>   }
> }
>
> but when I index the 'user' type and run the same query, it throws an exception:
>
> org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to
> execute phase [query], all shards failed; shardFailures
> {[_vgvj_11QpKOuZNo90nD_A][matrix][0]: RemoteTransportException[[Blizzard
> II][inet[/10.0.8.235:19300]][search/phase/query]]; nested:
> QueryPhaseExecutionException[[matrix][0]: query[filtered((region:"北京
> 西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to
> execute main query]]; nested: IllegalStateException[field "region" was
> indexed without position data; cannot run PhraseQuery (term=北京)];
> }{[_vgvj_11QpKOuZNo90nD_A][matrix][4]: RemoteTransportException[[Blizzard
> II][inet[/10.0.8.235:19300]][search/phase/query]]; nested:
> QueryPhaseExecutionException[[matrix][4]: query[filtered((region:"北京
> 西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to
> execute main query]]; nested: IllegalStateException[field "region" was
> indexed without position data; cannot run PhraseQuery (term=北京)];
> }{[_vgvj_11QpKOuZNo90nD_A][matrix][3]: RemoteTransportException[[Blizzard
> II][inet[/10.0.8.235:19300]][search/phase/query]]; nested:
> QueryPhaseExecutionException[[matrix][3]: query[filtered((region:"北京
> 西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to
> execute main query]]; nested: IllegalStateException[field "region" was
> indexed without position data; cannot run PhraseQuery (term=北京)];
> }{[3XK6HZ_WSbG4E7Tot3mMMw][matrix][2]:
> QueryPhaseExecutionException[[matrix][2]: query[filtered((region:"北京
> 西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to
> execute main query]]; nested: IllegalStateException[field "region" was
> indexed without position data; cannot run PhraseQuery (term=北京)];
> }{[3XK6HZ_WSbG4E7Tot3mMMw][matrix][1]:
> QueryPhaseExecutionException[[matrix][1]: query[filtered((region:"北京
> 西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to
> execute main query]]; nested: IllegalStateException[field "region" was
> indexed without position data; cannot run PhraseQuery (term=北京)]; }
> at
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:276)
> at
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onFailure(TransportSearchTypeAction.java:224)
> at
> org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:205)
> at
> org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
> at
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
> at
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:296)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
>
> all my search request has assign type in java like :
>
> SearchRequestBuilder searchRequest =
> esClient.prepareSearch("matrix").setTypes("group")
>
> .setSearchType(SearchType.QUERY_THEN_FETCH).setFrom(form).setSize(count).setExplain(true);
>
> why ??
>

AWS t2.medium vs. Linode 4GB

2014-07-23 Thread Aivis Silins
Hello,

Today I did performance tests (of a sort) between Linode 4GB and AWS 
t2.medium - just to see which one is faster in this situation.
The results are quite similar between the two - which is a bit 
interesting (or probably not - let me know what you think), because AWS 
t2.medium has 2 vCPUs, but Linode 4GB has 4 vCPUs and an SSD.

Any additional information you can find in spreadsheet.

Test methodology: each request/query was executed synchronously 100 times 
(so the first request isn't cached; all the others, I assume, are).
Data was indexed in AWS machine and after that synced (with rsync) to 
Linode (so data set is completely same and both instances Elasticsearch 
config is completely same).
- Java version same (Open-jdk 7)
- Linux dist. same (Ubuntu 14.04 64bit)
- 1 index/1 shard
- Custom tokenization process (which generates a lot of tokens when 
indexing happens - that's why there is 8.4GB in total from 3.2GB of RAW 
data)

As measurement I take "took" property from Elasticsearch response.

This wasn't a stress test or anything like that - each query was executed 
100 times in order, just to see if there is a difference.
The cluster (single node) wasn't under any other load - it served just 
these queries...
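A small sketch of how such per-query numbers can be reduced to a summary (pure bookkeeping over the collected "took" values; no Elasticsearch calls here):

```python
def summarize_took(took_ms):
    """Summarize 'took' values (ms) from N identical queries run in order.
    The first run is reported separately since only it misses the caches."""
    warm = took_ms[1:] or took_ms
    return {
        "cold_ms": took_ms[0],
        "warm_avg_ms": sum(warm) / len(warm),
        "warm_min_ms": min(warm),
        "warm_max_ms": max(warm),
    }

print(summarize_took([120, 15, 14, 16, 15]))
# {'cold_ms': 120, 'warm_avg_ms': 15.0, 'warm_min_ms': 14, 'warm_max_ms': 16}
```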

Spreadsheet 
- 
https://docs.google.com/spreadsheets/d/1Zoxtww3Vo1EFTzwVp08eHK7i8lHriszPNH31EiUZcAc/edit?usp=sharing

So, the question is: why are the AWS results almost the same as Linode's? I 
would expect Linode to be faster than AWS t2.medium.

Thank you!



Re: IMAPRiver plugin attachment index issue

2014-07-23 Thread Gabriel Kapitany
Changing the query from *_all* to *attachments* doesn't change the result, 
and the second query returns:

{
  "imapriverdata" : {
"mappings" : {
  "imapriverstate" : {
"properties" : {
  "errormsg" : {
"type" : "string"
  },
  "exists" : {
"type" : "boolean"
  },
  "folderUrl" : {
"type" : "string"
  },
  "lastCount" : {
"type" : "long"
  },
  "lastIndexed" : {
"type" : "long"
  },
  "lastSchedule" : {
"type" : "long"
  },
  "lastTook" : {
"type" : "long"
  },
  "lastUid" : {
"type" : "long"
  },
  "messageid" : {
"type" : "string"
  },
  "uidValidity" : {
"type" : "long"
  }
}
  },
  "mail" : {
"properties" : {
  "attachmentCount" : {
"type" : "long"
  },
  "attachments" : {
"properties" : {
  "content" : {
"type" : "string"
  },
  "contentType" : {
"type" : "string"
  },
  "filename" : {
"type" : "string"
  },
  "name" : {
"type" : "string"
  },
  "size" : {
"type" : "long"
  }
}
  },
  "contentType" : {
"type" : "string"
  },
  "flaghashcode" : {
"type" : "integer"
  },
  "flags" : {
"type" : "string"
  },
  "folderFullName" : {
"type" : "string",
"index" : "not_analyzed"
  },
  "folderUri" : {
"type" : "string"
  },
  "from" : {
"properties" : {
  "email" : {
"type" : "string"
  },
  "personal" : {
"type" : "string"
  }
}
  },
  "headers" : {
"properties" : {
  "name" : {
"type" : "string"
  },
  "value" : {
"type" : "string"
  }
}
  },
  "mailboxType" : {
"type" : "string"
  },
  "receivedDate" : {
"type" : "date",
"format" : "basic_date_time"
  },
  "sentDate" : {
"type" : "date",
"format" : "basic_date_time"
  },
  "size" : {
"type" : "long"
  },
  "subject" : {
"type" : "string"
  },
  "textContent" : {
"type" : "string"
  },
  "to" : {
"properties" : {
  "email" : {
"type" : "string"
  },
  "personal" : {
"type" : "string"
  }
}
  },
  "uid" : {
"type" : "long"
  }
}
  }
}
  }
}


On Wednesday, July 23, 2014 12:31:31 PM UTC-4, Gabriel Kapitany wrote:
>
> Hi all,
>
> I have installed elasticsearch 1.2.1, and IMAPRiver 
> plugin elasticsearch-river-imap-0.0.7-b20 with 
> elasticsearch-mapper-attachments-2.2.0-SNAPSHOT.
>
> [2014-07-23 11:56:44,304][INFO ][node ] [Shiver Man] 
> version[1.2.1], pid[28748], build[6c95b75/2014-06-03T15:02:52Z]
> [2014-07-23 11:56:44,305][INFO ][node ] [Shiver Man] 
> initializing ...
> [2014-07-23 11:56:44,329][INFO ][plugins  ] [Shiver Man] 
> loaded [mapper-attachments, river-imap-0.0.7-b20-${build], sites []
> [2014-07-23 11:56:47,850][INFO ][node ] [Shiver Man] 
> initialized
> [2014-07-23 11:56:47,850][INFO ][node ] [Shiver Man] 
> starting ...
> [2014-07-23 11:56:48,052][INFO ][transport] [Shiver Man] 
> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
> 10.125.71.146:9300]}
> [2014-07-23 11:56:51,106][INFO ][cluster.service  ] [Shiver Man] 
> new_master [Shiver 
> Man][6vc5FbGdSlSxHjI6s7nYvw][cto-sdx02.idm.symcto.com][inet[/10.125.71.146:9300]],
>  
> reason: zen-disco-join (elected_as_master)
> [2014-07-23 11:56:51,148][INFO ][discovery] [Shiver Man] 
> elasticsearch/6vc5FbGdSlSxHjI6s7nYvw
> [2014-07-23 11:56:51,178][INFO ][http ] [Shiver Man] 
> bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
> 10.125.71.146:9200]}
> [2014-07-23 11:56:52,278][INFO ][gateway  ] [Shiver Man] 
> recovered [2] indices into cluster_state
> [2014-07-23 11:56:52,279][INFO ][node ] [Shiver Man] 
> started
> [2014-07-23 11:56:54,760][INFO 
> ][de.saly.elasticsearch.river.imap.IMAPRiver] IMAPRiver created, river 
> name: river-imap
> [2014-07-23 1

Re: Sudden high "OS Load", then ES VM disappears

2014-07-23 Thread Nikolas Everett
Heap size isn't total memory size.  It's the size of the pool Java allocates
objects from.  There are tons of other memory costs, but the rule of thumb is
to set the heap to no more than 30GB and around half of physical memory.  I
imagine docker is complicating things.

I'm not sure what docker does with memory-mapped files, for instance.
Elasticsearch uses tons of them, so it uses tons of virtual memory.  But
that shouldn't be a problem because it's virtual.  Though I don't know what
docker thinks of that.


On Wed, Jul 23, 2014 at 12:56 PM, Daniel Schonfeld 
wrote:

> Here's TOP and df -h... that's the best I can get for now from inside
> that container.
>
> https://gist.github.com/danielschonfeld/d75c43cce34a16a57926
>
>
> On Wednesday, July 23, 2014 12:49:32 PM UTC-4, Daniel Schonfeld wrote:
>>
>> I wanted to add another five cents worth to what Michael already
>> described.
>>
>> Before, when this happened, we used to run the docker container without a
>> memory limit and giving ES a HEAP_SIZE of 10GB (10240M to be exact).  When
>> it happened the first time, doing `free -m` revealed that the system was
>> left with barely a few hundreds of megabytes left.  Second time around we
>> figured we'd tweak it a bit.  Give the docker container 10G memory limit
>> and limit the ES_HEAP_SIZE to 7.5GB (7168M).  This time when it crashed and
>> is still hung as we speak (been trying to get a thread dump, but that's not
>> happening... jstack can't even connect to that process, it's so deeply hung)
>> doing 'top' revealed that the memory usage is at 9.9GB for that process.
>>  Basically the system seems bogged out of memory.
>>
>> Question is... how did it exceed the heap size given?  And more to the
>> point, just how much memory should we be allocating? If this is just a
>> problem of shortage of memory, spawning up a new machine with a bunch more
>> RAM is no problem.  But we just wish we weren't shooting in the dark.
>>  We've already tried increasing the SSD size to get more IOPS (GCE ties
>> size to IOPS bandwidth).
>>
>> Last but not least, is there something we should be doing to tweak the
>> lucene segment size?
>>
>> Thanks for all your thoughts!
>>
>> On Wednesday, July 23, 2014 12:15:51 PM UTC-4, mic...@modernmast.com
>> wrote:
>>>
>>> No, the VM does not response to curl requests. Closest thing I found to
>>> that read bytes in the API was the _cluster/stats endpoint -->
>>> https://gist.github.com/schonfeld/d45401e44f5961c38502
>>>
>>> Were you referring to a different endpoint?
>>>
>>> What're your thoughts re "angry hardware"? Insufficient resources? Are
>>> there any known issues with CoreOS + ES?
>>>
>>> On Wednesday, July 23, 2014 11:12:10 AM UTC-4, Nikolas Everett wrote:




 On Wed, Jul 23, 2014 at 10:19 AM,  wrote:

> Looking at the JVM GC graphs, I do see some increases there, but not
> sure those are enough to cause this storm?
>
>
> 
>

 That looks like it wasn't the problem.

>
> 
> The disk graphs in Marvel don't show anything out of the ordinary. I'm
> not sure how to check on those write_bytes and read_bytes... Where does ES
> report those? I'm using Google Compute Engine, and according to their
> minimal graphs, while there was a small spike in the disk I/Os, it wasn't
> anything insane.
>

 Elasticsearch reports them in the same API as the number of reads.  I
 imagine they are right next to each other in marvel but I'm not sure.

>
> Right after the spike happened, the indexing rate spiked to 6.6k /
> second. Notice that the first tag notes the VM that left the cluster, the
> second tag shows the cluster went back to "green". Considering this
> happened after the node "left", does this give us any clues as for the
> reason?
>
>
> 
>
> A few observations:
> * The es process is still running on the "dead machine". I can see it
> when I ssh into the VM (thru ps aux, and docker)
> * The VM doesn't show anything in the error logs, etc
> * Running a "ps aux" on that VM actually freezes (after showing the es
> process)
>

 That seems pretty nasty.  Does the node respond to stuff like `curl
 localhost:9200`?  Sounds like the hardware/docker is just angry.



 Nik



Re: Help with Synonyms

2014-07-23 Thread Daniel Yim
Ivan, thank you for feeding my curiosity! The first one really gave me an 
"a-ha!" moment when I saw the images of the synonym matching as directed 
graphs. It gave me some insight into why my multi-token synonyms were being 
expanded a certain way.

On Tuesday, July 22, 2014 4:37:45 PM UTC-5, Ivan Brusic wrote:
>
> I appreciate the fact that you want to know why you shouldn't use synonyms 
> at query time. I couldn't find the following articles during my last 
> response (I read them a while back and I have way too many bookmarks), 
> but I finally found them:
>
>
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
>
> -- 
> Ivan
>
>
> On Tue, Jul 22, 2014 at 11:03 AM, Ivan Brusic  > wrote:
>
>> A couple of reasons. The biggest issue is multi word synonyms since the 
>> query parser will tokenize the query before analysis is applied. Also, 
>> scoring could be affected and the results can be screwy. Here is a better 
>> write up:
>>
>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>
>> -- 
>> Ivan
>>
>>
>> On Tue, Jul 22, 2014 at 10:47 AM, Daniel Yim > > wrote:
>>
>>> Thank you! That solved the initial issue.
>>>
>>> Could you expand on why I would need two analyzers? I did what you 
>>> asked, but I am unsure of the reason behind it and would like to learn.
>>>
>>> Here are my updated settings:
>>>
>>> curl -XPUT "http://localhost:9200/personsearch"; -d'
>>> {
>>>   "settings": {
>>> "index": {
>>>   "analysis": {
>>> "analyzer": {
>>>   "XYZSynAnalyzer": {
>>> "tokenizer": "whitespace",
>>> "filter": [
>>>   "lowercase",
>>>   "XYZSynFilter"
>>> ]
>>>   },
>>>   "MyAnalyzer": {
>>> "tokenizer": "standard",
>>> "filter": [
>>>   "standard",
>>>   "lowercase",
>>>   "stop"
>>> ]
>>>   }
>>> },
>>> "filter": {
>>>   "XYZSynFilter": {
>>> "type": "synonym",
>>> "synonyms": [
>>>   "aids, retrovirology"
>>> ]
>>>   }
>>> }
>>>   }
>>> }
>>>   },
>>>   "mappings": {
>>> "xyzemployee": {
>>>   "_all": {
>>> "analyzer": "XYZSynAnalyzer"
>>>   },
>>>   "properties": {
>>> "firstName": {
>>>   "type": "string"
>>> },
>>> "lastName": {
>>>   "type": "string"
>>> },
>>> "middleName": {
>>>   "type": "string",
>>>   "include_in_all": false,
>>>   "index": "not_analyzed"
>>> },
>>> "specialty": {
>>>   "type": "string",
>>>   "index_analyzer": "XYZSynAnalyzer",
>>>   "search_analyzer": "MyAnalyzer"
>>> }
>>>   }
>>> }
>>>   }
>>> }'
>>>
>>> On Tuesday, July 22, 2014 11:56:40 AM UTC-5, Ivan Brusic wrote:
>>>
 Your issue is casing. You are only applying the synonym filter, which 
 by default does not lowercase terms. You can either set ignore_case to 
 true 
 for the synonym filter or apply a lower case filter before the synonym. I 
 prefer to use the latter approach since I prefer to have all my analyzed 
 tokens lowercased.

 Also, you should only apply the synonym filter at index time. You would 
 need to create two similar analyzers, one with the synonym filter and one 
 without. You can set the different ones via index_analyzer and 
 search_analyzer.

 http://www.elasticsearch.org/guide/en/elasticsearch/
 reference/current/mapping-core-types.html#string

 Cheers,

 Ivan


 On Tue, Jul 22, 2014 at 9:33 AM, Daniel Yim  wrote:

> Hi everyone,
>
> I am relatively new to elasticsearch and am having issues with getting 
> my synonym filter to work. Can you take a look at the settings and tell 
> me 
> where I am going wrong?
>
> I am expecting the search for "aids" to match the search results if I 
> were to search for "retrovirology", but this is not happening.
>
> Thanks!
>
>  curl -XDELETE "http://localhost:9200/personsearch";
>
> curl -XPUT "http://localhost:9200/personsearch"; -d'
> {
>   "settings": {
> "index": {
>   "analysis": {
> "analyzer": {
>   "XYZSynAnalyzer": {
> "tokenizer": "standard",
> "filter": [
>   "XYZSynFilter"
> ]
>   }
> },
> "filter": {
>   "XYZSynFilter": {
> "type": "synonym",
> "synonyms": [
>   "aids, retrovirology"
> ]
>   }
> }
>   }
> }
>   },
>   "mappings": {
> "xyzemployee": {
>   "_all": {
>

Re: IMAPRiver plugin attachment index issue

2014-07-23 Thread David Pilato
Could you try:

curl -XGET 'http://localhost:9200/imapriverdata/_search' -d '{
  "query" : {
        "match" : { "attachments" : "alpha" }
    }
}
'

Also what gives:

GET /imapriverdata/_mapping?pretty

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 23 July 2014 at 19:12:06, Gabriel Kapitany (gkapit...@gmail.com) wrote:

query:

curl -XGET 'http://localhost:9200/imapriverdata/_search' -d '{

  "query" : {
        "match" : { "_all" : "alpha" }
    }
}
'

response:

{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

Thanks,
Gabriel

On Wednesday, July 23, 2014 12:31:31 PM UTC-4, Gabriel Kapitany wrote:
Hi all,

I have installed elasticsearch 1.2.1, and IMAPRiver plugin 
elasticsearch-river-imap-0.0.7-b20 with 
elasticsearch-mapper-attachments-2.2.0-SNAPSHOT.

[2014-07-23 11:56:44,304][INFO ][node                     ] [Shiver Man] 
version[1.2.1], pid[28748], build[6c95b75/2014-06-03T15:02:52Z]
[2014-07-23 11:56:44,305][INFO ][node                     ] [Shiver Man] 
initializing ...
[2014-07-23 11:56:44,329][INFO ][plugins                  ] [Shiver Man] loaded 
[mapper-attachments, river-imap-0.0.7-b20-${build], sites []
[2014-07-23 11:56:47,850][INFO ][node                     ] [Shiver Man] 
initialized
[2014-07-23 11:56:47,850][INFO ][node                     ] [Shiver Man] 
starting ...
[2014-07-23 11:56:48,052][INFO ][transport                ] [Shiver Man] 
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address 
{inet[/10.125.71.146:9300]}
[2014-07-23 11:56:51,106][INFO ][cluster.service          ] [Shiver Man] 
new_master [Shiver 
Man][6vc5FbGdSlSxHjI6s7nYvw][cto-sdx02.idm.symcto.com][inet[/10.125.71.146:9300]],
 reason: zen-disco-join (elected_as_master)
[2014-07-23 11:56:51,148][INFO ][discovery                ] [Shiver Man] 
elasticsearch/6vc5FbGdSlSxHjI6s7nYvw
[2014-07-23 11:56:51,178][INFO ][http                     ] [Shiver Man] 
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address 
{inet[/10.125.71.146:9200]}
[2014-07-23 11:56:52,278][INFO ][gateway                  ] [Shiver Man] 
recovered [2] indices into cluster_state
[2014-07-23 11:56:52,279][INFO ][node                     ] [Shiver Man] started
[2014-07-23 11:56:54,760][INFO ][de.saly.elasticsearch.river.imap.IMAPRiver] 
IMAPRiver created, river name: river-imap
[2014-07-23 11:56:54,761][INFO ][de.saly.elasticsearch.river.imap.IMAPRiver] 
Start IMAPRiver ...
[2014-07-23 11:56:55,025][INFO ][org.quartz.impl.StdSchedulerFactory] Using 
default implementation for ThreadExecutor
[2014-07-23 11:56:55,029][INFO ][org.quartz.simpl.SimpleThreadPool] Job 
execution threads will use class loader of thread: elasticsearch[Shiver 
Man][generic][T#4]
[2014-07-23 11:56:55,051][INFO ][org.quartz.core.SchedulerSignalerImpl] 
Initialized Scheduler Signaller of type: class 
org.quartz.core.SchedulerSignalerImpl
[2014-07-23 11:56:55,052][INFO ][org.quartz.core.QuartzScheduler] Quartz 
Scheduler v.2.2.1 created.
[2014-07-23 11:56:55,054][INFO ][org.quartz.simpl.RAMJobStore] RAMJobStore 
initialized.
[2014-07-23 11:56:55,055][INFO ][org.quartz.core.QuartzScheduler] Scheduler 
meta-data: Quartz Scheduler (v2.2.1) 'DefaultQuartzScheduler' with instanceId 
'NON_CLUSTERED'


Index created:

curl -XPUT 'http://localhost:9200/_river/river-imap/_meta' -d '{

   "type":"imap",
   "mail.store.protocol":"imap",
   "mail.imap.host":"cto-sdx01",
   "mail.imap.port":993,
   "mail.imap.ssl.enable":true,
   "mail.imap.connectionpoolsize":"3",
   "mail.debug":"false",
   "mail.imap.timeout":1,
   "user":"gkapitan",
   "password":"xxx",
   "schedule":null,
   "interval":"60s",
   "threads":5,
   "folderpattern":null,
   "bulk_size":100,
   "max_bulk_requests":"30",
   "bulk_flush_interval":"5s",
   "mail_index_name":"imapriverdata",
   "mail_type_name":"mail",
   "with_striptags_from_textcontent":true,
   "with_attachments":true,
   "with_text_content":true,
   "with_flag_sync":true,
   "index_settings" : null,
   "type_mapping" : null

}'

The documents are loaded but the attachments are not indexed for any of: pdf, 
doc,docx, csv,xls...

Any idea of what I might have missed?

Below is a sample response:

{"took":8,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.35493678,"hits":[{"_index":"imapriverdata","_type":"mail","_id":"16::imap://gkapitan@cto-sdx01/mail/gkapitan/Sent","_score":0.35493678,"_source":{
  "attachmentCount" : 1,
  "attachments" : [ {
    "content" : "YWxwaGEsYmV0YSxnYW1tYQ0KVGV0YSxzaWdtYSxwaQ0K",
    "contentType" : "text/csv; charset=us-ascii",
    "size" : 33,
    "filename" : "Book1.csv",
    "name" : "Book1.csv"
  } ],
  "bcc" : null,
  "cc" : null,
  "contentType" : "multipart/mixed; boundary=wac7ysb48OaltWcw",
  "flaghashcode" : 48,
  "flags" : [ "Recent", "Seen" ],
  "folderFullName" : "mail/gkapitan/Sent",
  "folderUri" : "imap://gkapitan@cto-sdx01/ma

Re: IMAPRiver plugin attachment index issue

2014-07-23 Thread Gabriel Kapitany
query:

curl -XGET 'http://localhost:9200/imapriverdata/_search' -d '{

  "query" : {
"match" : { "_all" : "alpha" }
}
}
'

response:

{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

Thanks,
Gabriel

On Wednesday, July 23, 2014 12:31:31 PM UTC-4, Gabriel Kapitany wrote:
>
> Hi all,
>
> I have installed elasticsearch 1.2.1, and IMAPRiver 
> plugin elasticsearch-river-imap-0.0.7-b20 with 
> elasticsearch-mapper-attachments-2.2.0-SNAPSHOT.
>
> [2014-07-23 11:56:44,304][INFO ][node ] [Shiver Man] 
> version[1.2.1], pid[28748], build[6c95b75/2014-06-03T15:02:52Z]
> [2014-07-23 11:56:44,305][INFO ][node ] [Shiver Man] 
> initializing ...
> [2014-07-23 11:56:44,329][INFO ][plugins  ] [Shiver Man] 
> loaded [mapper-attachments, river-imap-0.0.7-b20-${build], sites []
> [2014-07-23 11:56:47,850][INFO ][node ] [Shiver Man] 
> initialized
> [2014-07-23 11:56:47,850][INFO ][node ] [Shiver Man] 
> starting ...
> [2014-07-23 11:56:48,052][INFO ][transport] [Shiver Man] 
> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
> 10.125.71.146:9300]}
> [2014-07-23 11:56:51,106][INFO ][cluster.service  ] [Shiver Man] 
> new_master [Shiver 
> Man][6vc5FbGdSlSxHjI6s7nYvw][cto-sdx02.idm.symcto.com][inet[/10.125.71.146:9300]],
>  
> reason: zen-disco-join (elected_as_master)
> [2014-07-23 11:56:51,148][INFO ][discovery] [Shiver Man] 
> elasticsearch/6vc5FbGdSlSxHjI6s7nYvw
> [2014-07-23 11:56:51,178][INFO ][http ] [Shiver Man] 
> bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
> 10.125.71.146:9200]}
> [2014-07-23 11:56:52,278][INFO ][gateway  ] [Shiver Man] 
> recovered [2] indices into cluster_state
> [2014-07-23 11:56:52,279][INFO ][node ] [Shiver Man] 
> started
> [2014-07-23 11:56:54,760][INFO 
> ][de.saly.elasticsearch.river.imap.IMAPRiver] IMAPRiver created, river 
> name: river-imap
> [2014-07-23 11:56:54,761][INFO 
> ][de.saly.elasticsearch.river.imap.IMAPRiver] Start IMAPRiver ...
> [2014-07-23 11:56:55,025][INFO ][org.quartz.impl.StdSchedulerFactory] 
> Using default implementation for ThreadExecutor
> [2014-07-23 11:56:55,029][INFO ][org.quartz.simpl.SimpleThreadPool] Job 
> execution threads will use class loader of thread: elasticsearch[Shiver 
> Man][generic][T#4]
> [2014-07-23 11:56:55,051][INFO ][org.quartz.core.SchedulerSignalerImpl] 
> Initialized Scheduler Signaller of type: class 
> org.quartz.core.SchedulerSignalerImpl
> [2014-07-23 11:56:55,052][INFO ][org.quartz.core.QuartzScheduler] Quartz 
> Scheduler v.2.2.1 created.
> [2014-07-23 11:56:55,054][INFO ][org.quartz.simpl.RAMJobStore] RAMJobStore 
> initialized.
> [2014-07-23 11:56:55,055][INFO ][org.quartz.core.QuartzScheduler] 
> Scheduler meta-data: Quartz Scheduler (v2.2.1) 'DefaultQuartzScheduler' 
> with instanceId 'NON_CLUSTERED'
>
>
> Index created:
>
> curl -XPUT 'http://localhost:9200/_river/river-imap/_meta' -d '{
>
>"type":"imap",
>"mail.store.protocol":"imap",
>"mail.imap.host":"cto-sdx01",
>"mail.imap.port":993,
>"mail.imap.ssl.enable":true,
>"mail.imap.connectionpoolsize":"3",
>"mail.debug":"false",
>"mail.imap.timeout":1,
>"user":"gkapitan",
>"password":"xxx",
>"schedule":null,
>"interval":"60s",
>"threads":5,
>"folderpattern":null,
>"bulk_size":100,
>"max_bulk_requests":"30",
>"bulk_flush_interval":"5s",
>"mail_index_name":"imapriverdata",
>"mail_type_name":"mail",
>"with_striptags_from_textcontent":true,
>"with_attachments":true,
>"with_text_content":true,
>"with_flag_sync":true,
>"index_settings" : null,
>"type_mapping" : null
>
> }'
>
> The documents are loaded but the attachments are not indexed for any of: 
> pdf, doc,docx, csv,xls...
>
> Any idea of what I might have missed?
>
> Below is a sample response:
>
>
> {"took":8,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.35493678,"hits":[{"_index":"imapriverdata","_type":"mail","_id":"16::imap://gkapitan@cto-sdx01/mail/gkapitan/Sent","_score":0.35493678,"_source":{
>   "attachmentCount" : 1,
>   "attachments" : [ {
> "content" : "YWxwaGEsYmV0YSxnYW1tYQ0KVGV0YSxzaWdtYSxwaQ0K",
> "contentType" : "text/csv; charset=us-ascii",
> "size" : 33,
> "filename" : "Book1.csv",
> "name" : "Book1.csv"
>   } ],
>   "bcc" : null,
>   "cc" : null,
>   "contentType" : "multipart/mixed; boundary=wac7ysb48OaltWcw",
>   "flaghashcode" : 48,
>   "flags" : [ "Recent", "Seen" ],
>   "folderFullName" : "mail/gkapitan/Sent",
>   "folderUri" : "imap://gkapitan@cto-sdx01/mail/gkapitan/Sent",
>   "from" : {
> "email" : "gkapi...@cto-sdx01.idm.symcto.com",
> "personal" : "Gabriel Kapitany"
>   },
>   "headers" : [ {
>

Re: IMAPRiver plugin attachment index issue

2014-07-23 Thread David Pilato
What kind of query are you running?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 23 July 2014 at 19:02:31, Gabriel Kapitany (gkapit...@gmail.com) wrote:

Hi David,

I have checked that the attachment's content type ("contentType" : 
"text/csv; charset=us-ascii") matches the actual attachment, and it does.

A search by a keyword in the attachment brings up nothing. However, a base64 
decode of the attachment's content field shows that the attachment is 
correctly stored.

Thanks,
Gabriel

On Wednesday, July 23, 2014 12:31:31 PM UTC-4, Gabriel Kapitany wrote:
Hi all,

I have installed elasticsearch 1.2.1, and IMAPRiver plugin 
elasticsearch-river-imap-0.0.7-b20 with 
elasticsearch-mapper-attachments-2.2.0-SNAPSHOT.

[2014-07-23 11:56:44,304][INFO ][node                     ] [Shiver Man] 
version[1.2.1], pid[28748], build[6c95b75/2014-06-03T15:02:52Z]
[2014-07-23 11:56:44,305][INFO ][node                     ] [Shiver Man] 
initializing ...
[2014-07-23 11:56:44,329][INFO ][plugins                  ] [Shiver Man] loaded 
[mapper-attachments, river-imap-0.0.7-b20-${build], sites []
[2014-07-23 11:56:47,850][INFO ][node                     ] [Shiver Man] 
initialized
[2014-07-23 11:56:47,850][INFO ][node                     ] [Shiver Man] 
starting ...
[2014-07-23 11:56:48,052][INFO ][transport                ] [Shiver Man] 
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address 
{inet[/10.125.71.146:9300]}
[2014-07-23 11:56:51,106][INFO ][cluster.service          ] [Shiver Man] 
new_master [Shiver 
Man][6vc5FbGdSlSxHjI6s7nYvw][cto-sdx02.idm.symcto.com][inet[/10.125.71.146:9300]],
 reason: zen-disco-join (elected_as_master)
[2014-07-23 11:56:51,148][INFO ][discovery                ] [Shiver Man] 
elasticsearch/6vc5FbGdSlSxHjI6s7nYvw
[2014-07-23 11:56:51,178][INFO ][http                     ] [Shiver Man] 
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address 
{inet[/10.125.71.146:9200]}
[2014-07-23 11:56:52,278][INFO ][gateway                  ] [Shiver Man] 
recovered [2] indices into cluster_state
[2014-07-23 11:56:52,279][INFO ][node                     ] [Shiver Man] started
[2014-07-23 11:56:54,760][INFO ][de.saly.elasticsearch.river.imap.IMAPRiver] 
IMAPRiver created, river name: river-imap
[2014-07-23 11:56:54,761][INFO ][de.saly.elasticsearch.river.imap.IMAPRiver] 
Start IMAPRiver ...
[2014-07-23 11:56:55,025][INFO ][org.quartz.impl.StdSchedulerFactory] Using 
default implementation for ThreadExecutor
[2014-07-23 11:56:55,029][INFO ][org.quartz.simpl.SimpleThreadPool] Job 
execution threads will use class loader of thread: elasticsearch[Shiver 
Man][generic][T#4]
[2014-07-23 11:56:55,051][INFO ][org.quartz.core.SchedulerSignalerImpl] 
Initialized Scheduler Signaller of type: class 
org.quartz.core.SchedulerSignalerImpl
[2014-07-23 11:56:55,052][INFO ][org.quartz.core.QuartzScheduler] Quartz 
Scheduler v.2.2.1 created.
[2014-07-23 11:56:55,054][INFO ][org.quartz.simpl.RAMJobStore] RAMJobStore 
initialized.
[2014-07-23 11:56:55,055][INFO ][org.quartz.core.QuartzScheduler] Scheduler 
meta-data: Quartz Scheduler (v2.2.1) 'DefaultQuartzScheduler' with instanceId 
'NON_CLUSTERED'


Index created:

curl -XPUT 'http://localhost:9200/_river/river-imap/_meta' -d '{

   "type":"imap",
   "mail.store.protocol":"imap",
   "mail.imap.host":"cto-sdx01",
   "mail.imap.port":993,
   "mail.imap.ssl.enable":true,
   "mail.imap.connectionpoolsize":"3",
   "mail.debug":"false",
   "mail.imap.timeout":1,
   "user":"gkapitan",
   "password":"xxx",
   "schedule":null,
   "interval":"60s",
   "threads":5,
   "folderpattern":null,
   "bulk_size":100,
   "max_bulk_requests":"30",
   "bulk_flush_interval":"5s",
   "mail_index_name":"imapriverdata",
   "mail_type_name":"mail",
   "with_striptags_from_textcontent":true,
   "with_attachments":true,
   "with_text_content":true,
   "with_flag_sync":true,
   "index_settings" : null,
   "type_mapping" : null

}'

The documents are loaded but the attachments are not indexed for any of: pdf, 
doc,docx, csv,xls...

Any idea of what I might have missed?

Below is a sample response:

{"took":8,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.35493678,"hits":[{"_index":"imapriverdata","_type":"mail","_id":"16::imap://gkapitan@cto-sdx01/mail/gkapitan/Sent","_score":0.35493678,"_source":{
  "attachmentCount" : 1,
  "attachments" : [ {
    "content" : "YWxwaGEsYmV0YSxnYW1tYQ0KVGV0YSxzaWdtYSxwaQ0K",
    "contentType" : "text/csv; charset=us-ascii",
    "size" : 33,
    "filename" : "Book1.csv",
    "name" : "Book1.csv"
  } ],
  "bcc" : null,
  "cc" : null,
  "contentType" : "multipart/mixed; boundary=wac7ysb48OaltWcw",
  "flaghashcode" : 48,
  "flags" : [ "Recent", "Seen" ],
  "folderFullName" : "mail/gkapitan/Sent",
  "folderUri" : "imap://gkapitan@cto-sdx01/mail/gkapitan/Sent",
  "from" : {
    "email" : "gkapi...@cto-sdx01.idm.symcto.com",
    "personal" : "Gabriel Kapitany"

Re: IMAPRiver plugin attachment index issue

2014-07-23 Thread Gabriel Kapitany
Hi David,

I have checked that the attachment's content type ("contentType" : 
"text/csv; charset=us-ascii") matches the actual attachment, and it does.

A search by a keyword in the attachment brings up nothing. However, a base64 
decode of the attachment's content field shows that the attachment is 
correctly stored.
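For reference, the decode can be reproduced directly from the "content" value in the sample response:

```python
import base64

# The "content" field is stored base64-encoded; decoding it shows the
# raw CSV bytes are present even though they are not searchable.
content = "YWxwaGEsYmV0YSxnYW1tYQ0KVGV0YSxzaWdtYSxwaQ0K"
print(base64.b64decode(content).decode("ascii"))
# alpha,beta,gamma
# Teta,sigma,pi
```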

Thanks,
Gabriel

On Wednesday, July 23, 2014 12:31:31 PM UTC-4, Gabriel Kapitany wrote:
>
> Hi all,
>
> I have installed elasticsearch 1.2.1, and IMAPRiver 
> plugin elasticsearch-river-imap-0.0.7-b20 with 
> elasticsearch-mapper-attachments-2.2.0-SNAPSHOT.
>
> [2014-07-23 11:56:44,304][INFO ][node ] [Shiver Man] 
> version[1.2.1], pid[28748], build[6c95b75/2014-06-03T15:02:52Z]
> [2014-07-23 11:56:44,305][INFO ][node ] [Shiver Man] 
> initializing ...
> [2014-07-23 11:56:44,329][INFO ][plugins  ] [Shiver Man] 
> loaded [mapper-attachments, river-imap-0.0.7-b20-${build], sites []
> [2014-07-23 11:56:47,850][INFO ][node ] [Shiver Man] 
> initialized
> [2014-07-23 11:56:47,850][INFO ][node ] [Shiver Man] 
> starting ...
> [2014-07-23 11:56:48,052][INFO ][transport] [Shiver Man] 
> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
> 10.125.71.146:9300]}
> [2014-07-23 11:56:51,106][INFO ][cluster.service  ] [Shiver Man] 
> new_master [Shiver 
> Man][6vc5FbGdSlSxHjI6s7nYvw][cto-sdx02.idm.symcto.com][inet[/10.125.71.146:9300]],
>  
> reason: zen-disco-join (elected_as_master)
> [2014-07-23 11:56:51,148][INFO ][discovery] [Shiver Man] 
> elasticsearch/6vc5FbGdSlSxHjI6s7nYvw
> [2014-07-23 11:56:51,178][INFO ][http ] [Shiver Man] 
> bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
> 10.125.71.146:9200]}
> [2014-07-23 11:56:52,278][INFO ][gateway  ] [Shiver Man] 
> recovered [2] indices into cluster_state
> [2014-07-23 11:56:52,279][INFO ][node ] [Shiver Man] 
> started
> [2014-07-23 11:56:54,760][INFO 
> ][de.saly.elasticsearch.river.imap.IMAPRiver] IMAPRiver created, river 
> name: river-imap
> [2014-07-23 11:56:54,761][INFO 
> ][de.saly.elasticsearch.river.imap.IMAPRiver] Start IMAPRiver ...
> [2014-07-23 11:56:55,025][INFO ][org.quartz.impl.StdSchedulerFactory] 
> Using default implementation for ThreadExecutor
> [2014-07-23 11:56:55,029][INFO ][org.quartz.simpl.SimpleThreadPool] Job 
> execution threads will use class loader of thread: elasticsearch[Shiver 
> Man][generic][T#4]
> [2014-07-23 11:56:55,051][INFO ][org.quartz.core.SchedulerSignalerImpl] 
> Initialized Scheduler Signaller of type: class 
> org.quartz.core.SchedulerSignalerImpl
> [2014-07-23 11:56:55,052][INFO ][org.quartz.core.QuartzScheduler] Quartz 
> Scheduler v.2.2.1 created.
> [2014-07-23 11:56:55,054][INFO ][org.quartz.simpl.RAMJobStore] RAMJobStore 
> initialized.
> [2014-07-23 11:56:55,055][INFO ][org.quartz.core.QuartzScheduler] 
> Scheduler meta-data: Quartz Scheduler (v2.2.1) 'DefaultQuartzScheduler' 
> with instanceId 'NON_CLUSTERED'
>
>
> Index created:
>
> curl -XPUT 'http://localhost:9200/_river/river-imap/_meta' -d '{
>
>"type":"imap",
>"mail.store.protocol":"imap",
>"mail.imap.host":"cto-sdx01",
>"mail.imap.port":993,
>"mail.imap.ssl.enable":true,
>"mail.imap.connectionpoolsize":"3",
>"mail.debug":"false",
>"mail.imap.timeout":1,
>"user":"gkapitan",
>"password":"xxx",
>"schedule":null,
>"interval":"60s",
>"threads":5,
>"folderpattern":null,
>"bulk_size":100,
>"max_bulk_requests":"30",
>"bulk_flush_interval":"5s",
>"mail_index_name":"imapriverdata",
>"mail_type_name":"mail",
>"with_striptags_from_textcontent":true,
>"with_attachments":true,
>"with_text_content":true,
>"with_flag_sync":true,
>"index_settings" : null,
>"type_mapping" : null
>
> }'
>
> The documents are loaded but the attachments are not indexed for any of: 
> pdf, doc,docx, csv,xls...
>
> Any idea of what I might have missed?
>
> Below is a sample response:
>
>
> {"took":8,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.35493678,"hits":[{"_index":"imapriverdata","_type":"mail","_id":"16::imap://gkapitan@cto-sdx01/mail/gkapitan/Sent","_score":0.35493678,"_source":{
>   "attachmentCount" : 1,
>   "attachments" : [ {
> "content" : "YWxwaGEsYmV0YSxnYW1tYQ0KVGV0YSxzaWdtYSxwaQ0K",
> "contentType" : "text/csv; charset=us-ascii",
> "size" : 33,
> "filename" : "Book1.csv",
> "name" : "Book1.csv"
>   } ],
>   "bcc" : null,
>   "cc" : null,
>   "contentType" : "multipart/mixed; boundary=wac7ysb48OaltWcw",
>   "flaghashcode" : 48,
>   "flags" : [ "Recent", "Seen" ],
>   "folderFullName" : "mail/gkapitan/Sent",
>   "folderUri" : "imap://gkapitan@cto-sdx01/mail/gkapitan/Sent",
>   "from" : {
> "email" : "gkapi...@cto-sdx01.idm.symcto.com",
> "personal" : "Ga

Re: Sudden high "OS Load", then ES VM disappears

2014-07-23 Thread Daniel Schonfeld
Here's `top` and `df -h` output... that's the best I can get for now from inside
that container.

https://gist.github.com/danielschonfeld/d75c43cce34a16a57926

On Wednesday, July 23, 2014 12:49:32 PM UTC-4, Daniel Schonfeld wrote:
>
> I wanted to add another five cents worth to what Michael already described.
>
> Before, when this happened, we used to run the docker container without a 
> memory limit and giving ES a HEAP_SIZE of 10GB (10240M to be exact).  When 
> it happened the first time, doing `free -m` revealed that the system was 
> left with barely a few hundreds of megabytes left.  Second time around we 
> figured we'd tweak it a bit.  Give the docker container 10G memory limit 
> and limit the ES_HEAP_SIZE to 7.5GB (7168M).  This time when it crashed and 
> is still hung as we speak (been trying to get a thread dump, but that's not 
> happening... jstack can't even connect to that process, it's so deeply hung) 
> doing 'top' revealed that the memory usage is at 9.9GB for that process. 
>  Basically the system seems to be out of memory.
>
> Question is... how did it exceed the heap size given?  And more to the 
> point, just how much memory should we be allocating? If this is just a 
> problem of shortage of memory, spawning up a new machine with a bunch more 
> RAM is no problem.  But we just wish we weren't shooting in the dark. 
>  We've already tried increasing the SSD size to get more IOPS (GCE ties 
> size to IOPS bandwidth).
>
> Last but not least, is there something we should be doing to tweak the 
> lucene segment size?
>
> Thanks for all your thoughts!
>
> On Wednesday, July 23, 2014 12:15:51 PM UTC-4, mic...@modernmast.com 
> wrote:
>>
>> No, the VM does not respond to curl requests. Closest thing I found to 
>> that read bytes in the API was the _cluster/stats endpoint --> 
>> https://gist.github.com/schonfeld/d45401e44f5961c38502
>>
>> Were you referring to a different endpoint?
>>
>> What're your thoughts re "angry hardware"? Insufficient resources? Are 
>> there any known issues with CoreOS + ES?
>>
>> On Wednesday, July 23, 2014 11:12:10 AM UTC-4, Nikolas Everett wrote:
>>>
>>>
>>>
>>>
>>> On Wed, Jul 23, 2014 at 10:19 AM,  wrote:
>>>
 Looking at the JVM GC graphs, I do see some increases there, but not 
 sure those are enough to cause this storm?


 

>>>
>>> That looks like it wasn't the problem. 
>>>
  
 
 The disk graphs in Marvel don't show anything out of the ordinary. I'm 
 not sure how to check on those write_bytes and read_bytes... Where does ES 
 report those? I'm using Google Compute Engine, and according to their 
 minimal graphs, while there was a small spike in the disk I/Os, it wasn't 
 anything insane.

>>>
>>> Elasticsearch reports them in the same API as the number of reads.  I 
>>> imagine they are right next to each other in marvel but I'm not sure. 
>>>

 Right after the spike happened, the indexing rate spiked to 6.6k / 
 second. Notice that the first tag notes the VM that left the cluster, the 
 second tag shows the cluster went back to "green". Considering this 
 happened after the node "left", does this give us any clues as for the 
 reason?


 

 A few observations:
 * The es process is still running on the "dead machine". I can see it 
 when I ssh into the VM (thru ps aux, and docker)
 * The VM doesn't show anything in the error logs, etc
 * Running a "ps aux" on that VM actually freezes (after showing the es 
 process)

>>>
>>> That seems pretty nasty.  Does the node respond to stuff like `curl 
>>> localhost:9200`?  Sounds like the hardware/docker is just angry.
>>>
>>>
>>>
>>> Nik
>>>  
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/11474c65-4443-4851-b109-75f092b49d04%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Sudden high "OS Load", then ES VM disappears

2014-07-23 Thread Daniel Schonfeld
I wanted to add another five cents worth to what Michael already described.

Before, when this happened, we used to run the docker container without a 
memory limit and giving ES a HEAP_SIZE of 10GB (10240M to be exact).  When 
it happened the first time, doing `free -m` revealed that the system was 
left with barely a few hundreds of megabytes left.  Second time around we 
figured we'd tweak it a bit.  Give the docker container 10G memory limit 
and limit the ES_HEAP_SIZE to 7.5GB (7168M).  This time when it crashed and 
is still hung as we speak (been trying to get a thread dump, but that's not 
happening... jstack can't even connect to that process, it's so deeply hung) 
doing 'top' revealed that the memory usage is at 9.9GB for that process. 
 Basically the system seems to be out of memory.

Question is... how did it exceed the heap size given?  And more to the 
point, just how much memory should we be allocating? If this is just a 
problem of shortage of memory, spawning up a new machine with a bunch more 
RAM is no problem.  But we just wish we weren't shooting in the dark. 
 We've already tried increasing the SSD size to get more IOPS (GCE ties 
size to IOPS bandwidth).

Last but not least, is there something we should be doing to tweak the 
lucene segment size?

Thanks for all your thoughts!

On Wednesday, July 23, 2014 12:15:51 PM UTC-4, mic...@modernmast.com wrote:
>
> No, the VM does not respond to curl requests. Closest thing I found to 
> that read bytes in the API was the _cluster/stats endpoint --> 
> https://gist.github.com/schonfeld/d45401e44f5961c38502
>
> Were you referring to a different endpoint?
>
> What're your thoughts re "angry hardware"? Insufficient resources? Are 
> there any known issues with CoreOS + ES?
>
> On Wednesday, July 23, 2014 11:12:10 AM UTC-4, Nikolas Everett wrote:
>>
>>
>>
>>
>> On Wed, Jul 23, 2014 at 10:19 AM,  wrote:
>>
>>> Looking at the JVM GC graphs, I do see some increases there, but not 
>>> sure those are enough to cause this storm?
>>>
>>>
>>> 
>>>
>>
>> That looks like it wasn't the problem. 
>>
>>>  
>>> 
>>> The disk graphs in Marvel don't show anything out of the ordinary. I'm 
>>> not sure how to check on those write_bytes and read_bytes... Where does ES 
>>> report those? I'm using Google Compute Engine, and according to their 
>>> minimal graphs, while there was a small spike in the disk I/Os, it wasn't 
>>> anything insane.
>>>
>>
>> Elasticsearch reports them in the same API as the number of reads.  I 
>> imagine they are right next to each other in marvel but I'm not sure. 
>>
>>>
>>> Right after the spike happened, the indexing rate spiked to 6.6k / 
>>> second. Notice that the first tag notes the VM that left the cluster, the 
>>> second tag shows the cluster went back to "green". Considering this 
>>> happened after the node "left", does this give us any clues as for the 
>>> reason?
>>>
>>>
>>> 
>>>
>>> A few observations:
>>> * The es process is still running on the "dead machine". I can see it 
>>> when I ssh into the VM (thru ps aux, and docker)
>>> * The VM doesn't show anything in the error logs, etc
>>> * Running a "ps aux" on that VM actually freezes (after showing the es 
>>> process)
>>>
>>
>> That seems pretty nasty.  Does the node respond to stuff like `curl 
>> localhost:9200`?  Sounds like the hardware/docker is just angry.
>>
>>
>>
>> Nik
>>  
>>
>



Re: Sorting based on combination of ES score and two fields

2014-07-23 Thread M_20
I wrote some code (based on the ES documentation:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html),
but it doesn't work.
Here are the code and the error message. Any suggestions?

// journalscorenormal --> the field's name

String script = "doc['journalscorenormal'].value";
ScriptSortBuilder sortSc = SortBuilders.scriptSort(script, "float");
SearchResponse response = client
        .prepareSearch(index)
        .setTypes(MyVars.index.getType())
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders.fuzzyLikeThisQuery(field)
                .likeText(searchTerm))
        .setFrom(0)
        .setSize(60)
        .addSort(sortSc)
        .setExplain(true)
        .execute().actionGet();




Failed to parse source
[{"from":0,"size":60,"query":{"flt":{"fields":["sentence"],"like_text":"disease"}},"explain":true,"sort":[{"_script":{"script":"doc['journalscorenormal'].value","type":"float"}}]}]]];
nested: ScriptException[dynamic scripting disabled]; }
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:233)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$1.onFailure(TransportSearchTypeAction.java:179)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:523)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
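
The "dynamic scripting disabled" error above means ES (1.2+) rejects inline scripts by default; re-enabling them requires `script.disable_dynamic: false` in elasticsearch.yml on every node. With scripting enabled, one alternative to a script sort is computing _score + field inside a function_score query. A minimal sketch of the request body, assuming the field and query from the thread (the rest is untested):

```python
import json

# Hedged sketch: combine the relevance score with a numeric field via
# script_score instead of a script sort. Requires dynamic scripting to
# be enabled (script.disable_dynamic: false in elasticsearch.yml).
body = {
    "from": 0,
    "size": 60,
    "query": {
        "function_score": {
            "query": {"flt": {"fields": ["sentence"], "like_text": "disease"}},
            "script_score": {
                "script": "_score + doc['journalscorenormal'].value"
            },
            # use the script result as the final score
            "boost_mode": "replace"
        }
    }
}
print(json.dumps(body, indent=2))
```

Documents then come back ordered by the combined score with no sort clause needed.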




On Wednesday, July 23, 2014 9:35:41 AM UTC-5, M_20 wrote:
>
> It seems this is what I need
>
> "script_score" : {
> "script" : "_score + doc['my_numeric_field_1'].value + 
> doc['my_numeric_field_2'].value"
> }
>
> Am I right??
>
>
>
> On Wednesday, July 23, 2014 9:29:22 AM UTC-5, M_20 wrote:
>>
>> Hi, 
>>
>> How can I sort the result based on summation of several fields?
>> For my application, I want to sort the results based on ( ElasticSearch 
>> score + field_1 + field_2 )
>> In fact:
>> Final Score = ElasticSearch score + field_1 + field_2
>> It seems kinda recursive, but I was wondering ES is able to do so.
>>
>> Thanks
>>
>



Array type limitations?

2014-07-23 Thread michael
Hey guys.

I'm curious to know what the limitations of an array type field are. I'm
using ES to store an array of social-network follower IDs for each of my
users, and this can sometimes get big (10M+ items). Is this "okay" with
arrays? Or should I be using something else like a nested type? My mapping
is as follows:

"follower_ids": {
  "type": "string",
  "index_name": "follower_id",
  "norms": {
"enabled": false
  },
  "index": "no",
  "index_options": "docs"
}

Worth mentioning that I'm also using a "terms" path filter on this array 
field.

Your input/feedback is much appreciated!
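
For context, the "terms path filter" mentioned above is the terms lookup mechanism: rather than shipping 10M follower IDs in every request, the filter fetches them from a stored document at query time. A minimal sketch of such a request body; the index, type, id, and field names are assumptions, not taken from a real mapping:

```python
import json

# Hedged sketch of an ES 1.x terms lookup filter: the list of allowed
# values is read from another document's array field at query time.
search_body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "terms": {
                    "user_id": {
                        "index": "users",       # index holding the lookup doc
                        "type": "user",         # its type
                        "id": "2707",           # _id of the followed user
                        "path": "follower_ids"  # array field to read IDs from
                    }
                }
            }
        }
    }
}
print(json.dumps(search_body))
```

The lookup document itself is fetched once per request, which is why very large arrays make the source-fetch cost worth measuring.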



Re: IMAPRiver plugin attachemnt index issue

2014-07-23 Thread David Pilato
How do you know that it has not been indexed?

You can't rely on the _source field. It sounds like the attachments are there
in the attachments field.
It looks good to me.

Maybe you should check that the mapping is correct and that the attachments
field has the attachment type?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 23 juillet 2014 à 18:31:35, Gabriel Kapitany (gkapit...@gmail.com) a écrit:

Hi all,

I have installed elasticsearch 1.2.1, and the IMAPRiver plugin 
elasticsearch-river-imap-0.0.7-b20 with 
elasticsearch-mapper-attachments-2.2.0-SNAPSHOT.

[2014-07-23 11:56:44,304][INFO ][node                     ] [Shiver Man] 
version[1.2.1], pid[28748], build[6c95b75/2014-06-03T15:02:52Z]
[2014-07-23 11:56:44,305][INFO ][node                     ] [Shiver Man] 
initializing ...
[2014-07-23 11:56:44,329][INFO ][plugins                  ] [Shiver Man] loaded 
[mapper-attachments, river-imap-0.0.7-b20-${build], sites []
[2014-07-23 11:56:47,850][INFO ][node                     ] [Shiver Man] 
initialized
[2014-07-23 11:56:47,850][INFO ][node                     ] [Shiver Man] 
starting ...
[2014-07-23 11:56:48,052][INFO ][transport                ] [Shiver Man] 
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address 
{inet[/10.125.71.146:9300]}
[2014-07-23 11:56:51,106][INFO ][cluster.service          ] [Shiver Man] 
new_master [Shiver 
Man][6vc5FbGdSlSxHjI6s7nYvw][cto-sdx02.idm.symcto.com][inet[/10.125.71.146:9300]],
 reason: zen-disco-join (elected_as_master)
[2014-07-23 11:56:51,148][INFO ][discovery                ] [Shiver Man] 
elasticsearch/6vc5FbGdSlSxHjI6s7nYvw
[2014-07-23 11:56:51,178][INFO ][http                     ] [Shiver Man] 
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address 
{inet[/10.125.71.146:9200]}
[2014-07-23 11:56:52,278][INFO ][gateway                  ] [Shiver Man] 
recovered [2] indices into cluster_state
[2014-07-23 11:56:52,279][INFO ][node                     ] [Shiver Man] started
[2014-07-23 11:56:54,760][INFO ][de.saly.elasticsearch.river.imap.IMAPRiver] 
IMAPRiver created, river name: river-imap
[2014-07-23 11:56:54,761][INFO ][de.saly.elasticsearch.river.imap.IMAPRiver] 
Start IMAPRiver ...
[2014-07-23 11:56:55,025][INFO ][org.quartz.impl.StdSchedulerFactory] Using 
default implementation for ThreadExecutor
[2014-07-23 11:56:55,029][INFO ][org.quartz.simpl.SimpleThreadPool] Job 
execution threads will use class loader of thread: elasticsearch[Shiver 
Man][generic][T#4]
[2014-07-23 11:56:55,051][INFO ][org.quartz.core.SchedulerSignalerImpl] 
Initialized Scheduler Signaller of type: class 
org.quartz.core.SchedulerSignalerImpl
[2014-07-23 11:56:55,052][INFO ][org.quartz.core.QuartzScheduler] Quartz 
Scheduler v.2.2.1 created.
[2014-07-23 11:56:55,054][INFO ][org.quartz.simpl.RAMJobStore] RAMJobStore 
initialized.
[2014-07-23 11:56:55,055][INFO ][org.quartz.core.QuartzScheduler] Scheduler 
meta-data: Quartz Scheduler (v2.2.1) 'DefaultQuartzScheduler' with instanceId 
'NON_CLUSTERED'


Index created:

curl -XPUT 'http://localhost:9200/_river/river-imap/_meta' -d '{

   "type":"imap",
   "mail.store.protocol":"imap",
   "mail.imap.host":"cto-sdx01",
   "mail.imap.port":993,
   "mail.imap.ssl.enable":true,
   "mail.imap.connectionpoolsize":"3",
   "mail.debug":"false",
   "mail.imap.timeout":1,
   "user":"gkapitan",
   "password":"xxx",
   "schedule":null,
   "interval":"60s",
   "threads":5,
   "folderpattern":null,
   "bulk_size":100,
   "max_bulk_requests":"30",
   "bulk_flush_interval":"5s",
   "mail_index_name":"imapriverdata",
   "mail_type_name":"mail",
   "with_striptags_from_textcontent":true,
   "with_attachments":true,
   "with_text_content":true,
   "with_flag_sync":true,
   "index_settings" : null,
   "type_mapping" : null

}'

The documents are loaded, but the attachments are not indexed for any of: pdf,
doc, docx, csv, xls...

Any idea of what I might have missed?

Below is a sample response:

{"took":8,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.35493678,"hits":[{"_index":"imapriverdata","_type":"mail","_id":"16::imap://gkapitan@cto-sdx01/mail/gkapitan/Sent","_score":0.35493678,"_source":{
  "attachmentCount" : 1,
  "attachments" : [ {
    "content" : "YWxwaGEsYmV0YSxnYW1tYQ0KVGV0YSxzaWdtYSxwaQ0K",
    "contentType" : "text/csv; charset=us-ascii",
    "size" : 33,
    "filename" : "Book1.csv",
    "name" : "Book1.csv"
  } ],
  "bcc" : null,
  "cc" : null,
  "contentType" : "multipart/mixed; boundary=wac7ysb48OaltWcw",
  "flaghashcode" : 48,
  "flags" : [ "Recent", "Seen" ],
  "folderFullName" : "mail/gkapitan/Sent",
  "folderUri" : "imap://gkapitan@cto-sdx01/mail/gkapitan/Sent",
  "from" : {
    "email" : "gkapi...@cto-sdx01.idm.symcto.com",
    "personal" : "Gabriel Kapitany"
  },
  "headers" : [ {
    "name" : "Content-Disposition",
    "value" : "inline"
  }, {
    "name" : "Subject",
    "value" : "csv"
  }, {
    "name" : "To",
    "value" : "gkapi...@

IMAPRiver plugin attachemnt index issue

2014-07-23 Thread Gabriel Kapitany
Hi all,

I have installed elasticsearch 1.2.1, and the IMAPRiver 
plugin elasticsearch-river-imap-0.0.7-b20 with 
elasticsearch-mapper-attachments-2.2.0-SNAPSHOT.

[2014-07-23 11:56:44,304][INFO ][node ] [Shiver Man] 
version[1.2.1], pid[28748], build[6c95b75/2014-06-03T15:02:52Z]
[2014-07-23 11:56:44,305][INFO ][node ] [Shiver Man] 
initializing ...
[2014-07-23 11:56:44,329][INFO ][plugins  ] [Shiver Man] 
loaded [mapper-attachments, river-imap-0.0.7-b20-${build], sites []
[2014-07-23 11:56:47,850][INFO ][node ] [Shiver Man] 
initialized
[2014-07-23 11:56:47,850][INFO ][node ] [Shiver Man] 
starting ...
[2014-07-23 11:56:48,052][INFO ][transport] [Shiver Man] 
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address 
{inet[/10.125.71.146:9300]}
[2014-07-23 11:56:51,106][INFO ][cluster.service  ] [Shiver Man] 
new_master [Shiver 
Man][6vc5FbGdSlSxHjI6s7nYvw][cto-sdx02.idm.symcto.com][inet[/10.125.71.146:9300]],
 
reason: zen-disco-join (elected_as_master)
[2014-07-23 11:56:51,148][INFO ][discovery] [Shiver Man] 
elasticsearch/6vc5FbGdSlSxHjI6s7nYvw
[2014-07-23 11:56:51,178][INFO ][http ] [Shiver Man] 
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address 
{inet[/10.125.71.146:9200]}
[2014-07-23 11:56:52,278][INFO ][gateway  ] [Shiver Man] 
recovered [2] indices into cluster_state
[2014-07-23 11:56:52,279][INFO ][node ] [Shiver Man] 
started
[2014-07-23 11:56:54,760][INFO 
][de.saly.elasticsearch.river.imap.IMAPRiver] IMAPRiver created, river 
name: river-imap
[2014-07-23 11:56:54,761][INFO 
][de.saly.elasticsearch.river.imap.IMAPRiver] Start IMAPRiver ...
[2014-07-23 11:56:55,025][INFO ][org.quartz.impl.StdSchedulerFactory] Using 
default implementation for ThreadExecutor
[2014-07-23 11:56:55,029][INFO ][org.quartz.simpl.SimpleThreadPool] Job 
execution threads will use class loader of thread: elasticsearch[Shiver 
Man][generic][T#4]
[2014-07-23 11:56:55,051][INFO ][org.quartz.core.SchedulerSignalerImpl] 
Initialized Scheduler Signaller of type: class 
org.quartz.core.SchedulerSignalerImpl
[2014-07-23 11:56:55,052][INFO ][org.quartz.core.QuartzScheduler] Quartz 
Scheduler v.2.2.1 created.
[2014-07-23 11:56:55,054][INFO ][org.quartz.simpl.RAMJobStore] RAMJobStore 
initialized.
[2014-07-23 11:56:55,055][INFO ][org.quartz.core.QuartzScheduler] Scheduler 
meta-data: Quartz Scheduler (v2.2.1) 'DefaultQuartzScheduler' with 
instanceId 'NON_CLUSTERED'


Index created:

curl -XPUT 'http://localhost:9200/_river/river-imap/_meta' -d '{

   "type":"imap",
   "mail.store.protocol":"imap",
   "mail.imap.host":"cto-sdx01",
   "mail.imap.port":993,
   "mail.imap.ssl.enable":true,
   "mail.imap.connectionpoolsize":"3",
   "mail.debug":"false",
   "mail.imap.timeout":1,
   "user":"gkapitan",
   "password":"xxx",
   "schedule":null,
   "interval":"60s",
   "threads":5,
   "folderpattern":null,
   "bulk_size":100,
   "max_bulk_requests":"30",
   "bulk_flush_interval":"5s",
   "mail_index_name":"imapriverdata",
   "mail_type_name":"mail",
   "with_striptags_from_textcontent":true,
   "with_attachments":true,
   "with_text_content":true,
   "with_flag_sync":true,
   "index_settings" : null,
   "type_mapping" : null

}'

The documents are loaded, but the attachments are not indexed for any of:
pdf, doc, docx, csv, xls...

Any idea of what I might have missed?

Below is a sample response:

{"took":8,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.35493678,"hits":[{"_index":"imapriverdata","_type":"mail","_id":"16::imap://gkapitan@cto-sdx01/mail/gkapitan/Sent","_score":0.35493678,"_source":{
  "attachmentCount" : 1,
  "attachments" : [ {
"content" : "YWxwaGEsYmV0YSxnYW1tYQ0KVGV0YSxzaWdtYSxwaQ0K",
"contentType" : "text/csv; charset=us-ascii",
"size" : 33,
"filename" : "Book1.csv",
"name" : "Book1.csv"
  } ],
  "bcc" : null,
  "cc" : null,
  "contentType" : "multipart/mixed; boundary=wac7ysb48OaltWcw",
  "flaghashcode" : 48,
  "flags" : [ "Recent", "Seen" ],
  "folderFullName" : "mail/gkapitan/Sent",
  "folderUri" : "imap://gkapitan@cto-sdx01/mail/gkapitan/Sent",
  "from" : {
"email" : "gkapi...@cto-sdx01.idm.symcto.com",
"personal" : "Gabriel Kapitany"
  },
  "headers" : [ {
"name" : "Content-Disposition",
"value" : "inline"
  }, {
"name" : "Subject",
"value" : "csv"
  }, {
"name" : "To",
"value" : "gkapi...@cto-sdx01.idm.symcto.com"
  }, {
"name" : "Date",
"value" : "Wed, 23 Jul 2014 12:24:05 -0400"
  }, {
"name" : "MIME-Version",
"value" : "1.0"
  }, {
"name" : "Message-ID",
"value" : "<20140723162358.ga3...@cto-sdx01.idm.symcto.com>"
  }, {
"name" : "User-Agent",
"value" : "Mutt/1.5.20 (2009-12-10)"
  }, {
"name" : "Content-Type",
"value" : "multipart/mixed; boundary=\"wac7ysb48OaltWcw

Understanding AND evaluation

2014-07-23 Thread Neil Avery
All,
After searching the docs and scouring the web, I'm hoping someone can help me
understand the evaluation of 'and' filters.

Conventionally, the short-circuit rules apply, i.e. fail on the first clause
and go no further.

Can anyone confirm if this is the case?

I'd like to run the following; however, my native script is always executed
against data that fails the first clause, i.e. it's hitting data where the
_host field is .local.

Regards,
Neil.


{
  "query" : {
"filtered" : {
  "query" : {
"match_all" : { }
  },
  "filter" : {
"and" : {
  "filters" : [ {
"term" : {
  "_host" : ".local"
}
  }, {
"script" : {
  "script" : "ls-script",
  "params" : {
"startMs" : 140613144,
"endMs" : 140613331
  },
  "lang" : "native"
}
  } ]
}
  }
}
  }
}
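
One pattern worth trying, sketched below under the usual ES 1.x guidance: bitset-backed filters (term, range) combine cheaply inside "bool", while "and" is intended for filters that cannot produce bitsets (script, geo). Nesting the cheap term clause in a bool and keeping the native script as the last clause of the "and" is a common way to gate the expensive filter; whether it fully short-circuits is an assumption to verify against your data, not a guarantee:

```python
import json

# Hedged sketch: cheap bitset filter first (wrapped in bool), expensive
# native script filter last inside "and". Field names, script name, and
# params are copied from the thread.
request = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "and": {
                    "filters": [
                        # bitset-backed term filter, evaluated first
                        {"bool": {"must": [{"term": {"_host": ".local"}}]}},
                        # per-document native script, evaluated last
                        {"script": {
                            "script": "ls-script",
                            "lang": "native",
                            "params": {"startMs": 140613144,
                                       "endMs": 140613331}
                        }}
                    ]
                }
            }
        }
    }
}
print(json.dumps(request, indent=2))
```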



Re: shingle filter for sub phrase matching

2014-07-23 Thread Nick Tackes


#create a test index with shingle mapping
curl -XPUT localhost:9200/test -d '{
   "settings":{
  "index":{
 "analysis":{
"analyzer":{
   "analyzer_shingle":{
  "tokenizer":"standard",
  "filter":["standard", "lowercase", "filter_stop", 
"filter_shingle"]
   }
},
"filter":{
   "filter_shingle":{
  "type":"shingle",
  "max_shingle_size":5,
  "min_shingle_size":2,
  "output_unigrams":"true"
   },
   "filter_stop":{
  "type":"stop",
  "stopwords":[
  "a", "an", "and", "are", "as", "at", "be", 
"but", "by",
  "for", "if", "in", "into", "is", "it",
  "no", "not", "of", "on", "or", "such",
  "that", "the", "their", "then", "there", 
"these",
  "they", "this", "to", "will", "with"
  ]
   }
}
 }
  }
   },
   "mappings":{
  "product":{
 "properties":{
"title":{
   "search_analyzer":"analyzer_shingle",
   "index_analyzer":"analyzer_shingle",
   "type":"string"
}
 }
  }
   }
}'
 
#Add some docs to the index
curl -XPOST localhost:9200/test/product/1 -d '{"title" : "EGFR"}'
curl -XPOST localhost:9200/test/product/1 -d '{"title" : "WAS"}'
curl -XPOST localhost:9200/test/product/2 -d '{"title" : "Lung Cancer"}'
curl -XPOST localhost:9200/test/product/3 -d '{"title" : "Lung"}'
curl -XPOST localhost:9200/test/product/3 -d '{"title" : "Cancer"}'
 
curl -XPOST localhost:9200/test/_refresh
 
#Analyze API to check out shingling
curl -XGET 'localhost:9200/test/_analyze?analyzer=analyzer_shingle&pretty' -d 
'EGFR and WAS Lung Cancer' | grep token
 
#Sample search should return EGFR, Lung Cancer, Lung, Cancer
curl -XGET 'localhost:9200/test/product/_search?q=title:EGFR+Lung+Cancer&pretty'
 
#Sample search with stop word should return EGFR, WAS, Lung Cancer, Lung, Cancer
curl -XGET 
'localhost:9200/test/product/_search?q=title:EGFR+and+WAS+Lung+Cancer&pretty'
 
#Sample search with a separating word should return EGFR, Lung Cancer, Lung, 
Cancer
curl -XGET 
'localhost:9200/test/product/_search?q=title:EGFR+and+Lung+related+Cancer&pretty'
 
#Sample search with a separating word should return EGFR, Lung Cancer, Lung, 
Cancer
curl -XGET localhost:9200/test/product/_search?pretty -d '{
"query" : {
"match" : {
"title" : {
"query" : "EGFR and Lung related Cancer",
"analyzer":"standard"
}
}
}
}'
 
 
curl -X DELETE localhost:9200/test


On Wednesday, July 23, 2014 9:37:03 AM UTC-5, Nick Tackes wrote:
>
> I have created a gist with an analyzer that uses a shingle filter in an 
> attempt to match sub-phrases. 
>
> For instance I have entries in the table with discrete phrases like 
>
> EGFR 
> Lung Cancer 
> Lung 
> Cancer 
>
> and I want to match these when searching the phrase 'EGFR related lung 
> cancer'. 
>
> My expectation is that the multi word matches score higher than the single 
> matches, for instance... 
> 1. Lung Cancer 
> 2. Lung 
> 3. Cancer 
> 4. EGFR 
>
> Additionally, I tried a standard analyzer match but this didn't yield the 
> desired result either. One complicating aspect to this approach is that the 
> min_shingle_size has to be 2 or more. 
>
> How then would I be able to match single words like 'EGFR' or 'Lung'? 
>
> thanks
>
> https://gist.github.com/nicktackes/ffdbf22aba393efc2169.js
>
>



Re: Sudden high "OS Load", then ES VM disappears

2014-07-23 Thread michael
No, the VM does not respond to curl requests. Closest thing I found to 
that read bytes in the API was the _cluster/stats endpoint 
--> https://gist.github.com/schonfeld/d45401e44f5961c38502

Were you referring to a different endpoint?

What're your thoughts re "angry hardware"? Insufficient resources? Are 
there any known issues with CoreOS + ES?

On Wednesday, July 23, 2014 11:12:10 AM UTC-4, Nikolas Everett wrote:
>
>
>
>
> On Wed, Jul 23, 2014 at 10:19 AM, > 
> wrote:
>
>> Looking at the JVM GC graphs, I do see some increases there, but not sure 
>> those are enough to cause this storm?
>>
>>
>> 
>>
>
> That looks like it wasn't the problem. 
>
>>  
>> 
>> The disk graphs in Marvel don't show anything out of the ordinary. I'm 
>> not sure how to check on those write_bytes and read_bytes... Where does ES 
>> report those? I'm using Google Compute Engine, and according to their 
>> minimal graphs, while there was a small spike in the disk I/Os, it wasn't 
>> anything insane.
>>
>
> Elasticsearch reports them in the same API as the number of reads.  I 
> imagine they are right next to each other in marvel but I'm not sure. 
>
>>
>> Right after the spike happened, the indexing rate spiked to 6.6k / 
>> second. Notice that the first tag notes the VM that left the cluster, the 
>> second tag shows the cluster went back to "green". Considering this 
>> happened after the node "left", does this give us any clues as for the 
>> reason?
>>
>>
>> 
>>
>> A few observations:
>> * The es process is still running on the "dead machine". I can see it 
>> when I ssh into the VM (thru ps aux, and docker)
>> * The VM doesn't show anything in the error logs, etc
>> * Running a "ps aux" on that VM actually freezes (after showing the es 
>> process)
>>
>
> That seems pretty nasty.  Does the node respond to stuff like `curl 
> localhost:9200`?  Sounds like the hardware/docker is just angry.
>
>
>
> Nik
>  
>



Re: How Recover mistaken delete index?

2014-07-23 Thread Benoit Gagnon
To create and restore backups, use the Snapshot and Restore APIs.
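
A minimal sketch of that ES 1.x workflow, printed as HTTP calls: register a shared-filesystem repository, take a snapshot, then restore it. The repository name, snapshot name, and location below are assumptions; the location must be on a filesystem reachable by every node:

```python
import json

# Hedged sketch of the snapshot/restore sequence against a cluster at
# localhost:9200 (names and path are made up for illustration).
repository = {
    "type": "fs",
    "settings": {"location": "/mount/es_backups", "compress": True}
}
calls = [
    # 1. register the repository
    ("PUT",  "/_snapshot/my_backup", json.dumps(repository)),
    # 2. take a snapshot and wait for it to finish
    ("PUT",  "/_snapshot/my_backup/snapshot_1?wait_for_completion=true", ""),
    # 3. restore it (close or delete the target indices first)
    ("POST", "/_snapshot/my_backup/snapshot_1/_restore", ""),
]
for method, endpoint, payload in calls:
    print(method, endpoint, payload)
```

Each tuple maps directly onto a curl invocation, e.g. `curl -XPUT 'localhost:9200/_snapshot/my_backup' -d '...'`.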

On Wednesday, July 23, 2014 10:58:23 AM UTC-4, 闫旭 wrote:
>
> Dear All!
>
>
> How can I recover a deleted index?  Or how do I back up an index?
>
>
> Thanks && Best Regard!
>
>  
>



Increase writing speed from hive

2014-07-23 Thread Sakthi
How can I increase the write speed to ES from Hive?
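
Assuming the elasticsearch-hadoop Hive integration, bulk sizing is controlled per task through TBLPROPERTIES on the external table, and total throughput also scales with the number of Hive tasks. A hedged config sketch; the table, column, and index names are made up, while the es.* property names come from elasticsearch-hadoop:

```sql
-- Hedged sketch (elasticsearch-hadoop Hive storage handler assumed).
CREATE EXTERNAL TABLE logs_es (id STRING, msg STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource'            = 'logs/entry',
  'es.batch.size.entries'  = '5000',   -- docs per bulk request (default 1000)
  'es.batch.size.bytes'    = '5mb',    -- bytes per bulk request (default 1mb)
  'es.batch.write.refresh' = 'false'   -- skip index refresh after each bulk
);
```

Raising the bulk sizes and disabling the per-bulk refresh are the usual first knobs to try; measure against your cluster before settling on values.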




kibana panel not refreshed when query changes

2014-07-23 Thread paco
Hi,

How can I force my custom Kibana panel to be filtered by a query?

Right now it only gets filtered if I save the dashboard with the new query and
then refresh the page.

Maybe something is missing in the code for rendering to get invoked after the
query changes?



Re: Sudden high "OS Load", then ES VM disappears

2014-07-23 Thread Nikolas Everett
On Wed, Jul 23, 2014 at 10:19 AM,  wrote:

> Looking at the JVM GC graphs, I do see some increases there, but not sure
> those are enough to cause this storm?
>
>
> 
>

That looks like it wasn't the problem.

>
> 
> The disk graphs in Marvel don't show anything out of the ordinary. I'm not
> sure how to check on those write_bytes and read_bytes... Where does ES
> report those? I'm using Google Compute Engine, and according to their
> minimal graphs, while there was a small spike in the disk I/Os, it wasn't
> anything insane.
>

Elasticsearch reports them in the same API as the number of reads.  I
imagine they are right next to each other in Marvel, but I'm not sure.
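For anyone looking for those counters: they live in the node stats API under the `fs` section. A sketch for 1.x (the response is trimmed and the numbers are illustrative; exact field availability depends on the platform):

```
GET /_nodes/stats/fs

{
  "nodes": {
    "<node_id>": {
      "fs": {
        "data": [{
          "disk_reads": 123,
          "disk_writes": 456,
          "disk_read_size_in_bytes": 789,
          "disk_write_size_in_bytes": 101112
        }]
      }
    }
  }
}
```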

>
> Right after the spike happened, the indexing rate spiked to 6.6k / second.
> Notice that the first tag notes the VM that left the cluster, the second
> tag shows the cluster went back to "green". Considering this happened after
> the node "left", does this give us any clues as for the reason?
>
>
> 
>
> A few observations:
> * The es process is still running on the "dead machine". I can see it when
> I ssh into the VM (thru ps aux, and docker)
> * The VM doesn't show anything in the error logs, etc
> * Running a "ps aux" on that VM actually freezes (after showing the es
> process)
>

That seems pretty nasty.  Does the node respond to stuff like `curl
localhost:9200`?  Sounds like the hardware/docker is just angry.



Nik



How to recover a mistakenly deleted index?

2014-07-23 Thread 闫旭
Dear All!


How can I recover a deleted index? Or how can I back up an index?


Thanks && Best Regard!
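
If you are on Elasticsearch 1.0 or later, the snapshot/restore API is the supported way to back up indices. Note that it can only restore from a snapshot taken before the deletion; an index that was never backed up cannot be undeleted. A sketch (the repository name and filesystem path are placeholders):

```
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/my_backup"
  }
}

PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true

POST /_snapshot/my_backup/snapshot_1/_restore
```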




shingle filter for sub phrase matching

2014-07-23 Thread Nick Tackes
I have created a gist with an analyzer that uses filter shingle in attempt 
to match sub phrases. 

For instance I have entries in the table with discrete phrases like 

EGFR 
Lung Cancer 
Lung 
Cancer 

and I want to match these when searching the phrase 'EGFR related lung 
cancer'.

My expectation is that the multi word matches score higher than the single 
matches, for instance... 
1. Lung Cancer 
2. Lung 
3. Cancer 
4. EGFR 

Additionally, I tried a standard analyzer match but this didn't yield the 
desired result either. One complicating aspect to this approach is that the 
min_shingle_size has to be 2 or more. 

How then would I be able to match single words like 'EGFR' or 'Lung'? 

thanks

https://gist.github.com/nicktackes/ffdbf22aba393efc2169.js
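
On the single-word question: the shingle token filter has an `output_unigrams` option (it defaults to true) that emits the original single tokens alongside the shingles, which should let 'EGFR' and 'Lung' match on their own even with min_shingle_size at 2. A sketch of index settings using it (the filter and analyzer names are placeholders, not from the gist):

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "my_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "output_unigrams": true
        }
      },
      "analyzer": {
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_shingle"]
        }
      }
    }
  }
}
```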



Using _msearch with suggesters only?

2014-07-23 Thread Gordon Rankin
I have to query several completion suggesters at the same time.  This is 
easy to do using the _suggest api.

However if I want to query multiple suggesters on different indexes I have 
two choices:


   1. Perform multiple http requests using the _suggest api
   2. Use the _msearch api.


I am currently using option 1 which appears to perform reasonably well so 
far however I would like to only perform a single request if possible so I 
have been playing with the _msearch api.

The problem is that when performing a single _msearch request for multiple 
suggesters on multiple indexes, each index also ends up getting queried, 
returning all the hits as though I had also queried with match_all: {}.

In order to minimize this issue I have set the size parameter to 0 which 
prevents the hits from being returned.  However I still get a total count 
returned for all the documents in each index.  

This implies to me that Elasticsearch is still doing some unnecessary work 
matching and counting documents in each index when all I really want 
returned are the suggestions.


   - Is Elasticsearch actually performing any work other than for the 
   suggesters?
   - If so is there another way to return suggestion from multiple 
   completion suggesters across several indices in a single request without 
   querying at all?
   - Would using _msearch in this manner be considered a better practice 
   than performing two or more _suggest calls in parallel?





My request currently looks like this:



var request = [{
    index: 'users',
    type: 'user'
}, {
    size: 0,
    suggest: {
        users_suggest: {
            text: term,
            completion: {
                size: 5,
                field: 'users_suggest'
            }
        }
    }
}, {
    index: 'photos',
    type: 'photo'
}, {
    size: 0,
    suggest: {
        tags_suggest: {
            text: term,
            completion: {
                size: 3,
                field: 'tags_suggest'
            }
        },
        place_suggest: {
            text: term,
            completion: {
                size: 3,
                field: 'place_suggest'
            }
        },
        country_suggest: {
            text: term,
            completion: {
                size: 3,
                field: 'country_suggest'
            }
        }
    }
}];




And the results I am getting returned are as follows :

[{
    "took": 8,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 28,
        "max_score": 0,
        "hits": []
    },
    "suggest": {
        "users_suggest": [{
            "text": "t",
            "offset": 0,
            "length": 1,
            "options": [***suggestions***]
        }]
    }
}, {
    "took": 8,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 117,
        "max_score": 0,
        "hits": []
    },
    "suggest": {
        "country_suggest": [{
            "text": "t",
            "offset": 0,
            "length": 1,
            "options": []
        }],
        "place_suggest": [***suggestions***],
        "tags_suggest": [***suggestions***]
    }
}]
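
One thing that might reduce the extra work: the _msearch header line accepts a `search_type`, and `count` skips the fetch phase entirely. You would still see a total count, since the query phase still runs, but no hit collection happens. A sketch of the request as it would appear on the wire (untested against 1.x, so treat the interaction with suggesters as an assumption):

```
{"index": "users", "type": "user", "search_type": "count"}
{"suggest": {"users_suggest": {"text": "t", "completion": {"size": 5, "field": "users_suggest"}}}}
{"index": "photos", "type": "photo", "search_type": "count"}
{"suggest": {"tags_suggest": {"text": "t", "completion": {"size": 3, "field": "tags_suggest"}}}}
```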





Re: Sorting based on combination of ES score and two fields

2014-07-23 Thread M_20
It seems this is what I need

"script_score" : {
"script" : "_score + doc['my_numeric_field_1'].value + 
doc['my_numeric_field_2'].value"
}

Am I right??
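
For reference, on 1.x that fragment would typically sit inside a function_score query; with boost_mode set to "replace" the script's result becomes the final score, which matches the _score + field_1 + field_2 formula. A sketch (the field names are placeholders):

```json
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "script_score": {
        "script": "_score + doc['field_1'].value + doc['field_2'].value"
      },
      "boost_mode": "replace"
    }
  }
}
```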



On Wednesday, July 23, 2014 9:29:22 AM UTC-5, M_20 wrote:
>
> Hi, 
>
> How can I sort the result based on summation of several fields?
> For my application, I want to sort the results based on ( ElasticSearch 
> score + field_1 + field_2 )
> In fact:
> Final Score = ElasticSearch score + field_1 + field_2
> It seems kinda recursive, but I was wondering ES is able to do so.
>
> Thanks
>



When to use multiple clusters

2014-07-23 Thread Alex Kehayias
I have several large indices (100M docs) on the same cluster. Is there any 
advice of when it is appropriate to separate into multiple clusters vs one 
large one? Each index has a slightly different usage profile (read vs write 
heavy, update vs insert). How many indices would you recommend for a single 
cluster? Is it ok to have many large indices on the same cluster? 

Thanks!



Sorting based on combination of ES score and two fields

2014-07-23 Thread M_20
Hi, 

How can I sort the result based on summation of several fields?
For my application, I want to sort the results based on ( ElasticSearch 
score + field_1 + field_2 )
In fact:
Final Score = ElasticSearch score + field_1 + field_2
It seems kinda recursive, but I was wondering ES is able to do so.

Thanks



Add / Remove nodes in cluster, good practice question

2014-07-23 Thread Pierre-Vincent Ledoux
Hi,

We have a little cluster of 2 nodes, hosting 4 indexes of about 1.5M 
documents each, replicated on both nodes.

Those 2 nodes are on VPS that are stored on the same physical host. As it 
represents a single point of failure, we have decided to start a new VPS on 
a different host.

What is the correct procedure to add the new node to the cluster, get the 
indexes replicated in it, and then removed one of the older node?

We don't use multicast, so I imagine that I can add the node to the 
unicast list in the config file, but how can I be sure that it will not 
bring down the whole cluster when I restart Elasticsearch?

Those nodes are online in production so it's a bit touchy for us to take 
any risk with it.


Cheers,

Pv
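
A common sequence, sketched below with placeholder addresses: add the new node's address to discovery.zen.ping.unicast.hosts on all nodes, start the new node, wait for the cluster to go green, then ask Elasticsearch to drain shards off the node you want to retire before shutting it down:

```
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1"
  }
}
```

After this, watch GET /_cat/shards until the excluded node no longer holds any shards; only then stop it. The transient setting does not survive a full cluster restart, which is usually what you want for a one-off decommission.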




Re: Sudden high "OS Load", then ES VM disappears

2014-07-23 Thread michael


Looking at the JVM GC graphs, I do see some increases there, but not sure 
those are enough to cause this storm?


The disk graphs in Marvel don't show anything out of the ordinary. I'm not 
sure how to check on those write_bytes and read_bytes... Where does ES 
report those? I'm using Google Compute Engine, and according to their 
minimal graphs, while there was a small spike in the disk I/Os, it wasn't 
anything insane.

Right after the spike happened, the indexing rate spiked to 6.6k / second. 
Notice that the first tag notes the VM that left the cluster, the second 
tag shows the cluster went back to "green". Considering this happened after 
the node "left", does this give us any clues as for the reason?



A few observations:
* The es process is still running on the "dead machine". I can see it when 
I ssh into the VM (thru ps aux, and docker)
* The VM doesn't show anything in the error logs, etc
* Running a "ps aux" on that VM actually freezes (after showing the es 
process)
* Here is the thread dump from the dead VM: 

Here is some more detailed information about the cluster:

* VMs are Google Compute Engine machines: n1-standard-4 (4 virtual CPUs, 
15GB RAM), running CoreOS 367.1.0 with Docker 1.0.1.
* The HDs are all 100gbs SSDs, delivering 3000/read & 3000/write IOPS (or 
48.00 MB/s). The Docker containers are running with 10gb max mem allocated.
* Java version "1.7.0_55"; Java(TM) SE Runtime Environment (build 
1.7.0_55-b13); Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed 
mode)
* ES_HEAP_SIZE = 7168M
* mlock is enabled
* memlock ulimit is set to unlimited

Screenshots: 
https://www.dropbox.com/s/qv6pcg46lmfu9aj/Screenshot%202014-07-23%2010.05.10.png
 
&& 
https://www.dropbox.com/s/05hl0rq7xngoz1k/Screenshot%202014-07-23%2010.08.01.png

On Wednesday, July 23, 2014 9:51:27 AM UTC-4, Nikolas Everett wrote:
>
> I'm not sure what "OS Load" is in this context but I'm guessing it is load 
> average.  The shape of the memory usage graph indicates that the orange 
> node might be stuck in a garbage collection storm - the numbers for heap 
> aren't going up and down - just staying constant while the load is pretty 
> high.  Might not be it, but it'd be nice to see garbage collection 
> counts/times.
>
> You also might want to look at what the CPU is doing - in particular you 
> want to know the % of time the CPU is in io wait.  If that jumped up then 
> something went wrong with the disk.  Another way to tell is to look at the 
> reads and writes that elasticsearch reports - its called write_bytes and 
> read_bytes or something.  If you have a graph of that you can see events 
> like shards moving from one node to another (write spike near the 
> configured maximum throttle - 20 or 40 MB/sec with a network spike followed 
> by a read spike) compared to regular operation (steady state reading) 
> compared to a big merge (looks like shard moving without the network spike 
> but with a user cpu spike).
>
> The idea is that you can see if any of these events occurred right before 
> your problem.
>
>
> On Wed, Jul 23, 2014 at 9:38 AM, > 
> wrote:
>
>> One additional piece of information -- the .yml conf file we use: 
>> https://gist.github.com/schonfeld/ef8f012eb0775be202ce
>>
>> On Wednesday, July 23, 2014 9:31:45 AM UTC-4, mic...@modernmast.com 
>> wrote:
>>>
>>> Hey all!
>>>
>>> I'm having a serious problem with my ES cluster. Every now and 
>>> then, when writing to the cluster, a machine (or two) will suddenly spike 
>>> up on OS Load, writing will come to a screeching halt (>5s for 1k docs, as 
>>> opposed to ~100ms normally), and then shortly after, the VM that was 
>>> spiking will simply disappear from the cluster.
>>>
>>>
>>> 
>>> Another thing I've noticed, is that the "Document Count" dips to 0 when 
>>> this happens. FWIW - the VMs are 15gb mem / 4 cores, each, running docker.
>>>
>>>  
>>> 
>>>
>>> *Here are some logs from around the time this happened:*
>>>
>>> *Hot threads*: https://gist.github.com/schonfeld/370f9c32dbefce59e628
>>> *Nodes stats*: https://gist.github.com/schonfeld/5f528949d3a6341417dc
>>> Screenshots: https://www.dropbox.com/s/gthreucugoz0opm/Screenshot%
>>> 202014-07-23%2009.28.30.png && https://www.dropbox.com/s/
>>> ae1vijcdgv5sv7u/Screenshot%202014-07-23%2009.28.10.png
>>>
>>>
>>> Any clues, insights, and thoughts will be greatly appreciated!
>>>
>>> Thanks,
>>> - Michael
>>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> 

non existing scripts execute - what is going on

2014-07-23 Thread Adrian
My windows test machine where I have ES installed has restarted 
automatically several times during the day. I was testing some custom 
scripts in ES.

After the restart I went back to continue my tests and realized that any 
script I told ES to run would do the same thing and always give me results, 
even when I used scripts which did not exist. Take this example:

{
  "query": {
"custom_score": {
  "query": {
"match_all": {}
  },
  "script": "sdgfhjgf",
  "params": {
"site": "rgf"
  },
  "lang": "native"
}
  }
}

This returns docs with a score of 1! The script sdgfhjgf doesn't even 
exist, yet ES returns results from it! What is going on?

PS - I tried this same JSON query on a different ES cluster and as expected 
I get an error.



Re: Is ES capable of doing pagination?

2014-07-23 Thread Nikolas Everett
Scan/scroll is also not for exposing to "web scale" users.  Fine for tens
of users, not for millions.  There is non-trivial cost on the cluster
during scan/scroll.

For the most part we just use from and size.  There is a setting called
preference that might be worth looking at if you expect lots of scrolling.

Don't allow super deep scrolling - there are some portions of query
execution that end up having to do from + size amounts of work (O(n*log(n))
of it).  And they need O(from + size) of memory too.  So going too deep can
take up tons of memory.

Nik
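
A minimal from/size page request, with preference set to a per-user string so that consecutive pages hit the same shard copies and come back in a consistent order (the index name and preference value are placeholders):

```
GET /my_index/_search?preference=user-1234
{
  "from": 30,
  "size": 10,
  "query": { "match_all": {} }
}
```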


On Wed, Jul 23, 2014 at 9:36 AM, Adrien Grand <
adrien.gr...@elasticsearch.com> wrote:

> Hi Prasath,
>
> Scan and scroll can only move forward. If you want to have previous/next
> buttons, you would typically run the query once again with different values
> of `from` and `size`.
>
>
> On Tue, Jul 22, 2014 at 3:16 PM, PrasathRajan 
> wrote:
>
>> can somebody suggest..
>>
>> Is it possible to jump over Page Nos [forward/backward]while using
>> Scan/Scroll for Pagination?.. Or any other effective way available for
>> pagination?.
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://elasticsearch-users.115913.n3.nabble.com/Is-ES-capable-of-doing-pagination-tp3399087p4060343.html
>> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/1406034965952-4060343.post%40n3.nabble.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Adrien Grand
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j6y-hQudM4utSi-JJakdMrkJLXN90MwRa8Tu9xpuhnbyA%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



Re: Sudden high "OS Load", then ES VM disappears

2014-07-23 Thread Nikolas Everett
I'm not sure what "OS Load" is in this context but I'm guessing it is load
average.  The shape of the memory usage graph indicates that the orange
node might be stuck in a garbage collection storm - the numbers for heap
aren't going up and down - just staying constant while the load is pretty
high.  Might not be it, but it'd be nice to see garbage collection
counts/times.

You also might want to look at what the CPU is doing - in particular you
want to know the % of time the CPU is in io wait.  If that jumped up then
something went wrong with the disk.  Another way to tell is to look at the
reads and writes that elasticsearch reports - its called write_bytes and
read_bytes or something.  If you have a graph of that you can see events
like shards moving from one node to another (write spike near the
configured maximum throttle - 20 or 40 MB/sec with a network spike followed
by a read spike) compared to regular operation (steady state reading)
compared to a big merge (looks like shard moving without the network spike
but with a user cpu spike).

The idea is that you can see if any of these events occurred right before
your problem.


On Wed, Jul 23, 2014 at 9:38 AM,  wrote:

> One additional piece of information -- the .yml conf file we use:
> https://gist.github.com/schonfeld/ef8f012eb0775be202ce
>
> On Wednesday, July 23, 2014 9:31:45 AM UTC-4, mic...@modernmast.com wrote:
>>
>> Hey all!
>>
>> I'm having a serious problem with my ES cluster. Every now and then,
>> when writing to the cluster, a machine (or two) will suddenly spike up on
>> OS Load, writing will come to a screeching halt (>5s for 1k docs, as
>> opposed to ~100ms normally), and then shortly after, the VM that was
>> spiking will simply disappear from the cluster.
>>
>>
>> 
>> Another thing I've noticed, is that the "Document Count" dips to 0 when
>> this happens. FWIW - the VMs are 15gb mem / 4 cores, each, running docker.
>>
>>
>> 
>>
>> *Here are some logs from around the time this happened:*
>>
>> *Hot threads*: https://gist.github.com/schonfeld/370f9c32dbefce59e628
>> *Nodes stats*: https://gist.github.com/schonfeld/5f528949d3a6341417dc
>> Screenshots: https://www.dropbox.com/s/gthreucugoz0opm/Screenshot%
>> 202014-07-23%2009.28.30.png && https://www.dropbox.com/s/
>> ae1vijcdgv5sv7u/Screenshot%202014-07-23%2009.28.10.png
>>
>>
>> Any clues, insights, and thoughts will be greatly appreciated!
>>
>> Thanks,
>> - Michael
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/a7a93d4e-42e3-4b2d-bae0-2c4ca50a106d%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



Re: Is there a better way to achieve my goal than having multiple completion suggesters on a single index?

2014-07-23 Thread Gordon Rankin
Thanks Adrien...

Thanks for your speedy response.  I'm very new to Elasticsearch so it's 
good to know I am doing the right thing.

I guess i'll continue as I am unless anyone else can think of any reason 
not to.

Cheers!
 

On Wednesday, July 23, 2014 2:34:08 PM UTC+1, Adrien Grand wrote:
>
> Hi Gordon,
>
> Given your requirements, I think you are doing the right thing. There is 
> no particular concern wrt querying multiple suggesters at the same time.
>
>
> On Wed, Jul 23, 2014 at 3:20 PM, Gordon Rankin  > wrote:
>
>> I have an index of photos and need to return completion suggestions based 
>> on several of the fields:
>>
>>
>>- Tags
>>- Place 
>>- Country
>>- Date
>>
>> The simplest way to do this of course would be to create one completion 
>> suggester and simply feed the various inputs into it when indexing. 
>>
>> However, I need to receive up to 5 suggestions per field and I need to 
>> return various different outputs depending on the input (they cannot simply 
>> have a unified output)
>>
>> For example:
>>
>> When the user types "T" the suggestions should be something like the 
>> following :
>>
>> Tags : [Tree, Tiger, Toner]
>> Place : [Tenerife, The London Eye, Torquay]
>> Country : [Taiwan, Tanzania]
>> Date : []
>>
>> The date field simply stores tags for the month and year [January, 2014] 
>> enabling suggestions to come back as January when a user types "jan" and 
>> gives year suggestions when the user type "20" etc...
>>
>> In order to achieve this I have set up a different completion suggester 
>> with varying analyzers for each of the above fields, I then query all four 
>> suggesters at once in a single request.  Everything works perfectly.
>>
>> However I am left wondering if there is a better way to achieve this 
>> functionality.
>>
>>
>>- Is there any way to achieve the above with a single completion 
>>suggester 
>>- Are there any concerns/watch outs when querying multiple suggesters 
>>in this manner? Performance or otherwise.
>>
>> Thanks in advance for any advice or suggestions.
>>  
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/bd670fb7-303f-4719-821c-82b65fec86e9%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> -- 
> Adrien Grand
>  



Re: Sudden high "OS Load", then ES VM disappears

2014-07-23 Thread michael
One additional piece of information -- the .yml conf file we 
use: https://gist.github.com/schonfeld/ef8f012eb0775be202ce

On Wednesday, July 23, 2014 9:31:45 AM UTC-4, mic...@modernmast.com wrote:
>
> Hey all!
>
> I'm having a serious problem with my ES cluster. Every now and then, 
> when writing to the cluster, a machine (or two) will suddenly spike up on 
> OS Load, writing will come to a screeching halt (>5s for 1k docs, as 
> opposed to ~100ms normally), and then shortly after, the VM that was 
> spiking will simply disappear from the cluster.
>
>
> 
> Another thing I've noticed, is that the "Document Count" dips to 0 when 
> this happens. FWIW - the VMs are 15gb mem / 4 cores, each, running docker.
>
>
> 
>
> *Here are some logs from around the time this happened:*
>
> *Hot threads*: https://gist.github.com/schonfeld/370f9c32dbefce59e628
> *Nodes stats*: https://gist.github.com/schonfeld/5f528949d3a6341417dc
> Screenshots: 
> https://www.dropbox.com/s/gthreucugoz0opm/Screenshot%202014-07-23%2009.28.30.png
>  
> && 
> https://www.dropbox.com/s/ae1vijcdgv5sv7u/Screenshot%202014-07-23%2009.28.10.png
>
>
> Any clues, insights, and thoughts will be greatly appreciated!
>
> Thanks,
> - Michael
>



Re: Unstable cluster - "suspect illegal state: trying to move shard from primary mode to replica mode"

2014-07-23 Thread Mohamed Lrhazi
I think I am running into this same issue, even after upgrading to 1.2.2.

Did you stabilize your cluster?

Thanks,
Mohamed.

On Saturday, May 24, 2014 5:05:55 AM UTC-4, Robin Clarke wrote:
>
> And found this error too in one of the nodes which left the cluster:
>
> java.lang.NullPointerException
> at 
> org.elasticsearch.gateway.local.state.meta.LocalGatewayMetaState.clusterChanged(LocalGatewayMetaState.java:185)
> at 
> org.elasticsearch.gateway.local.LocalGateway.clusterChanged(LocalGateway.java:207)
> at 
> org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:431)
> at 
> org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:134)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
> -Robin-
>



Re: Is ES capable of doing pagination?

2014-07-23 Thread Adrien Grand
Hi Prasath,

Scan and scroll can only move forward. If you want to have previous/next
buttons, you would typically run the query once again with different values
of `from` and `size`.


On Tue, Jul 22, 2014 at 3:16 PM, PrasathRajan 
wrote:

> can somebody suggest..
>
> Is it possible to jump over Page Nos [forward/backward]while using
> Scan/Scroll for Pagination?.. Or any other effective way available for
> pagination?.
>
>
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/Is-ES-capable-of-doing-pagination-tp3399087p4060343.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1406034965952-4060343.post%40n3.nabble.com
> .
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adrien Grand



Re: Is there a better way to achieve my goal than having multiple completion suggesters on a single index?

2014-07-23 Thread Adrien Grand
Hi Gordon,

Given your requirements, I think you are doing the right thing. There is no
particular concern wrt querying multiple suggesters at the same time.


On Wed, Jul 23, 2014 at 3:20 PM, Gordon Rankin 
wrote:

> I have an index of photos and need to return completion suggestions based
> on several of the fields:
>
>
>- Tags
>- Place
>- Country
>- Date
>
> The simplest way to do this of course would be to create one completion
> suggester and simply feed the various inputs into it when indexing.
>
> However, I need to receive up to 5 suggestions per field and I need to
> return various different outputs depending on the input (they cannot simply
> have a unified output)
>
> For example:
>
> When the user types "T" the suggestions should be something like the
> following :
>
> Tags : [Tree, Tiger, Toner]
> Place : [Tenerife, The London Eye, Torquay]
> Country : [Taiwan, Tanzania]
> Date : []
>
> The date field simply stores tags for the month and year [January, 2014]
> enabling suggestions to come back as January when a user types "jan" and
> gives year suggestions when the user type "20" etc...
>
> In order to achieve this I have set up a different completion suggester
> with varying analyzers for each of the above fields, I then query all four
> suggesters at once in a single request.  Everything works perfectly.
>
> However I am left wondering if there is a better way to achieve this
> functionality.
>
>
>- Is there any way to achieve the above with a single completion
>suggester
>- Are there any concerns/watch outs when querying multiple suggesters
>in this manner? Performance or otherwise.
>
> Thanks in advance for any advice or suggestions.
>
>



-- 
Adrien Grand



Re: Even Shard Distribution?

2014-07-23 Thread michael
Got it. To be honest, I was pretty sure of that, up until this AM, when 
that same OS Load spike happened again. But this time, the shards were 
allocated more evenly. So I'm not sure that's even the problem any more. I 
just posted a new post with more information about the load spike issue. 
Would you mind taking a look?

Thanks for all your help, Nik.

On Wednesday, July 23, 2014 9:30:00 AM UTC-4, Nikolas Everett wrote:
>
>
>
>
> On Wed, Jul 23, 2014 at 9:21 AM, > 
> wrote:
>
>> Thanks for that, Nik. I'm okay with evenly spreading all the indices, 
>> rather than just the one I'm having issues with. I'll give your config a 
>> try!
>>
>> Def no special configurations on that one. We didn't even realize there 
>> was such a thing as allocation configuration up until yesterday (after the 
>> 0/6 allocation happened). The 0/6 node did, however, contain 4/6 
>> replicas... If that makes a difference?
>>
>>
> It certainly does.  The way the allocation settings are configured now the 
> cluster thinks of replicas as just about the same as masters.  For the most 
> part they are, too.  For us they certainly are.
>
> If you are sure that being a shard master is significantly more work than 
> being a shard replica then raise the "primary" setting to something like .5 
> and lower everything else to total 1.
>
> That whole totally 1 thing isn't required but it makes the numbers easier 
> to think about.
>  



Sudden high "OS Load", then ES VM disappears

2014-07-23 Thread michael
Hey all!

I'm having a serious problem with my ES cluster. Every now and then, 
when writing to the cluster, a machine (or two) will suddenly spike up on 
OS Load, writing will come to a screeching halt (>5s for 1k docs, as 
opposed to ~100ms normally), and then shortly after, the VM that was 
spiking will simply disappear from the cluster.


Another thing I've noticed, is that the "Document Count" dips to 0 when 
this happens. FWIW - the VMs are 15gb mem / 4 cores, each, running docker.



*Here are some logs from around the time this happened:*

*Hot threads*: https://gist.github.com/schonfeld/370f9c32dbefce59e628
*Nodes stats*: https://gist.github.com/schonfeld/5f528949d3a6341417dc
Screenshots: 
https://www.dropbox.com/s/gthreucugoz0opm/Screenshot%202014-07-23%2009.28.30.png
 
&& 
https://www.dropbox.com/s/ae1vijcdgv5sv7u/Screenshot%202014-07-23%2009.28.10.png


Any clues, insights, and thoughts will be greatly appreciated!

Thanks,
- Michael



Re: Even Shard Distribution?

2014-07-23 Thread Nikolas Everett
On Wed, Jul 23, 2014 at 9:21 AM,  wrote:

> Thanks for that, Nik. I'm okay with evenly spreading all the indices,
> rather than just the one I'm having issues with. I'll give your config a
> try!
>
> Def no special configurations on that one. We didn't even realize there
> was such a thing as allocation configuration up until yesterday (after the
> 0/6 allocation happened). The 0/6 node did, however, contain 4/6
> replicas... If that makes a difference?
>
>
It certainly does.  The way the allocation settings are configured now the
cluster thinks of replicas as just about the same as masters.  For the most
part they are, too.  For us they certainly are.

If you are sure that being a shard master is significantly more work than
being a shard replica then raise the "primary" setting to something like .5
and lower everything else to total 1.

That whole totally 1 thing isn't required but it makes the numbers easier
to think about.
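[Editor's note] Concretely, the ".5 primary, total 1" suggestion above could look like the following elasticsearch.yml fragment. The weights are illustrative, not from the thread; the ES 1.x defaults are roughly shard 0.45, index 0.5, primary 0.05, which also total 1.0:

```yaml
# Illustrative weights following the suggestion above: "primary" raised to
# 0.5 and the other two lowered so all three still total 1.0.
cluster.routing.allocation.balance.primary: 0.5
cluster.routing.allocation.balance.shard: 0.25
cluster.routing.allocation.balance.index: 0.25
```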



Re: Even Shard Distribution?

2014-07-23 Thread michael
Thanks for that, Nik. I'm okay with evenly spreading all the indices, 
rather than just the one I'm having issues with. I'll give your config a 
try!

Def no special configurations on that one. We didn't even realize there was 
such a thing as allocation configuration up until yesterday (after the 0/6 
allocation happened). The 0/6 node did, however, contain 4/6 replicas... If 
that makes a difference?


On Wednesday, July 23, 2014 9:03:46 AM UTC-4, Nikolas Everett wrote:
>
> For the 0/6 node are you sure you don't have some configuration preventing 
> shards from allocating there?
>
> We use this:
>
> http://git.wikimedia.org/blob/operations%2Fpuppet.git/d2e2989bbafc7f7f730efacaa652a05bec3ef541/modules%2Felasticsearch%2Ftemplates%2Felasticsearch.yml.erb#L420
> but it is designed to more evenly spread shards from every index across 
> all of our nodes rather than more evenly spread total shards across the 
> nodes.
>
> When we update these dynamically using the API we found you sometimes have 
> to jiggle the shard allocator to force it to start thinking again.  So use 
> the manual allocation api to move a small shard from one node to another or 
> something.
>
> Nik
>
>
> On Wed, Jul 23, 2014 at 8:48 AM, > 
> wrote:
>
>> Hey guys,
>>
>> We've recently set up a 5 node ES cluster, serving our 6-shards / 
>> 1-replica index (we chose 6 back when we only had 3 nodes). We sometimes 
>> find a highly uneven distribution of shards across the nodes. For example, 
>> when we had 3 nodes, 4/6 of the index lived on 1 node, 2/6 lived on 
>> another, and 0/6 lived on the last node. While we do have distributed 
>> replicas, we've noticed a very very high "OS load" on the machine that 
>> served 4/6 of the index. In fact, the OS load was so high, that it brought 
>> the VM to a screeching halt.
>>
>> My question is: how can we force a more even distribution of the shards? 
>> I've played around with "cluster.routing.allocation.balance.*", but those 
>> had little to no effect. And, from my understanding of the docs, 
>> "cluster.routing.allocation.awareness" is aimed for a more even zone 
>> distribution, rather than shards in the same zone.
>>
>> Thanks in advance for all your help! Much appreciated!
>>
>>
>
>



Is there a better way to achieve my goal than having multiple completion suggesters on a single index?

2014-07-23 Thread Gordon Rankin
I have an index of photos and need to return completion suggestions based 
on several of the fields:


   - Tags
   - Place
   - Country
   - Date

The simplest way to do this of course would be to create one completion 
suggester and simply feed the various inputs into it when indexing. 

However, I need to receive up to 5 suggestions per field and I need to 
return various different outputs depending on the input (they cannot simply 
have a unified output)

For example:

When the user types "T" the suggestions should be something like the 
following :

Tags : [Tree, Tiger, Toner]
Place : [Tenerife, The London Eye, Torquay]
Country : [Taiwan, Tanzania]
Date : []

The date field simply stores tags for the month and year [January, 2014] 
enabling suggestions to come back as January when a user types "jan" and 
gives year suggestions when the user types "20" etc...

In order to achieve this I have set up a different completion suggester, 
with varying analyzers, for each of the above fields; I then query all four 
suggesters at once in a single request.  Everything works perfectly.

However I am left wondering if there is a better way to achieve this 
functionality.


   - Is there any way to achieve the above with a single completion 
   suggester
   - Are there any concerns/watch outs when querying multiple suggesters in 
   this manner? Performance or otherwise.

Thanks in advance for any advice or suggestions.
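[Editor's note] For reference, a single request can carry several named completion suggesters, one per field, which is what the setup described above amounts to. A minimal sketch of building such a request body in Python — the suggester names and `*_suggest` field names are made up for illustration:

```python
import json

def build_suggest_body(prefix, size=5):
    """Build one suggest request body that queries four completion
    suggesters at once.  Field names like "tags_suggest" are illustrative."""
    fields = {
        "tags": "tags_suggest",
        "place": "place_suggest",
        "country": "country_suggest",
        "date": "date_suggest",
    }
    return {
        name: {"text": prefix, "completion": {"field": field, "size": size}}
        for name, field in fields.items()
    }

print(json.dumps(build_suggest_body("T"), indent=2))
```

The body would then be POSTed to the `_suggest` endpoint in one round trip, rather than one request per field.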



Re: Even Shard Distribution?

2014-07-23 Thread Nikolas Everett
For the 0/6 node are you sure you don't have some configuration preventing
shards from allocating there?

We use this:
http://git.wikimedia.org/blob/operations%2Fpuppet.git/d2e2989bbafc7f7f730efacaa652a05bec3ef541/modules%2Felasticsearch%2Ftemplates%2Felasticsearch.yml.erb#L420
but it is designed to more evenly spread shards from every index across
all of our nodes rather than more evenly spread total shards across the
nodes.

When we update these dynamically using the API we found you sometimes have
to jiggle the shard allocator to force it to start thinking again.  So use
the manual allocation api to move a small shard from one node to another or
something.

Nik


On Wed, Jul 23, 2014 at 8:48 AM,  wrote:

> Hey guys,
>
> We've recently set up a 5 node ES cluster, serving our 6-shards /
> 1-replica index (we chose 6 back when we only had 3 nodes). We sometimes
> find a highly uneven distribution of shards across the nodes. For example,
> when we had 3 nodes, 4/6 of the index lived on 1 node, 2/6 lived on
> another, and 0/6 lived on the last node. While we do have distributed
> replicas, we've noticed a very very high "OS load" on the machine that
> served 4/6 of the index. In fact, the OS load was so high, that it brought
> the VM to a screeching halt.
>
> My question is: how can we force a more even distribution of the shards?
> I've played around with "cluster.routing.allocation.balance.*", but those
> had little to no effect. And, from my understanding of the docs,
> "cluster.routing.allocation.awareness" is aimed for a more even zone
> distribution, rather than shards in the same zone.
>
> Thanks in advance for all your help! Much appreciated!
>
>



Re: how would you design the store model in Elasticsearch for user behavior data

2014-07-23 Thread panfei
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "regexp": {
          "who": "[0-9]+"
        }
      }
    }
  },
  "aggs": {
    "max_who": {
      "max": {
        "script": "Double.parseDouble(_source.who)"
      }
    }
  }
}

This is a very slow query ...
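[Editor's note] A common way to avoid a per-document script like the one above is to normalize the value at index time, so a native aggregation can run on a numeric field instead. A minimal sketch of that normalization step — the `val_str`/`val_num` field names are hypothetical, not from the thread:

```python
def normalize(doc):
    """Index-time normalization sketch: keep the raw value as a string and,
    when it parses as a number, also store it as "val_num" so that a native
    {"max": {"field": "context.val_num"}} aggregation can replace the
    Double.parseDouble(...) script.  Field names are illustrative."""
    val = doc["context"].get("val")
    doc["context"]["val_str"] = str(val)
    try:
        doc["context"]["val_num"] = float(val)
    except (TypeError, ValueError):
        pass  # non-numeric values simply get no val_num
    return doc

print(normalize({"uid": "user002", "action": "buy",
                 "context": {"level": 28, "val": "abc"}}))
```

Documents with a non-numeric val simply lack `val_num` and drop out of the aggregation, which also replaces the `"[0-9]+"` regexp filter.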


2014-07-23 17:22 GMT+08:00 panfei :

> user behavior data like this(transformed to JSON):
>
> {"uid":"user001", "action":"click", "context":
> {"level":21,"ip":"222.222.222.222", "val":87}}
> {"uid":"user002", "action":"click", "context":
> {"level":28,"ip":"222.222.222.221","val":96}} #1
> {"uid":"user002", "action":"buy", "context":
> {"level":28,"ip":"222.222.222.221","val":"abc"}} #2
> ...
>
> 1. here val is in a numeric format
> 2. here val is provided as a string
>
>
> as for dynamic mapping, the "val" field will be mapped to the long type,
> and then another user action that also carries a val in its context, but
> not in numeric format, will cause an exception:
>
> error" : "RemoteTransportException[[Spike][inet[/192.168.2.246:9300]][index]];
> nested: MapperParsingException[failed to parse [context.val]]; nested:
> NumberFormatException[For input string: \"abc\"]; ",
>
> we can avoid this by setting "index" to "not_analyzed", then everything
> will be treated as string and we can put anything to this field.
>
> but, in this case, when we want to do some analytics on the "click"
> action, for example to calculate the average val over all users, it needs
> to convert every "val" from string to long. It's really a slow process to
> do this type conversion with a script on millions of documents. In our
> case, it takes about 90s to get the result on about 100 million records.
>
> so , is there a better way to optimize this ? thanks very much in advance!
>
> PS:
> Elasticsearch 1.2.1
> 9 nodes, each with 8 CPU cores and 48GB RAM (ES_HEAP_SIZE=16GB)
> 10 indexes, each with 5 shards and 1 replication
> total docs now: 130718318
> total docs in size: 65GB
>
>
>
> --
> 不学习,不知道
>



-- 
不学习,不知道



Re: how would you design the store model in Elasticsearch for user behavior data

2014-07-23 Thread panfei
Is there any way to convert the data type without using the script mechanism?


2014-07-23 17:22 GMT+08:00 panfei :

> user behavior data like this(transformed to JSON):
>
> {"uid":"user001", "action":"click", "context":
> {"level":21,"ip":"222.222.222.222", "val":87}}
> {"uid":"user002", "action":"click", "context":
> {"level":28,"ip":"222.222.222.221","val":96}} #1
> {"uid":"user002", "action":"buy", "context":
> {"level":28,"ip":"222.222.222.221","val":"abc"}} #2
> ...
>
> 1. here val is in a numeric format
> 2. here val is provided as a string
>
>
> as for dynamic mapping, the "val" field will be mapped to the long type,
> and then another user action that also carries a val in its context, but
> not in numeric format, will cause an exception:
>
> error" : "RemoteTransportException[[Spike][inet[/192.168.2.246:9300]][index]];
> nested: MapperParsingException[failed to parse [context.val]]; nested:
> NumberFormatException[For input string: \"abc\"]; ",
>
> we can avoid this by setting "index" to "not_analyzed", then everything
> will be treated as string and we can put anything to this field.
>
> but, in this case, when we want to do some analytics on the "click"
> action, for example to calculate the average val over all users, it needs
> to convert every "val" from string to long. It's really a slow process to
> do this type conversion with a script on millions of documents. In our
> case, it takes about 90s to get the result on about 100 million records.
>
> so , is there a better way to optimize this ? thanks very much in advance!
>
> PS:
> Elasticsearch 1.2.1
> 9 nodes, each with 8 CPU cores and 48GB RAM (ES_HEAP_SIZE=16GB)
> 10 indexes, each with 5 shards and 1 replication
> total docs now: 130718318
> total docs in size: 65GB
>
>
>
> --
> 不学习,不知道
>



-- 
不学习,不知道



Even Shard Distribution?

2014-07-23 Thread michael
Hey guys,

We've recently set up a 5 node ES cluster, serving our 6-shards / 1-replica 
index (we chose 6 back when we only had 3 nodes). We sometimes find a 
highly uneven distribution of shards across the nodes. For example, when we 
had 3 nodes, 4/6 of the index lived on 1 node, 2/6 lived on another, and 
0/6 lived on the last node. While we do have distributed replicas, we've 
noticed a very very high "OS load" on the machine that served 4/6 of the 
index. In fact, the OS load was so high, that it brought the VM to 
a screeching halt.

My question is: how can we force a more even distribution of the shards? 
I've played around with "cluster.routing.allocation.balance.*", but those 
had little to no effect. And, from my understanding of the docs, 
"cluster.routing.allocation.awareness" is aimed for a more even zone 
distribution, rather than shards in the same zone.

Thanks in advance for all your help! Much appreciated!



Re: [ANN] Elasticsearch Twitter River plugin 2.2.0 released

2014-07-23 Thread David Pilato
Rivers will be deprecated in favor of logstash inputs.
Deprecated does not mean removed yet.
So in the meantime we still try to keep the official plugins up to date.

But yes, you should prefer using the logstash twitter input if possible 
(http://logstash.net/docs/1.4.2/inputs/twitter)


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 23 juillet 2014 à 14:08:49, James Green (james.mk.gr...@gmail.com) a écrit:

I was told a week or two ago on IRC that rivers were deprecated in favour of 
external data sources like Logstash.

Is this not correct?


On 23 July 2014 13:00, Elasticsearch Team  wrote:
Heya,

We are pleased to announce the release of the Elasticsearch Twitter River 
plugin, version 2.2.0

The Twitter River plugin allows indexing the Twitter stream using the 
elasticsearch rivers feature.
Release Notes - Version 2.2.0

Fix

[62] - Generate default mapping even if index already exists
[60] - ignore_retweets does not ignore RT
Update

[61] - Update to Twitter4J 4.0.2
[58] - Remove old deprecated user and password properties
[57] - Deprecate usage of camelCase settings
[54] - Added not_analyzed to retweet user_screen_name
[52] - Update to elasticsearch 1.2.0
New

[56] - Add oauth credentials in `elasticsearch.yml` file
[51] - Support geo location array format
[50] - Add user stream support
[37] - Move tests to elasticsearch test framework
Doc

[49] - [DOC] Link to rivers documentation is incorrect
Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-river-twitter project repository!

For questions or comments around this plugin, feel free to use elasticsearch 
mailing list!

Enjoy,

- The Elasticsearch team



Re: [ANN] Elasticsearch Twitter River plugin 2.2.0 released

2014-07-23 Thread James Green
I was told a week or two ago on IRC that rivers were deprecated in favour
of external data sources like Logstash.

Is this not correct?


On 23 July 2014 13:00, Elasticsearch Team  wrote:

>  Heya,
>
> We are pleased to announce the release of the *Elasticsearch Twitter
> River plugin*, *version 2.2.0*
>
> The Twitter River plugin allows indexing the Twitter stream using the
> elasticsearch rivers feature.
>
> Release Notes - Version 2.2.0
>
> Fix
>
>- [62] - Generate default mapping even if index already exists
>- [60] - ignore_retweets does not ignore RT
>
> Update
>
>- [61] - Update to Twitter4J 4.0.2
>- [58] - Remove old deprecated user and password properties
>- [57] - Deprecate usage of camelCase settings
>- [54] - Added not_analyzed to retweet user_screen_name
>- [52] - Update to elasticsearch 1.2.0
>
> New
>
>- [56] - Add oauth credentials in `elasticsearch.yml` file
>- [51] - Support geo location array format
>- [50] - Add user stream support
>- [37] - Move tests to elasticsearch test framework
>
> Doc
>
>- [49] - [DOC] Link to rivers documentation is incorrect
>
> Issues, Pull requests, Feature requests are warmly welcome on
> elasticsearch-river-twitter project repository!
>
> For questions or comments around this plugin, feel free to use
> elasticsearch mailing list!
>
> Enjoy,
>
> - The Elasticsearch team 
>



Re: [ERROR][bootstrap] {1.2.2}: Initialization Failed ... - NullPointerException[null]

2014-07-23 Thread vjbangis
I tried EC2 discovery with the following settings; each node still elects 
itself master ("elected_as_master") instead of joining the other node. 

discovery.type: ec2
discovery.ec2.groups: "security groups"
discovery.ec2.host_type: private_ip
cloud.aws.region: "ap-southeast-1"
cloud.aws.access_key: [access key]
cloud.aws.secret_key: [security key]
cloud.node.auto_attributes: true
#discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false

Security groups:
1. Allow inbound Source IP "any" to port 9200.
2. Allow inbound Source IP "any" to port 9300.
3. Allow inbound All Traffic within subnet.

it says from this link: "As my 
firewall was blocking the connections from the private IPs, the ec2 
discovery was not working."
So I added "Allow inbound All Traffic within subnet" to the security groups.


NODE1
[2014-07-23 11:33:34,862][DEBUG][discovery.ec2] [Tobirama] 
using dynamic discovery nodes []
[2014-07-23 11:33:34,863][DEBUG][discovery.ec2] [Tobirama] 
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-07-23 11:33:34,869][DEBUG][cluster.service  ] [Tobirama] 
processing [zen-disco-join (elected_as_master)]: execute
[2014-07-23 11:33:34,871][DEBUG][cluster.service  ] [Tobirama] 
cluster state updated, version [1], source [zen-disco-join 
(elected_as_master)]
[2014-07-23 11:33:34,872][INFO ][cluster.service  ] [Tobirama] 
new_master 
[Tobirama][cl1_rxmUTAmNtc1RfS2D4A][DV02][inet[/192.168.11.190:9300]], 
reason: zen-disco-join (elected_as_master)
[2014-07-23 11:33:34,917][DEBUG][transport.netty  ] [Tobirama] 
connected to node 
[[Tobirama][cl1_rxmUTAmNtc1RfS2D4A][DV02][inet[/192.168.11.190:9300]]]

NODE2
[2014-07-23 11:42:16,132][DEBUG][discovery.ec2] [Hiruzen] using 
dynamic discovery nodes []
[2014-07-23 11:42:16,133][DEBUG][discovery.ec2] [Hiruzen] 
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-07-23 11:42:16,248][DEBUG][cluster.service  ] [Hiruzen] 
processing [zen-disco-join (elected_as_master)]: execute
[2014-07-23 11:42:16,250][DEBUG][cluster.service  ] [Hiruzen] 
cluster state updated, version [1], source [zen-disco-join (
elected_as_master)]
[2014-07-23 11:42:16,251][INFO ][cluster.service  ] [Hiruzen] 
new_master 
[Hiruzen][_A3cxw7mRyO-kdUbsue2lg][DV03][inet[/192.168.11.69:9300]], reason: 
zen-disco-join (elected_as_master)
[2014-07-23 11:42:16,966][DEBUG][transport.netty  ] [Hiruzen] 
connected to node 
[[Hiruzen][_A3cxw7mRyO-kdUbsue2lg][DV03][inet[/192.168.11.69:9300]]]
[2014-07-23 11:42:16,967][DEBUG][cluster.service  ] [Hiruzen] 
publishing cluster state version 1
[2014-07-23 11:42:16,968][DEBUG][cluster.service  ] [Hiruzen] set 
local cluster state to version 1
[2014-07-23 11:42:16,971][DEBUG][river.cluster] [Hiruzen] 
processing [reroute_rivers_node_changed]: execute
[2014-07-23 11:42:16,971][DEBUG][river.cluster] [Hiruzen] 
processing [reroute_rivers_node_changed]: no change in cluster_state
[2014-07-23 11:42:16,972][DEBUG][cluster.service  ] [Hiruzen] 
processing [zen-disco-join (elected_as_master)]: done applying updated 
cluster_state (version: 1)
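[Editor's note] Side note on the commented-out line in the config above: once EC2 discovery actually returns peers (security groups and region permitting), a two-node cluster would normally also set minimum_master_nodes so the nodes cannot each elect themselves master. A sketch, with the value illustrative for two master-eligible nodes:

```yaml
# For two master-eligible nodes, floor(2 / 2) + 1 = 2.  This only helps once
# discovery is returning peers; with "using dynamic discovery nodes []" as in
# the logs above, it would instead block master election entirely.
discovery.zen.minimum_master_nodes: 2
```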



[ANN] Elasticsearch Twitter River plugin 2.2.0 released

2014-07-23 Thread Elasticsearch Team

Heya,


We are pleased to announce the release of the Elasticsearch Twitter River 
plugin, version 2.2.0.

The Twitter River plugin allows indexing the Twitter stream using the 
elasticsearch rivers feature.

https://github.com/elasticsearch/elasticsearch-river-twitter/

Release Notes - elasticsearch-river-twitter - Version 2.2.0


Fix:
 * [62] - Generate default mapping even if index already exists 
(https://github.com/elasticsearch/elasticsearch-river-twitter/issues/62)
 * [60] - ignore_retweets does not ignore RT 
(https://github.com/elasticsearch/elasticsearch-river-twitter/issues/60)

Update:
 * [61] - Update to Twitter4J 4.0.2 
(https://github.com/elasticsearch/elasticsearch-river-twitter/issues/61)
 * [58] - Remove old deprecated user and password properties 
(https://github.com/elasticsearch/elasticsearch-river-twitter/issues/58)
 * [57] - Deprecate usage of camelCase settings 
(https://github.com/elasticsearch/elasticsearch-river-twitter/issues/57)
 * [54] - Added not_analyzed to retweet user_screen_name 
(https://github.com/elasticsearch/elasticsearch-river-twitter/pull/54)
 * [52] - Update to elasticsearch 1.2.0 
(https://github.com/elasticsearch/elasticsearch-river-twitter/issues/52)

New:
 * [56] - Add oauth credentials in `elasticsearch.yml` file 
(https://github.com/elasticsearch/elasticsearch-river-twitter/issues/56)
 * [51] - Support geo location array format 
(https://github.com/elasticsearch/elasticsearch-river-twitter/issues/51)
 * [50] - Add user stream support 
(https://github.com/elasticsearch/elasticsearch-river-twitter/pull/50)
 * [37] - Move tests to elasticsearch test framework 
(https://github.com/elasticsearch/elasticsearch-river-twitter/issues/37)

Doc:
 * [49] - [DOC] Link to rivers documentation is incorrect 
(https://github.com/elasticsearch/elasticsearch-river-twitter/pull/49)


Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-river-twitter project repository: 
https://github.com/elasticsearch/elasticsearch-river-twitter/
For questions or comments around this plugin, feel free to use elasticsearch 
mailing list: https://groups.google.com/forum/#!forum/elasticsearch

Enjoy,

-The Elasticsearch team



Sort search results by Bayesian Average

2014-07-23 Thread Nick T
Hi,

I have an interesting use case which I posted up on Stack Overflow. If 
anyone has the time to take a look and share their thoughts that would be 
great: 
http://stackoverflow.com/questions/24885143/elasticsearch-bayesian-average.

Cheers
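One way to sort by a Bayesian average in ES 1.x is a `function_score` query with `script_score`. This is only a sketch: the field names (`rating_sum`, `rating_count`) and the prior constants `C` (prior weight) and `m` (prior mean) are assumptions, not taken from the Stack Overflow post:

```json
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "script_score": {
        "params": { "C": 10, "m": 3.5 },
        "script": "(C * m + doc['rating_sum'].value) / (C + doc['rating_count'].value)"
      }
    }
  }
}
```

With this formula a document with few ratings scores close to the prior mean m, and its score approaches its own average rating as rating_count grows.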



how would you design the store model in Elasticsearch for user behavior data

2014-07-23 Thread panfei
user behavior data like this(transformed to JSON):

{"uid":"user001", "action":"click", "context":
{"level":21,"ip":"222.222.222.222", "val":87}}
{"uid":"user002", "action":"click", "context":
{"level":28,"ip":"222.222.222.221","val":96}} #1
{"uid":"user002", "action":"buy", "context":
{"level":28,"ip":"222.222.222.221","val":"abc"}} #2
...

1. here val is in a numeric format
2. here val is provided as a string


With dynamic mapping, the "val" field gets mapped to the long type based on
the first document. If another user action then arrives with a "val" that is
not numeric, indexing it causes an exception:

error" : "RemoteTransportException[[Spike][inet[/192.168.2.246:9300]][index]];
nested: MapperParsingException[failed to parse [context.val]]; nested:
NumberFormatException[For input string: \"abc\"]; ",

We can avoid this by explicitly mapping "val" as a string (with "index" set
to "not_analyzed"); then every value is treated as a string and anything can
be stored in the field.

But in that case, when we want to run analytics on the "click" action, for
example to calculate the average "val" across all users, every "val" has to
be converted from string to long. Doing that type conversion with a script
over millions of documents is really slow: in our case it takes about 90 s
to get the result over roughly 100 million records.

So, is there a better way to optimize this? Thanks very much in advance!
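One workaround is to stop relying on dynamic mapping for this field and map it explicitly, keeping the raw string alongside a numeric copy that aggregations can use directly. This is a sketch only; the index/type names, the val_num field name, and the use of ignore_malformed are assumptions:

```json
{
  "mappings": {
    "event": {
      "properties": {
        "context": {
          "properties": {
            "val":     { "type": "string", "index": "not_analyzed" },
            "val_num": { "type": "long", "ignore_malformed": true }
          }
        }
      }
    }
  }
}
```

If the indexer writes each value into both fields, numeric actions like "click" can be averaged natively on context.val_num (ignore_malformed silently skips non-numeric values instead of throwing NumberFormatException), while context.val keeps the raw string, avoiding the per-query script conversion.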

PS:
Elasticsearch 1.2.1
9 nodes, each with 8 CPU cores and 48GB RAM (ES_HEAP_SIZE=16GB)
10 indices, each with 5 shards and 1 replica
total docs now: 130718318
total docs in size: 65GB



-- 
不学习，不知道 (If you don't learn, you don't know)



Re: Questions RE: ES Script Written in JavaScript

2014-07-23 Thread Roland Dunn
OK, answering a couple of my own questions:

1. Return value: _score e.g.

_score = Math.sqrt(retscore);

2. Accessing nested: easy enough: 
_source.things[i]

3. Debugging. Dunno.
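Pulling those answers together, the scoring logic can be sketched as a plain JavaScript function so it can be run and tested outside Elasticsearch. The things/thing_id/count shape follows the mapping quoted below, but the per-topic weighting itself is a hypothetical example, not the actual script:

```javascript
// Sketch of the script_score body. Inside Elasticsearch the document
// would come from _source.things and the result would be assigned to
// _score; here it is a plain function so it can be exercised directly.
function scoreDoc(things, topicsDictParam) {
  var retscore = 0;
  for (var i = 0; i < things.length; i++) {
    var weight = topicsDictParam[String(things[i].thing_id)];
    if (weight !== undefined) {
      // Hypothetical weighting: per-topic param weight scaled by count.
      retscore += weight * things[i].count;
    }
  }
  // Equivalent of `_score = Math.sqrt(retscore);` in the ES script.
  return Math.sqrt(retscore);
}
```

For example, scoreDoc([{ thing_id: 4, count: 2 }], { "4": 102 }) returns Math.sqrt(204), and things whose thing_id is not in the params dictionary contribute nothing.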





On Tuesday, 22 July 2014 17:17:40 UTC+1, Roland Dunn wrote:
>
> Hi,
> Wonder if anyone could help. I've managed to get a query script up and 
> running, using JavaScript. The script resides in 
> /etc/elasticsearch/scripts. 
>
> I'm using a function_score query as follows:
>
> "function_score": {
>   "query": {
> "match_all": {}
>   },
>   "script_score": {
> "script": "poot",
> "params": {
>   "topics_dict_param": {
> "4": 102
>   }
> }
>   }
> }
>
> I'm passing into the JS script a dictionary of values.
>
> Questions:
>
> 1. What determines the "return value" of the JS script? At the moment I 
> just have "retscore;" as a standalone line in the script; this seems to 
> work, but feels a bit random. I would like something more explicit.
>
> 2. How can I debug the JS script? Can I output some values somewhere? E.g. 
> print, or log? If so, where does the output go?
>
> 3. The mapping of the ES index involves an array of nested objects e.g. 
>
> "things": {
> "type": "nested",
> "properties": {
> "thing_id": {
> "type": "long"
> },
> "count": {
> "type": "long"
> },
> "sentiment": {
> "type": "double"
> }
> }
> },
>
> Can I access this via _source.things? And if I can, can I do something 
> like _source.things[0].thing_id ?
>
> Any help v welcome.
>
> Thanks,
> R
>
>
>
>



Re: kibana dashboard save failed

2014-07-23 Thread Mark Walkom
Can your kibana install talk to elasticsearch OK? Are you seeing other data
via it?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 23 July 2014 19:52,  wrote:

>
> Hello.
> I just downloaded kibana-3.1.0 and opened the sample dashboard in kibana,
> then tried to save the dashboard, but got a save error.
> I attached the error screenshot.
> Is there any one who had the same problem?
>
> Thanks in advance.
>
>
>



Will two _types in one index that share a field name with different field types conflict?

2014-07-23 Thread xu piao
I have one index with different _types, e.g. http://localhost:9200/matrix/group 
and http://localhost:9200/matrix/user.

Both _types have a 'region' field, but with different field types.
group:

   "region": {
     "type": "string",
     "store": true,
     "analyzer": "ik"
   }

user:

   "region": {
     "type": "long"
   }
 
When the 'matrix' index only had the 'group' type, a 'query_string' search 
across multiple fields worked fine:

{
  "bool" : {
"must" : {
  "query_string" : {
"query" : "北京西城",
"fields" : [ "name^20", "tags^5", "intra^1", "region^2", 
"address^2" ]
  }
},
"minimum_should_match" : "1"
  }
}

But after I index documents of the 'user' type, the same query throws an 
exception:

org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to 
execute phase [query], all shards failed; shardFailures 
{[_vgvj_11QpKOuZNo90nD_A][matrix][0]: RemoteTransportException[[Blizzard 
II][inet[/10.0.8.235:19300]][search/phase/query]]; nested: 
QueryPhaseExecutionException[[matrix][0]: query[filtered((region:"北京 
西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to 
execute main query]]; nested: IllegalStateException[field "region" was 
indexed without position data; cannot run PhraseQuery (term=北京)]; 
}{[_vgvj_11QpKOuZNo90nD_A][matrix][4]: RemoteTransportException[[Blizzard 
II][inet[/10.0.8.235:19300]][search/phase/query]]; nested: 
QueryPhaseExecutionException[[matrix][4]: query[filtered((region:"北京 
西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to 
execute main query]]; nested: IllegalStateException[field "region" was 
indexed without position data; cannot run PhraseQuery (term=北京)]; 
}{[_vgvj_11QpKOuZNo90nD_A][matrix][3]: RemoteTransportException[[Blizzard 
II][inet[/10.0.8.235:19300]][search/phase/query]]; nested: 
QueryPhaseExecutionException[[matrix][3]: query[filtered((region:"北京 
西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to 
execute main query]]; nested: IllegalStateException[field "region" was 
indexed without position data; cannot run PhraseQuery (term=北京)]; 
}{[3XK6HZ_WSbG4E7Tot3mMMw][matrix][2]: 
QueryPhaseExecutionException[[matrix][2]: query[filtered((region:"北京 
西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to 
execute main query]]; nested: IllegalStateException[field "region" was 
indexed without position data; cannot run PhraseQuery (term=北京)]; 
}{[3XK6HZ_WSbG4E7Tot3mMMw][matrix][1]: 
QueryPhaseExecutionException[[matrix][1]: query[filtered((region:"北京 
西路"))->cache(_type:group)],from[0],size[20]: Query Failed [Failed to 
execute main query]]; nested: IllegalStateException[field "region" was 
indexed without position data; cannot run PhraseQuery (term=北京)]; }
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:276)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onFailure(TransportSearchTypeAction.java:224)
at 
org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:205)
at 
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:296)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

All my search requests explicitly set the type in Java, like:

SearchRequestBuilder searchRequest = 
esClient.prepareSearch("matrix").setTypes("group")

.setSearchType(SearchType.QUERY_THEN_FETCH).setFrom(form).setSize(count).setExplain(true);

Why does this happen?
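The usual explanation is that all _types in one index share a single Lucene field per field name, so 'region' cannot be an analyzed ik string in 'group' and a long in 'user' at the same time; setTypes() filters which documents match, but not how the underlying field was indexed. A sketch of one workaround is to give the conflicting field distinct names per _type (the name region_id here is an assumption):

```json
{
  "mappings": {
    "group": {
      "properties": {
        "region": { "type": "string", "store": true, "analyzer": "ik" }
      }
    },
    "user": {
      "properties": {
        "region_id": { "type": "long" }
      }
    }
  }
}
```

Alternatively, put 'group' and 'user' in separate indices so each can map 'region' independently.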


