cat APIs for Java?

2014-10-20 Thread truong ha
Do we have a cat API for Java? 
I am using Elasticsearch version 1.3.4 and have found that I can only use the 
HTTP cat API.
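
For reference, this is the kind of HTTP call I mean (a minimal sketch, assuming 
a node on localhost:9200):

curl 'localhost:9200/_cat/nodes?v'
curl 'localhost:9200/_cat/indices?v'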

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d7e6bc32-487c-419a-9b1c-29ca30b1bd5f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to find the thread pool size of an ElasticSearch cluster?

2014-10-20 Thread truong ha
50 is the maximum for one node, so is 200 the maximum for the entire 
cluster?

Does that mean that, to avoid rejections, I should send requests to the 
cluster gently, so as not to overwhelm the queue?
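
If it helps, here is a small sketch of how I plan to watch the queue while 
sending (these are standard _cat/thread_pool columns):

curl 'localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'

If bulk.rejected starts climbing, I am sending faster than the nodes can 
drain their queues.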

On Monday, 20 October 2014 22:27:26 UTC+7, Jörg Prante wrote:
>
> This is not the maximum number of requests you can send. It means "when 
> bulk indexing on a node gets too busy and must be queued, the maximum 
> number of actions that are allowed to queue up before a client is notified 
> of rejections is 50".
>
> Jörg
>
>
> On Mon, Oct 20, 2014 at 3:57 PM, truong ha  > wrote:
>
>> So in my case, what is the maximum number of requests I can send: *200 or 50*?
>>
>> On Monday, 20 October 2014 18:09:28 UTC+7, Jörg Prante wrote:
>>>
>>> bulk.queueSize is the maximum size before requests are rejected.
>>>
>>> Jörg
>>>
>>> On Mon, Oct 20, 2014 at 12:09 PM, truong ha  wrote:
>>>
 I'm writing concurrent code to send bulk index requests to Elasticsearch, 
 and I'm sending this query to get the thread pool size:

 GET /_cat/thread_pool?v&h=host,bulk.active,bulk.queueSize

 The response is

 host      bulk.active bulk.queueSize 
 1D4HPY1   0           50 
 1D4HPY2   0           50 
 1D4HPY3   0           50 
 1D4HPY4   0           50

 So how can I calculate the actual pool size of the cluster? Is it the 
 sum over all hosts, which would be 200?

  -- 
 You received this message because you are subscribed to the Google 
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send 
 an email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/e6e48fe3-6269-493c-9258-fdc97baeb27e%
 40googlegroups.com 
 
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/29373a56-dbe7-4afb-9733-78f3fbbf5c2d%40googlegroups.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/66161b58-eeca-4277-9e81-12f323a8181a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to do a rolling downgrade of servers

2014-10-20 Thread David Pilato
This should work as described.
Make sure that you have enough disk space to hold all your primary shards.
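
A minimal sketch of one way to drain the nodes before stopping them (the IPs 
here are placeholders, adjust to your own):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "10.0.0.2,10.0.0.3"
  }
}'

# wait until no shards remain on the excluded nodes, then stop them
curl 'localhost:9200/_cat/shards?v'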

David.

> On 21 Oct 2014, at 02:14, David Montgomery  wrote:
> 
>  Hi,
> 
> I have 3 ES servers that are m1.large.  I need to change the machine types 
> and reduce to 1 node.
> 
> The indexing and the shards were left at their default values and I am using 
> unicast.
> 
> 
> So if I want to go to one node, can I just stop 2 ES servers?
> 
> And on the last remaining node, which is e.g. m1.large, can I just add another 
> m1.medium node and then, once it is done syncing, stop the first m1.large?
> 
> I should then have 1 m1.medium.
> 
> Thanks 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/c0b52869-5758-499e-b401-cbbb763f778d%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/67D282BA-391E-486E-8B4D-8AE2CBB1797D%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak? (reposted with better formatting)

2014-10-20 Thread Gavin Seng

Actually now that I read the bug a little more carefully, I'm not so 
optimistic.

* The cache here 
(https://github.com/elasticsearch/elasticsearch/issues/6268) is the filter 
cache and mine was only set at 8 gb.
* Maybe fielddata is a guava cache ... but I did set it to 30% for a run 
with 96gb heap - so the fielddata cache is 28.8gb (< 32 gb).

Nonetheless, I'm trying a run now with an explicit 31gb of fielddata cache 
and will report back.

### 96 gb heap with 30% fielddata cache and 8gb filter cache

http://i.imgur.com/FMp49ZZ.png




On Monday, October 20, 2014 9:18:22 PM UTC-4, Gavin Seng wrote:
>
>
> Thanks Adrien, my cache is exactly 32GB so I'm cautiously optimistic ... 
> will try it out and report back!
>
> From Adrien Grand:
> You might be hit by the following Guava bug: 
> https://github.com/elasticsearch/elasticsearch/issues/6268. It was fixed 
> in Elasticsearch 1.1.3/1.2.1/1.3.0
>
>
> On Monday, October 20, 2014 11:42:34 AM UTC-4, Gavin Seng wrote:
>>
>>
>> ### JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak?
>>
>> ** reposting because 1st one came out w/o images and all kinds of strange 
>> spaces.
>>
>> Hi,
>>
>> We're seeing issues where GC collects less and less memory over time 
>> leading to the need to restart our nodes.
>>
>> The following is our setup and what we've tried. Please tell me if 
>> anything is lacking and I'll be glad to provide more details.
>>
>> Also appreciate any advice on how we can improve our configurations.
>>
>> ### 32 GB heap
>>
>> http://i.imgur.com/JNpWeTw.png
>> 
>>
>>
>> ### 65 GB heap
>>
>> http://i.imgur.com/qcLhC3M.png
>> 
>>
>>
>>
>> ### 65 GB heap with changed young/old ratio
>>
>> http://i.imgur.com/Aa3fOMG.png
>> 
>>
>>
>> ### Cluster Setup
>>
>> * Tribes that link to 2 clusters
>> * Cluster 1
>>   * 3 masters (vms, master=true, data=false)
>>   * 2 hot nodes (physical, master=false, data=true)
>> * 2 hourly indices (1 for syslog, 1 for application logs)
>> * 1 replica
>> * Each index ~ 2 million docs (6gb - excl. of replica)
>> * Rolled to cold nodes after 48 hrs
>>   * 2 cold nodes (physical, master=false, data=true)
>> * Cluster 2
>>   * 3 masters (vms, master=true, data=false)
>>   * 2 hot nodes (physical, master=false, data=true)
>> * 1 hourly index
>> * 1 replica
>> * Each index ~ 8 million docs (20gb - excl. of replica)
>> * Rolled to cold nodes after 48 hrs
>>   * 2 cold nodes (physical, master=false, data=true)
>>
>> Interestingly, we're actually having problems on Cluster 1's hot nodes 
>> even though it indexes less.
>>
>> It suggests that this is a problem with searching because Cluster 1 is 
>> searched on a lot more.
>>
>> ### Machine settings (hot node)
>>
>> * java
>>   * java version "1.7.0_11"
>>   * Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
>>   * Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
>> * 128gb ram
>> * 8 cores, 32 cpus
>> * ssds (raid 0)
>>
>> ### JVM settings
>>
>> ```
>> java
>> -Xms96g -Xmx96g -Xss256k
>> -Djava.awt.headless=true
>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
>> -XX:CMSInitiatingOccupancyFraction=75
>> -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram 
>> -XX:+PrintTenuringDistribution
>> -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/elasticsearch/gc.log 
>> -XX:+HeapDumpOnOutOfMemoryError
>> -verbose:gc -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation 
>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
>> -Xloggc:[...]
>> -Dcom.sun.management.jmxremote 
>> -Dcom.sun.management.jmxremote.local.only=[...]
>> -Dcom.sun.management.jmxremote.ssl=[...] 
>> -Dcom.sun.management.jmxremote.authenticate=[...]
>> -Dcom.sun.management.jmxremote.port=[...]
>> -Delasticsearch -Des.pidfile=[...]
>> -Des.path.home=/usr/share/elasticsearch -cp 
>> :/usr/share/elasticsearch/lib/elasticsearch-1.0.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
>> -Des.default.path.home=/usr/share/elasticsearch
>> -Des.default.path.logs=[...]
>> -Des.default.path.data=[...]
>> -Des.default.path.work=[...]
>> -Des.default.path.conf=/etc/elasticsearch 
>> org.elasticsearch.bootstrap.Elasticsearch
>> ```
>>
>> ## Key elasticsearch.yml settings
>>
>> * threadpool.bulk.type: fixed
>> * threadpool.bulk.queue_size: 1000
>> * indices.memory.index_buffer_size: 30%
>> * index.translog.flush_threshold_ops: 5
>> * indices.fielddata.cache.size: 30%
>>
>>
>> ### Search Load (Cluster 1)
>>
>> * Mainly Kibana3 (queries ES with daily alias that expands to 24 hourly 
>> indices)
>> * Jenkins jobs that constantly run and do many faceting/aggregations on 
>> the last hour's data
>>
>> ### Things we've tried (unsuccessfully)
>>
>> * GC settings
>>   * young/old ratio
>> * Set young/old ratio to 50/50 hoping that things would get GCed 
>> before having the

Re: JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak?

2014-10-20 Thread Gavin Seng

Please post all updates here (it has pictures and better 
formatting): 
https://groups.google.com/forum/?fromgroups=#!topic/elasticsearch/VxkosQuKzaA

Thanks Adrien, I've cross-posted your reply in the other post.

On Monday, October 20, 2014 3:57:56 PM UTC-4, Adrien Grand wrote:
>
> Hi Gavin,
>
> You might be hit by the following Guava bug: 
> https://github.com/elasticsearch/elasticsearch/issues/6268. It was fixed 
> in Elasticsearch 1.1.3/1.2.1/1.3.0
>
> On Mon, Oct 20, 2014 at 3:27 PM, Gavin Seng  > wrote:
>
>>   
>> 
>> 
>> ### JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak?
>>  
>>   
>> Hi,   
>> 
>>  
>> We're seeing issues where GC collects less and less memory over time 
>> leading to the need to restart our nodes.   
>>  
>> The following is our setup and what we've tried. Please tell me if 
>> anything is lacking and I'll be glad to provide more details.   
>>  
>> Also appreciate any advice on how we can improve our configurations.
>>
>> Thank you for any help!
>>
>> Gavin
>>  
>>  
>> ### Cluster Setup 
>> 
>>  
>> * Tribes that link to 2 clusters 
>> 
>>   
>> * Cluster 1   
>> 
>>  
>>   * 3 masters (vms, master=true, data=false) 
>> 
>>   
>>   * 2 hot nodes (physical, master=false, data=true)   
>> 
>>  
>> * 2 hourly indices (1 for syslog, 1 for application logs) 
>> 
>>  
>> * 1 replica   
>> 
>>  
>> * Each index ~ 2 million docs (6gb - excl. of replica)   
>> 
>>   
>> * Rolled to cold nodes after 48 hrs   
>> 
>>  
>>   * 2 cold nodes (physical, master=false, data=true) 
>> 
>>   
>> * Cluster 2   
>> 
>>  
>>   * 3 masters (vms, master=true, data=false) 
>> 
>>   
>>   * 2 hot nodes (physical, master=false, data=true)   
>> 
>>  
>> * 1 hourly index 
>> 
>>   
>> * 1 replica   

Children aggregation (1.4.0.Beta1) Round-Robin result

2014-10-20 Thread Vlad Vlaskin
Dear ES group,
we've been using ES in production for a while and eagerly test all 
newly added features such as cardinality and others.

We are trying out data modeling with parent-child relations (ES version 1.4.0.Beta1, 8 
nodes, EC2 r3.xlarge, SSDs, lots of RAM, etc.)
With a data model of: 
*Parent*
{
  "key": "value"  
}

and a timeline with children, holding metrics:

*Child* (type "metrics")
{
  "day": "2014-10-20",
  "count": 10
}

We update the metric documents and index them with script+upsert.
The problem is that the query below *yields two different results, in a round-robin 
way.* E.g. the first time you call it you receive the first number, a second later 
you receive the second, and then it goes back to the first, etc. 

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "MY_FIELD": {
            "terms": {
                "field": "FIELD-XYZ"          // parent term aggregation
            },
            "aggs": {
                "children": {
                    "children": {
                        "type": "metrics"     // child aggregation of type "metrics"
                    },
                    "aggs": {
                        "requests": {
                            "sum": {
                                "field": "count"   // target aggregation within child documents
                            }
                        }
                    }
                }
            }
        }
    }
}

Result A: 
"aggregations": {
    "MY_FIELD": {
        "doc_count_error_upper_bound": 0,
        "buckets": [
            {
                "key": "xx",
                "doc_count": 283322,
                "children": {
                    "doc_count": 3740372,
                    "requests": {
                        "value": *5801652297*
                    }
                }
            }
        ]
    }
}

Result B:
"aggregations": {
    "MY_FIELD": {
        "doc_count_error_upper_bound": 0,
        "buckets": [
            {
                "key": "xx",
                "doc_count": 302421,
                "children": {
                    "doc_count": 1877361,
                    "requests": {
                        "value": *2965346170*
                    }
                }
            }
        ]
    }
}

The problem is that the switching between A and B back and forth is quite stable 
and reproducible. 
The ES logs are clean.
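
For reference, a sketch of how the same query can be pinned to one copy of each 
shard with the preference parameter (the host is a placeholder): if the numbers 
stop alternating, the primary and replica copies of the child documents probably 
disagree.

curl -XGET 'localhost:9200/_search?preference=_primary&pretty' -d '{ ... same body as above ... }'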

Could someone help with some ideas here?

Thank you!

Vlad

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6c948f61-0dce-4a62-b6ce-22b6a83aeaca%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak? (reposted with better formatting)

2014-10-20 Thread Gavin Seng

Thanks Adrien, my cache is exactly 32GB so I'm cautiously optimistic ... 
will try it out and report back!

From Adrien Grand:
You might be hit by the following Guava bug: 
https://github.com/elasticsearch/elasticsearch/issues/6268. It was fixed in 
Elasticsearch 1.1.3/1.2.1/1.3.0


On Monday, October 20, 2014 11:42:34 AM UTC-4, Gavin Seng wrote:
>
>
> ### JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak?
>
> ** reposting because 1st one came out w/o images and all kinds of strange 
> spaces.
>
> Hi,
>
> We're seeing issues where GC collects less and less memory over time 
> leading to the need to restart our nodes.
>
> The following is our setup and what we've tried. Please tell me if 
> anything is lacking and I'll be glad to provide more details.
>
> Also appreciate any advice on how we can improve our configurations.
>
> ### 32 GB heap
>
> http://i.imgur.com/JNpWeTw.png
> 
>
>
> ### 65 GB heap
>
> http://i.imgur.com/qcLhC3M.png
> 
>
>
>
> ### 65 GB heap with changed young/old ratio
>
> http://i.imgur.com/Aa3fOMG.png
> 
>
>
> ### Cluster Setup
>
> * Tribes that link to 2 clusters
> * Cluster 1
>   * 3 masters (vms, master=true, data=false)
>   * 2 hot nodes (physical, master=false, data=true)
> * 2 hourly indices (1 for syslog, 1 for application logs)
> * 1 replica
> * Each index ~ 2 million docs (6gb - excl. of replica)
> * Rolled to cold nodes after 48 hrs
>   * 2 cold nodes (physical, master=false, data=true)
> * Cluster 2
>   * 3 masters (vms, master=true, data=false)
>   * 2 hot nodes (physical, master=false, data=true)
> * 1 hourly index
> * 1 replica
> * Each index ~ 8 million docs (20gb - excl. of replica)
> * Rolled to cold nodes after 48 hrs
>   * 2 cold nodes (physical, master=false, data=true)
>
> Interestingly, we're actually having problems on Cluster 1's hot nodes 
> even though it indexes less.
>
> It suggests that this is a problem with searching because Cluster 1 is 
> searched on a lot more.
>
> ### Machine settings (hot node)
>
> * java
>   * java version "1.7.0_11"
>   * Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
>   * Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
> * 128gb ram
> * 8 cores, 32 cpus
> * ssds (raid 0)
>
> ### JVM settings
>
> ```
> java
> -Xms96g -Xmx96g -Xss256k
> -Djava.awt.headless=true
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram 
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/elasticsearch/gc.log 
> -XX:+HeapDumpOnOutOfMemoryError
> -verbose:gc -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation 
> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
> -Xloggc:[...]
> -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.local.only=[...]
> -Dcom.sun.management.jmxremote.ssl=[...] 
> -Dcom.sun.management.jmxremote.authenticate=[...]
> -Dcom.sun.management.jmxremote.port=[...]
> -Delasticsearch -Des.pidfile=[...]
> -Des.path.home=/usr/share/elasticsearch -cp 
> :/usr/share/elasticsearch/lib/elasticsearch-1.0.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
> -Des.default.path.home=/usr/share/elasticsearch
> -Des.default.path.logs=[...]
> -Des.default.path.data=[...]
> -Des.default.path.work=[...]
> -Des.default.path.conf=/etc/elasticsearch 
> org.elasticsearch.bootstrap.Elasticsearch
> ```
>
> ## Key elasticsearch.yml settings
>
> * threadpool.bulk.type: fixed
> * threadpool.bulk.queue_size: 1000
> * indices.memory.index_buffer_size: 30%
> * index.translog.flush_threshold_ops: 5
> * indices.fielddata.cache.size: 30%
>
>
> ### Search Load (Cluster 1)
>
> * Mainly Kibana3 (queries ES with daily alias that expands to 24 hourly 
> indices)
> * Jenkins jobs that constantly run and do many faceting/aggregations on 
> the last hour's data
>
> ### Things we've tried (unsuccessfully)
>
> * GC settings
>   * young/old ratio
> * Set young/old ratio to 50/50 hoping that things would get GCed 
> before having the chance to move to old.
> * The old grew at a slower rate but still things could not be 
> collected.
>   * survivor space ratio
> * Give survivor space a higher ratio of young
> * Increase number of generations to make it to old be 10 (up from 6)
>   * Lower cms occupancy ratio
> * Tried 60% hoping to kick GC earlier. GC kicked in earlier but still 
> could not collect.
> * Limit filter/field cache
>   * indices.fielddata.cache.size: 32GB
>   * indices.cache.filter.size: 4GB
> * Optimizing index to 1 segment on the 3rd hour
> * Limit JVM to 32 gb ram
>   * reference: 
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
> * Limit JVM to 65 gb ram
>   * This fulfils the 'leave 50% to the os' principle.
> * Read 90.5/7 O

Re: Kibana - missing settings gear/cog

2014-10-20 Thread Clif Smith
I should also mention, more importantly, that I'm unable to browse ANYTHING. 
The only items displayed in Kibana are its logo on the left and the 4 
remaining icons on the right.

Any help is greatly appreciated.

On Monday, October 20, 2014 3:15:16 PM UTC-5, Clif Smith wrote:
>
> I'm running kibana v3.0.1.  The "Settings" gear/cog is no longer displayed 
> for us.  I recently updated the indexes to look for from the default to "[
> logstash-31days-].MM.DD,[logstash-forever].MM.DD".  Refreshing 
> the browser, trying other browsers, etc. doesn't help.  There are no errors 
> and all indexes are present.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f806b7b6-ddf5-4b29-a3b9-0a5974001e92%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


How to do a rolling downgrade of servers

2014-10-20 Thread David Montgomery
 Hi,

I have 3 ES servers that are m1.large.  I need to change the machine types 
and reduce to 1 node.

The indexing and the shards were left at their default values and I am 
using unicast.


So if I want to go to one node, can I just stop 2 ES servers?

And on the last remaining node, which is e.g. m1.large, can I just add another 
m1.medium node and then, once it is done syncing, stop the first m1.large?

I should then have 1 m1.medium.

Thanks 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c0b52869-5758-499e-b401-cbbb763f778d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: word delimiter

2014-10-20 Thread Nick Tackes
I resolved my own issue. 

#!/bin/sh
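# The key change from my earlier attempt: the word_delimiter type_table maps
# +, - and @ to ALPHANUM, so tokens like "her2+" are kept intact and the match
# queries below work without escaping.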
curl -XPUT 'http://localhost:9200/specialchars' -d '{
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1
},  
"analysis" : {
"filter" : {
"special_character_splitter" : {
"type" : "word_delimiter",
"split_on_numerics":false,
"type_table": ["+ => ALPHANUM", "- => ALPHANUM", "@ => 
ALPHANUM"]
}
},
"analyzer" : {
"schar_analyzer" : {
"type" : "custom",
"tokenizer" : "whitespace",
"filter" : ["lowercase", "special_character_splitter"]
}
}
}
},
"mappings" : {
"specialchars" : {
"properties" : {
"msg" : {
"type" : "string",
"analyzer" : "schar_analyzer"
}
}
}
}
}'

curl -XPOST localhost:9200/specialchars/specialchars/1 -d '{"msg" : "HER2+ 
Breast Cancer"}'
curl -XPOST localhost:9200/specialchars/specialchars/2 -d '{"msg" : 
"Non-Small Cell Lung Cancer"}'
curl -XPOST localhost:9200/specialchars/specialchars/3 -d '{"msg" : 
"c.2573T>G NSCLC"}'

curl -XPOST localhost:9200/specialchars/_refresh

curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 
"HER2+ Breast Cancer"
#curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 
"Non-Small Cell Lung Cancer"
#curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 
"c.2573T>G NSCLC"

printf "HER2+\n"
curl -XGET localhost:9200/specialchars/specialchars/_search?pretty -d '{
"query" : {
"match" : {
"msg" : {
"query" : "HER2+"
   }
}
}
}'

printf "HER2-\n"
curl -XGET localhost:9200/specialchars/specialchars/_search?pretty -d '{
"query" : {
"match" : {
"msg" : {
"query" : "HER2-"
   }
}
}
}'

printf "HER2@\n"
curl -XGET localhost:9200/specialchars/specialchars/_search?pretty -d '{
"query" : {
"match" : {
"msg" : {
"query" : "HER2@"
   }
}
}
}'


curl -X DELETE localhost:9200/specialchars


On Friday, October 17, 2014 4:57:52 PM UTC-7, Nick Tackes wrote:
>
> Hello, I am experimenting with word_delimiter and have an example with a 
> special character that is indexed.  The character is in the type table for 
> the word delimiter.  Analysis of the tokenization looks good, but when I 
> attempt to do a match query it doesn't seem to respect tokenization as 
> expected.  
> The example indexes 'HER2+ Breast Cancer'.  Tokenization is 'her2+', 
> 'breast', 'cancer', which is good.  searching for 'HER2\\+' results in a 
> hit, as well as 'HER2\\-'
>
> #!/bin/sh
> curl -XPUT 'http://localhost:9200/specialchars' -d '{
> "settings" : {
> "index" : {
> "number_of_shards" : 1,
> "number_of_replicas" : 1
> },  
> "analysis" : {
> "filter" : {
> "special_character_spliter" : {
> "type" : "word_delimiter",
> "split_on_numerics":false,
> "type_table": ["+ => ALPHA", "- => ALPHA"]
> }
> },
> "analyzer" : {
> "schar_analyzer" : {
> "type" : "custom",
> "tokenizer" : "whitespace",
> "filter" : ["lowercase", "special_character_spliter"]
> }
> }
> }
> },
> "mappings" : {
> "specialchars" : {
> "properties" : {
> "msg" : {
> "type" : "string",
> "analyzer" : "schar_analyzer"
> }
> }
> }
> }
> }'
>
> curl -XPOST localhost:9200/specialchars/1 -d '{"msg" : "HER2+ Breast 
> Cancer"}'
> curl -XPOST localhost:9200/specialchars/2 -d '{"msg" : "Non-Small Cell 
> Lung Cancer"}'
> curl -XPOST localhost:9200/specialchars/3 -d '{"msg" : "c.2573T>G NSCLC"}'
>
> curl -XPOST localhost:9200/specialchars/_refresh
>
> curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 
> "HER2+ Breast Cancer"
> #curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 
> "Non-Small Cell Lung Cancer"
> #curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 
> "c.2573T>G NSCLC"
>
> printf "HER2+\n"
> curl -XGET localhost:9200/specialchars/_search?pretty -d '{
> "query" : {
> "match" : {
> "msg" : {
> "query" : "HER2\\+"
>}
> }
> }
> }'
>
> printf "HER2-\n"
> curl -XGET localhost:9200/specialchars/_search?pretty -d '{
> "query" : {
> "match" : {
> "msg" : {
> "query" : "HER2\\-"
>}
> }
> }
> }'
>
> curl -X DELETE localhost:9200/spe

Access Elasticsearch term information through ES-Hadoop API

2014-10-20 Thread Aritra Chatterjee
Hi,

Is it possible to get access to terms information that is indexed in ES and 
run some kind of map-reduce on them using ES-Hadoop? For example, let's say 
I have a CSV data set that was indexed into ES, with one particular field 
that was setup to be analyzed using a snowball analyzer. So, can I read the 
terms that were generated due to the snowball analyzer using ES-Hadoop? 
Something similar to how the ES scripts give access to TF, TTF, etc.

Also, is it possible to get the generated ID for an ES document using the 
ES-Hadoop APIs? This way, I would be able to query a document's term vector 
by _id, and maybe do calculations on terms, one document at a time.

Regards,
Aritra

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/72d1ac2b-ddec-48ab-b31d-1eebc87a5d38%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Best practise: Searching complex docs with ElasticSearch

2014-10-20 Thread Neil Middleton
The parallel of a github issue is a good one.  There are top level elements
(title, body) etc, and comments nested under that.  Comment requests I see
are to find all issues with a certain string in a comment, ordered by
recency, but the item we want to show in the results is a link to the issue.

N

On Mon, Oct 20, 2014 at 9:52 PM, Nick Zadrozny 
wrote:

> Hey Neil,
>
> Sounds interesting. For these questions, I think it's helpful to consider
> the interface you're building for the user. What's the fundamental "thing"
> being shown in a list of search results?
>
> Nested documents can be convenient, but generally I think the modeling for
> this kind of scenario works best when you denormalize the data as much as
> possible. In that approach, you'd index the children as individual
> documents, and save the parent attributes onto them.
>
> Field Collapsing can help if you're matching against multiple Comments but
> are more interested in showing and sorting the parent Issues they belong
> to.
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/top-hits.html
>
>
>
> On Mon, Oct 20, 2014 at 2:53 PM, Neil Middleton  wrote:
>
>> I'm wanting to search a series of documents which have a nested object
>> nature. For instance a Github issue. I'm needing to ideally use the ES search
>> lite
>> 
>>  syntax
>>
>> I have some top level data (assignee, created_at etc) and some nested
>> items, think comments or commits etc.
>>
>> Dumping an entire document into ES makes it easily searchable but some
>> weird side effects come up, most notably around sorting on the nested
>> comments.
>>
>> What's the best practice for this kind of document search? Is it better
>> to split the comments into separate documents with issue meta data
>> attached, or via each issue being a big dump in a document?
>>
>> Ideally, I'd like to be able to search for a document, and sort by one
>> of the attributes of the document, either at the top level, or nested
>> inside one of the comments.
>>
>> Any ideas?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/42a3fffb-97ee-43e1-be68-02cd44bd9a28%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Nick Zadrozny
>
> Cofounder, CEO
> One More Cloud
>
> websolr.com • bonsai.io
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/rumLatb020I/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAPTxa80VireENz2a9p2X246_0pq3Yn_Q7hZBb%3Ddh7NGuNe28LA%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
 - N

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAMjEqJjOWF_QbXbW_KVKxfgo5eS6pHbWFT6xXuxTOGQEs8rEDw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Best practise: Searching complex docs with ElasticSearch

2014-10-20 Thread Nick Zadrozny
Hey Neil,

Sounds interesting. For these questions, I think it's helpful to consider
the interface you're building for the user. What's the fundamental "thing"
being shown in a list of search results?

Nested documents can be convenient, but generally I think the modeling for
this kind of scenario works best when you denormalize the data as much as
possible. In that approach, you'd index the children as individual
documents, and save the parent attributes onto them.

Field Collapsing can help if you're matching against multiple Comments but
are more interested in showing and sorting the parent Issues they belong
to.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/top-hits.html
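
For example, a rough sketch of that approach against comment documents (the index,
type and field names here are made up): search the comments, bucket the hits by
their parent issue id, order the buckets by the newest matching comment, and pull
one top hit per issue for display.

curl -XGET 'localhost:9200/issues/comment/_search?pretty' -d '{
  "size": 0,
  "query": { "match": { "body": "some phrase" } },
  "aggs": {
    "by_issue": {
      "terms": {
        "field": "issue_id",
        "order": { "latest_comment": "desc" }
      },
      "aggs": {
        "latest_comment": { "max": { "field": "created_at" } },
        "top_comment": { "top_hits": { "size": 1 } }
      }
    }
  }
}'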



On Mon, Oct 20, 2014 at 2:53 PM, Neil Middleton  wrote:

> I'm wanting to search a series of documents which have a nested object
> nature. For instance a Github issue. I'm needing to ideally use the ES search
> lite
> 
>  syntax
>
> I have some top level data (assignee, created_at etc) and some nested
> items, think comments or commits etc.
>
> Dumping an entire document into ES makes it easily searchable but some
> weird side effects come up, most notably around sorting on the nested
> comments.
>
> What's the best practice for this kind of document search? Is it better to
> split the comments into separate documents with issue meta data attached,
> or via each issue being a big dump in a document?
>
> Ideally, I'd like to be able to search for a document, and sort by one of
> the attributes of the document, either at the top level, or nested inside
> one of the comments.
>
> Any ideas?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/42a3fffb-97ee-43e1-be68-02cd44bd9a28%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Nick Zadrozny

Cofounder, CEO
One More Cloud

websolr.com • bonsai.io

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPTxa80VireENz2a9p2X246_0pq3Yn_Q7hZBb%3Ddh7NGuNe28LA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


logstash: how to include/modify facility/priority

2014-10-20 Thread paulo bruck
Hi Folks

I'm trying to insert/modify priority and facility via logstash.

Using debian wheezy + rsyslog + logstash 1.4.2 and elasticsearch 1.1.1.

Part of my rsyslog config, to show what I want:

/etc/rsyslog.conf:
.
# auth
auth.=emerg -/var/log/auth/auth_emerg.log
auth.=alert -/var/log/auth/auth_alert.log
auth.=crit  -/var/log/auth/auth_crit.log
auth.=err   -/var/log/auth/auth_err.log
auth.=warning   -/var/log/auth/auth_warning.log
auth.=notice-/var/log/auth/auth_notice.log
auth.=info  -/var/log/auth/auth_info.log
auth.=debug -/var/log/auth/auth_debug.log

# authpriv
authpriv.=emerg -/var/log/authpriv/authpriv_emerg.log
authpriv.=alert -/var/log/authpriv/authpriv_alert.log
authpriv.=crit  -/var/log/authpriv/authpriv_crit.log
..


/etc/logstash/conf.d/syslog.conf
input { 
file {
path => "/var/log/auth/auth_*.log"
type => "syslog"
}
file {
path => "/var/log/authpriv/authpriv_*.log"
type => "syslog"
}
file {
path => "/var/log/cron/cron_*.log"
type => "syslog"

.

filter {
grok {
match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} 
%{SYSLOGHOST:syslog_hostname} 
%{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: 
%{GREEDYDATA:syslog_message}" }
}
date {
locale => "en"
match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd 
HH:mm:ss", "ISO8601" ]
}
}

output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}


JSON of one of the syslog entries:

{
  "_index": "logstash-2014.10.20",
  "_type": "syslog",
  "_id": "57KDKSXKSeCy9VFDr1Arlw",
  "_score": null,
  "_source": {
"message": "Oct 20 18:10:01 wheezy CRON[5576]: pam_unix(cron:session): 
session closed for user www-data",
"@version": "1",
"@timestamp": "2014-10-20T20:10:01.000Z",
"type": "syslog",
"host": "wheezy",
"path": "/var/log/authpriv/authpriv_info.log",
"tags": [
  "_grokparsefailure"
],
"syslog_timestamp": "Oct 20 18:10:01",
"syslog_hostname": "wheezy",
"syslog_program": "CRON",
"syslog_pid": "5576",
"syslog_message": "pam_unix(cron:session): session closed for user www-data"
  },
  "sort": [
1413835801000,
1413835801000
  ]
}

How can I include facility and priority, knowing that the path already contains this 
information?
e.g.: path => "/var/log/auth/auth_emerg.log, /var/log/auth/auth_crit.log..."
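
Something I am considering (just a sketch, not tested): pull them out of the path 
field with an extra grok filter, since the facility and severity are encoded in the 
directory and file name. The field names log_facility/log_severity are my own choice.

filter {
    grok {
        # e.g. /var/log/auth/auth_notice.log -> log_facility=auth, log_severity=notice
        match => { "path" => "/var/log/(?<log_facility>[^/]+)/[^_/]+_(?<log_severity>[^.]+)[.]log" }
    }
}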

best regards

BTW, is there a book or other doc to read? I've been reading logstash.net/docs 
but it is not enough for me.






-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ad3fe421-0027-4986-99b4-a10b8ae1741b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Kibana - missing settings gear/cog

2014-10-20 Thread Clif Smith
I'm running kibana v3.0.1.  The "Settings" gear/cog is no longer displayed 
for us.  I recently updated the indexes to look for from the default to "[
logstash-31days-].MM.DD,[logstash-forever].MM.DD".  Refreshing the 
browser, trying other browsers, etc. doesn't help.  There are no errors and 
all indexes are present.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/de8e407f-5641-43ae-8d27-cea2a62e02ba%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES error while creating index on Solaris

2014-10-20 Thread Abhinav Sonkar
Hi Jorg,

Is the LZF compression issue related to the original Jackson issue?

Abhinav

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b2408f4c-cd74-4dcf-b303-d6987408d7fb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES error while creating index on Solaris

2014-10-20 Thread Abhinav Sonkar
Hi Clinton,

Thanks for your reply. Can you please help in performing the below steps? I 
don't completely understand the concepts of ES yet.

Abhinav

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4768b3f2-946e-4e99-88ba-15b1dc54213d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak?

2014-10-20 Thread Adrien Grand
Hi Gavin,

You might be hit by the following Guava bug:
https://github.com/elasticsearch/elasticsearch/issues/6268. It was fixed in
Elasticsearch 1.1.3/1.2.1/1.3.0

On Mon, Oct 20, 2014 at 3:27 PM, Gavin Seng  wrote:

>
>
>
> ### JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak?
>
>
> Hi,
>
>
> We're seeing issues where GC collects less and less memory over time
> leading to the need to restart our nodes.
>
> The following is our setup and what we've tried. Please tell me if
> anything is lacking and I'll be glad to provide more details.
>
> Also appreciate any advice on how we can improve our configurations.
>
> Thank you for any help!
>
> Gavin
>
>
> ### Cluster Setup
>
>
> * Tribes that link to 2 clusters
>
>
> * Cluster 1
>
>
>   * 3 masters (vms, master=true, data=false)
>
>
>   * 2 hot nodes (physical, master=false, data=true)
>
>
> * 2 hourly indices (1 for syslog, 1 for application logs)
>
>
> * 1 replica
>
>
> * Each index ~ 2 million docs (6gb - excl. of replica)
>
>
> * Rolled to cold nodes after 48 hrs
>
>
>   * 2 cold nodes (physical, master=false, data=true)
>
>
> * Cluster 2
>
>
>   * 3 masters (vms, master=true, data=false)
>
>
>   * 2 hot nodes (physical, master=false, data=true)
>
>
> * 1 hourly index
>
>
> * 1 replica
>
>
> * Each index ~ 8 million docs (20gb - excl. of replica)
>
>
> * Rolled to cold nodes after 48 hrs
>
>
>   * 2 cold nodes (physical, master=false, data=true)
>
>
> Interestingly, we're actually having problems on Cluster 1's hot nodes
> even though it indexes less.
>
> It suggests that this is a problem with searching because Cluster 1 is
> searched on a lot more.
>
>
>
>
> ### Machine settings (hot node)
>
>
>
> * java
>
>
>   * java version "1.7.0_11"
>   * Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
>
>
>   * Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
>
>
> * 128gb ram
>
>
> * 8 cores, 32 cpus
>
>
> * ssds (raid 0)
>
>
>
>
>
> ### JVM settings
>
>
>
>
>
> ```
>
>
> java
>
>
> -Xms96g -Xmx96g -Xss256k
>
>
> -Djava.awt.headless=true
>
>
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=75
>
>
> -XX:+UseCMSInitiatingOccupancyOnly
>
>
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram
> -XX:+PrintTenuringDistribution
>
> -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/elasticsearch/gc.log
> -XX:+HeapDumpOnOutOfMemoryError
>
> -verbose:gc -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
>
> -Xloggc:[...]
>
>
> -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=[...]
>
>
> -Dcom.sun.management.jmxremote.ssl=[...]
> -Dcom.sun.management.jmxremote.authenticate=[...]
>
>
> -Dcom.sun.management.jmxremote.port=[...]
>
>
> -Delasticsearch -Des.pidfile=[...]
>
>
> -Des.path.home=/usr/share/elasticsearch -cp
> :/usr/share/elasticsearch/lib/elasticsearch-1.0.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
>
> -Des.default.path.home=/usr/share/elasticsearch
>
>
> -Des.default.path.logs=[...]
>
>
> -Des.default.path.data=[...]
>
>
> -Des.default.path.work=[...]
>
>
> -Des.default.path.conf=/etc/elasticsearch
> org.elasticsearch.bootstrap.Elasticsearch
>
>
> ```
>
>
>
>
>
> ## Key elasticsearch.yml settings
>
>
>
> * threadpool.bulk.type: fixed
>
>
> * threadpool.bulk.queue_size: 1000
>
>
> * indices.memory.index_buffer_size: 30%
>
>
> * index.translog.flush_threshold_ops: 5
>
>
> * indices.fielddata.cache.size: 30%
>
>
>
>
> ### Search Load (Cluster 1)
>
>
>
> * Mainly Kibana3 (queries ES with daily alias that expands to 24 hourly
> indices)
>
> * Jenkins jobs that constantly run and do many faceting/aggregations on
> the last hour's data
>
>
>
>
> ### Things we've tried (unsuccessfully)
>
>
>
>
>
> * GC settings
>
>
>   * young/old ratio
>
>
> * Set young/old ratio to 50/50 hoping that things would get GCed
> before having the chance to move to old.
>
> * The old grew at a slower rate but still things could not be
> collected.
>
>   * survivor space ratio
>
>
> * Give survivor space a higher ratio of young
>
>
> * Increase number of generations to make it to old be 10 (up from 6)
>
>
>   * Lower cms occupancy ratio
>
>
> * Tried 60% hoping to kick GC earlier. GC kicked in earlier but still
> could not collect.
>
> * Limit filter/field cache
>
>
>   * indices.fielddata.cache.size: 32GB
>
>
>   * indices.cache.filter.size: 4GB
>
>
> * Optimizing index to 1 segment on the 3rd hour
>
>
> * Limit JVM to 32 gb ram
>
>
>   * reference:
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
>
>
> * Limit JVM to 65 gb ram
>
>
>   * This fulfils the 'leave 50% to the os' principle.
>
>
> * Read 90.5/7 OOM errors-- memory leak or GC problems?
>
>
>   *
> https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/memory$20leak/elasticsearch/_Zve60xOh_E/N13tlXgkUAwJ
>
>
>   * But we're not u

Best practise: Searching complex docs with ElasticSearch

2014-10-20 Thread Neil Middleton


I want to search a series of documents which have a nested object 
structure, for instance a GitHub issue. Ideally I need to use the ES search 
lite syntax.

I have some top level data (assignee, created_at etc) and some nested 
items, think comments or commits etc.

Dumping an entire document into ES makes it easily searchable but some 
weird side effects come up, most notably around sorting on the nested 
comments.

What's the best practice for this kind of document search? Is it better to 
split the comments into separate documents with issue metadata attached, 
or to keep each issue as one big document?

Ideally, I'd like to be able to search for a document, and sort by one of 
the attributes of the document, either at the top level, or nested inside 
one of the comments.

Any ideas?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/42a3fffb-97ee-43e1-be68-02cd44bd9a28%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Upper limits on indexes/shards in a cluster

2014-10-20 Thread David Ashby
example log line: [DEBUG][action.admin.indices.status] [Red Ronin] 
[*index*][1], node[t60FJtJ-Qk-dQNrxyg8faA], [R], s[STARTED]: failed to 
executed 
[org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@36239161] 
org.elasticsearch.transport.NodeDisconnectedException: 
[Shotgun][inet[/IP:9300]][indices/status/s] disconnected

When the cluster gets into this state, all requests hang waiting for... 
something to happen. Each individual node returns 200 when curled locally. 
A huge number of this above log line appear at the end of this process -- 
one for every single shard on the node, which is a huge vomit into my logs. 
As soon as a node is restarted the cluster "snaps back" and immediately 
fails outstanding requests and begins rebalancing. It even stops responding 
to bigdesk requests.

On Monday, October 20, 2014 11:34:36 AM UTC-4, David Ashby wrote:
>
> Hi,
>
> We've been using elasticsearch on AWS for our application for two 
> purposes: as a search engine for user-created documents, and as a cache for 
> activity feeds in our application. We made a decision early-on to treat 
> every customer's content as a distinct index, for full logical separation 
> of customer data. We have about three hundred indexes in our cluster, with 
> the default 5-shards/1-replica setup.
>
> Recently, we've had major problems with the cluster "locking up" to 
> requests and losing track of its nodes. We initially responded by 
> attempting to remove possible CPU and memory limits, and placed all nodes 
> in the same AWS placement group, to maximize inter-node bandwidth, all to 
> no avail. We eventually lost an entire production cluster, resulting in a 
> decision to split the indexes across two completely independent clusters, 
> each cluster taking half of the indexes, with application-level logic 
> determining where the indexes were.
>
> All that is to say: with our setup, are we running into an undocumented 
> *practical* limit on the number of indexes or shards in a cluster? It 
> ends up being around 3000 shards with our setup. Our logs show evidence of 
> nodes timing out their responses to massive shard status-checks, and it 
> gets *worse* the more nodes there are in the cluster. It's also stable 
> with only *two* nodes.
>
> Thanks,
> -David
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/97d50096-5fd9-40ff-a6a6-900571808c23%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: copy index

2014-10-20 Thread joergpra...@gmail.com
I admit there is something overcautious in the knapsack release to prevent
overwriting existing data. I will add a fix that will allow writing into an
empty index.

https://github.com/jprante/elasticsearch-knapsack/issues/57

Jörg

On Mon, Oct 20, 2014 at 6:47 PM,  wrote:

> By the way
> Es version 1.3.4
> Knapsack version built with 1.3.4
>
>
> Regards.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/e69c6778-cbc5-4e56-bf71-9bac56b66942%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoFxLmO84ei%3DHFWJDsPKdM_nYvMuKV-V917Xd2_t%2BiGtPw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Upper limits on indexes/shards in a cluster

2014-10-20 Thread joergpra...@gmail.com
Two nodes are not stable with regard to split brains.

All I can guess is that two nodes have a small volume of network traffic
and that you may have had network problems.

Without exact diagnostic messages it's hard to understand why nodes
disconnected. There are plenty of reasons. Networking is just one.

ES has no internal shard limits, except what is imposed by the
memory/CPU/network limits of the hardware (or VM). This does not mean you
can put an arbitrary number of shards or an arbitrary number of data volume
on a single machine. It all depends.
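
For what it's worth, with hundreds of small per-customer indices a common way to cut
the total shard count is an index template that lowers the per-index default (a sketch
only: the template name and pattern are made up, and it only affects newly created
indices):

curl -XPUT 'localhost:9200/_template/customer_defaults' -d '{
  "template" : "customer-*",
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 1
  }
}'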

Jörg

On Mon, Oct 20, 2014 at 6:06 PM, David Ashby 
wrote:

> The unhealthy clusters were between four and five nodes. We switched to
> two two-node clusters and those have been stable.
>
> Bigdesk reports file descriptors, memory, and CPU all have plentiful
> headroom in all cases.
>
> On Monday, October 20, 2014 11:54:21 AM UTC-4, Jörg Prante wrote:
>>
>> How many nodes do you have in your cluster?
>>
>> Have you checked if your nodes run out of file descriptors or heap memory?
>>
>> Jörg
>>
>> On Mon, Oct 20, 2014 at 5:52 PM, David Ashby 
>> wrote:
>>
>>> I might also note: the size of these indexes varies wildly, some being
>>> just a few documents, some being thousands, more or less following the
>>> power law.
>>>
>>>
>>> On Monday, October 20, 2014 11:34:36 AM UTC-4, David Ashby wrote:

 Hi,

 We've been using elasticsearch on AWS for our application for two
 purposes: as a search engine for user-created documents, and as a cache for
 activity feeds in our application. We made a decision early-on to treat
 every customer's content as a distinct index, for full logical separation
 of customer data. We have about three hundred indexes in our cluster, with
 the default 5-shards/1-replica setup.

 Recently, we've had major problems with the cluster "locking up" to
 requests and losing track of its nodes. We initially responded by
 attempting to remove possible CPU and memory limits, and placed all nodes
 in the same AWS placement group, to maximize inter-node bandwidth, all to
 no avail. We eventually lost an entire production cluster, resulting in a
 decision to split the indexes across two completely independent clusters,
 each cluster taking half of the indexes, with application-level logic
 determining where the indexes were.

 All that is to say: with our setup, are we running into an undocumented
 *practical* limit on the number of indexes or shards in a cluster? It
 ends up being around 3000 shards with our setup. Our logs show evidence of
 nodes timing out their responses to massive shard status-checks, and it
 gets *worse* the more nodes there are in the cluster. It's also stable
 with only *two* nodes.

 Thanks,
 -David

>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/17720132-eb50-4d49-bae5-8970e39b79dc%
>>> 40googlegroups.com
>>> 
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/dc2d873d-28ed-40c9-94d9-eb1da37d1caa%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoF-NTSCLBUvPU_19pQYGY86U-4A1N74%2Bd0n2LEeB-JhpQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Error with Elasticsearch

2014-10-20 Thread shriyansh jain
Hi all,

I have an ELK stack set up, and 

I am experiencing the following error with elasticsearch frequently. Please 
suggest how I can debug this.

http://pastebin.com/9j4q1UVd

http://pastebin.com/Wj1nXuTt

Please let me know if you need any other information.


Thanks,
Shriyansh

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3ce39fcb-8cfc-4041-bc22-d1a264d2f88b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: word delimiter

2014-10-20 Thread Nick Tackes
Any thoughts on how I am constructing my search query?  I have tried 
escaping the special characters as well as passing the unescaped special 
chars.  I was hoping to stick with a match query, although I tried a query 
string query, a match phrase query and a term query, and found no solution 
there.

very appreciative of your thoughts.

Nick

On Friday, October 17, 2014 4:57:52 PM UTC-7, Nick Tackes wrote:
>
> Hello, I am experimenting with word_delimiter and have an example with a 
> special character that is indexed.  The character is in the type table for 
> the word delimiter.  Analysis of the tokenization looks good, but when I 
> attempt to do a match query it doesn't seem to respect tokenization as 
> expected.  
> The example indexes 'HER2+ Breast Cancer'.  Tokenization is 'her2+', 
> 'breast', 'cancer', which is good.  searching for 'HER2\\+' results in a 
> hit, as well as 'HER2\\-'
>
> #!/bin/sh
> curl -XPUT 'http://localhost:9200/specialchars' -d '{
> "settings" : {
> "index" : {
> "number_of_shards" : 1,
> "number_of_replicas" : 1
> },  
> "analysis" : {
> "filter" : {
> "special_character_spliter" : {
> "type" : "word_delimiter",
> "split_on_numerics":false,
> "type_table": ["+ => ALPHA", "- => ALPHA"]
> }
> },
> "analyzer" : {
> "schar_analyzer" : {
> "type" : "custom",
> "tokenizer" : "whitespace",
> "filter" : ["lowercase", "special_character_spliter"]
> }
> }
> }
> },
> "mappings" : {
> "specialchars" : {
> "properties" : {
> "msg" : {
> "type" : "string",
> "analyzer" : "schar_analyzer"
> }
> }
> }
> }
> }'
>
> curl -XPOST localhost:9200/specialchars/1 -d '{"msg" : "HER2+ Breast 
> Cancer"}'
> curl -XPOST localhost:9200/specialchars/2 -d '{"msg" : "Non-Small Cell 
> Lung Cancer"}'
> curl -XPOST localhost:9200/specialchars/3 -d '{"msg" : "c.2573T>G NSCLC"}'
>
> curl -XPOST localhost:9200/specialchars/_refresh
>
> curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 
> "HER2+ Breast Cancer"
> #curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 
> "Non-Small Cell Lung Cancer"
> #curl -XGET 'localhost:9200/specialchars/_analyze?field=msg&pretty=1' -d 
> "c.2573T>G NSCLC"
>
> printf "HER2+\n"
> curl -XGET localhost:9200/specialchars/_search?pretty -d '{
> "query" : {
> "match" : {
> "msg" : {
> "query" : "HER2\\+"
>}
> }
> }
> }'
>
> printf "HER2-\n"
> curl -XGET localhost:9200/specialchars/_search?pretty -d '{
> "query" : {
> "match" : {
> "msg" : {
> "query" : "HER2\\-"
>}
> }
> }
> }'
>
> curl -X DELETE localhost:9200/specialchars
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/62d1f188-92e1-4ea4-b94b-b47c696db78f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: copy index

2014-10-20 Thread eunever32
By the way
Es version 1.3.4
Knapsack version built with 1.3.4


Regards. 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e69c6778-cbc5-4e56-bf71-9bac56b66942%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Write-Read question

2014-10-20 Thread Jack Park
Hi Veneeth,

In previous experiments, where I pre-loaded the index with a kind of typology
(the index is a topic map), I set the refresh flag and still had
problems.

In doing this work, there are both GET and search queries going in.

The actual use case in question relates to a collection of JSON documents
which are first indexed, and then later updated by agents reading other
JSON documents which reference those indexed documents. Frequently, this
happens nearly simultaneously in this sense: the first document is indexed,
but it nearly immediately referenced (GET by id) by another document. When
you have many agents doing this work, the penalty is that an already
indexed document may not be returned to another agent if the timing is too
close.

My solution to preloading the index with a typology was simply to create
that typology as a collection of JSON documents which do not do any
self-referencing, and load those.

I don't have that option here. As suggested above, my experience with
setting the refresh flag is that it did not help in all cases. I suppose I
could try again.

What I seek is something reliable. That's because I will soon begin dealing
with more than 200k JSON documents, actually PubMed abstracts, and hundreds
of Carrot2 clusters which reference them.

Many thanks
Jack

On Mon, Oct 20, 2014 at 8:56 AM, vineeth mohan 
wrote:

> Hello Jack ,
>
> What do you mean by "fetching it back" , is it a GET or a search.
> GET is realtime and Search is near realtime.
> You can use the refresh flag while indexing to make sure search is also
> real time , but its expensive.
>
> Thanks
>  Vineeth
>
> On Mon, Oct 20, 2014 at 8:59 PM, Jack Park 
> wrote:
>
>> I observe, and expect a slight delay between writing an object to ES and
>> being able to fetch it back.  In ordinary circumstances, this is not an
>> issue.
>>
>> But, in the context of many agents processing information resources,
>> creating indexes for them, but also needing to refer to those indexes right
>> away, what are the solutions available which allow to keep each index
>> object (say, a JSONObject of the indexed document) in synch across many
>> agents?
>>
>> Many thanks in advance.
>> Jack
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAH6s0fxyFxBRVQUv2Xxn%2BGqsLL3y8Y7QkpntbgGVTLmn2dG7ew%40mail.gmail.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAGdPd5mk7kMXrJLujaiAJHMZWMfidUOQFWCd%3DzkPMLcOas0p3Q%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAH6s0fxYJXo2VRmzvdSxyxfeGzhzhD7zpg3vO%3DczfgnvooMk1w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Search Plugin to intercept search response

2014-10-20 Thread Tomislav Poljak
Hi Jörg,
I understand and what you described is exactly what I did: deployed
plugin to both elasticsearch node/server (running single node cluster
atm) and to the client code classpath which uses TransportClient ->
both log INFO 'simple action plugin loaded' on start, yet I can't
execute simple action successfully from client using TransportClient
(without NPE).

So, I'm wondering if custom actions can be executed through the
TransportClient at all, or whether custom TransportActions are actually
'internal node actions' which can only be executed on a node (which
expands the query and reduces results from other nodes/shards on the
client node - inside the client JVM).

I wonder because it seems to me this is not even a plugin-loading
issue, but rather a problem with the 'internal' type of call execution ->
for example, I get the same NPE when executing a typical search request
using the SearchAction.INSTANCE 'internal approach':

SearchRequestBuilder builder = new SearchRequestBuilder(client);
SearchResponse response = client.execute(SearchAction.INSTANCE,
builder.request()).actionGet();

over TransportClient, while again this call works perfectly fine over
node client.

Do you have some setup where you consume custom actions using
TransportClient (not rest and not node type of client)?

Thanks,
Tomislav

2014-10-20 17:21 GMT+02:00 joergpra...@gmail.com :
> If you use TransportClient, implementing custom actions like SimpleAction
> gets a bit tricky: you must install the plugin both on the cluster on all
> nodes *and* on the TransportClient classpath.
>
> The reason is, the TransportClient initiates a proxy action to ask the
> cluster to execute the action for him. A NodeClient runs in a single JVM
> with a cluster node and does not need such a proxy action.
>
> Jörg
>
>
> On Mon, Oct 20, 2014 at 3:09 PM, Tomislav Poljak  wrote:
>>
>> Hi Jörg,
>> thanks for response!
>>
>> I use default 'elasticsearch' cluster name and 'ordinary' match_all (as
>> below)
>>
>> Client client = new TransportClient().addTransportAddress(new
>> InetSocketTransportAddress("localhost", 9300));
>> SearchResponse response = client.prepareSearch().execute().actionGet();
>>
>>  works fine (connects,executes and returns results). However, I can't
>> execute SimpleAction successfully over TransportClient (even with
>> cluster name set):
>>
>>
>> Client client = new
>> TransportClient(ImmutableSettings.settingsBuilder().put("cluster.name",
>> "elasticsearch")).addTransportAddress(new
>> InetSocketTransportAddress("localhost", 9300));
>>
>> SimpleRequestBuilder builder = new SimpleRequestBuilder(client);
>> SearchResponse response = client.execute(SimpleAction.INSTANCE,
>> builder.request()).actionGet();
>>
>> Both, elasticsearch server and client using TransportClient report
>> 'simple action' plugin loaded (in console INFO), but on execute I get
>> NPE  on the client's side with trace:
>>
>> org.elasticsearch.common.util.concurrent.UncategorizedExecutionException:
>> Failed execution
>> at
>> org.elasticsearch.action.support.AdapterActionFuture.rethrowExecutionException(AdapterActionFuture.java:90)
>> at
>> org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:50)
>> at
>> org.xbib.elasticsearch.action.simple.SimpleActionTest.testSimpleAction(SimpleActionTest.java:26)
>> ..
>>
>> Caused by: java.lang.NullPointerException
>> at
>> org.elasticsearch.action.search.SearchRequest.writeTo(SearchRequest.java:541)
>> at
>> org.xbib.elasticsearch.action.simple.SimpleRequest.writeTo(SimpleRequest.java:38)
>>
>> Am I missing or misunderstanding something?
>>
>> Tomislav
>>
>>
>>
>>
>>
>>
>> 2014-10-18 15:28 GMT+02:00 joergpra...@gmail.com :
>> > You must set up a cluster name in the settings for the TransportClient,
>> > otherwise connection requests will be rejected by the cluster.
>> >
>> > Also, a dynmaic "new TransportClient()" for client instantiation is
>> > discouraged. By doing this, you open up a new threadpool each time.
>> > Recommend is to use a singleton instantiation for the whole JVM, and a
>> > single close() call on the TransportClient instance when JVM exits.
>> >
>> > Jörg
>> >
>> > On Sat, Oct 18, 2014 at 12:55 PM, Tomislav Poljak 
>> > wrote:
>> >>
>> >> Hi,
>> >> if I understand correctly, seems it should be possible to use
>> >> SimpleRequest over TransportClient, is this correct?
>> >>
>> >> I couldn't get it to work using:
>> >>
>> >> Client client = new TransportClient().addTransportAddress(new
>> >> InetSocketTransportAddress("localhost", 9300));
>> >>
>> >> but when switched to node client, seems to work
>> >>
>> >> Node node = nodeBuilder().client(true).node();
>> >> Client client = node.client();
>> >>
>> >> I'm also interested in reducing results/aggs using some custom code,
>> >> but to have it executed on elasticsearch cluster not to transfer
>> >> results to client node and reduce it there, but I'm not sure if this
>> >> is possible with custom transport actions when using TransportClient.
>> >>
>> >> Any i

Re: copy index

2014-10-20 Thread eunever32
Okay when I try that I get this error.
It's always at byte 48
Thanks in advance


Caused by: java.lang.IndexOutOfBoundsException: Readable byte limit 
exceeded: 48
 at 
org.elasticsearch.common.netty.buffer.AbstractChannelBuffer.readByte(AbstractChannelBuffer.java:236)
 at 
org.elasticsearch.transport.netty.ChannelBufferStreamInput.readByte(ChannelBufferStreamInput.java:132)
 at 
org.elasticsearch.common.io.stream.StreamInput.readVInt(StreamInput.java:141)
 at 
org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:272)
 at 
org.elasticsearch.common.io.stream.HandlesStreamInput.readString(HandlesStreamInput.java:61)
 at 
org.elasticsearch.common.io.stream.StreamInput.readStringArray(StreamInput.java:362)
 at 
org.elasticsearch.action.admin.cluster.state.ClusterStateRequest.readFrom(ClusterStateRequest.java:132)
 at 
org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:209)
 at 
org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:109)
 at 
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
 at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
 at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
 at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
 at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
 at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
 at 
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
 at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
 at 
org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
 at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
 at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
 at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
 at 
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
 at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
 at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
 at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
 at 
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
 at 
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
 at 
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)


On Monday, October 20, 2014 4:35:36 PM UTC+1, Jörg Prante wrote:

> The recipe is something like this
>
> 1. install knapsack
>
> 2. create new index. Example
>
> curl -XPUT 'localhost:9200/newindex'
>
> 3. create new mappings
>
> curl -XPUT 'localhost:9200/newindex/newmapping/_mapping' -d '{ ... }'
>
> 4. copy data 
>
> curl -XPOST 
> 'localhost:9200/oldindex/oldmapping/_push?map=\{"oldindex/oldmapping":"newindex/newmapping"\}'
>  
>
> Jörg
>
> On Mon, Oct 20, 2014 at 5:26 PM, > wrote:
>
>>
>> So just to explain what I want: 
>>
>>
>>- I want to be able to "push" an existing index to another index 
>>which has new mappings
>>
>>
>> Is this possible? 
>>
>> Preferably it wouldn't go through an intermediate file-system file: that 
>> would be expensive and might not be enough disk available.
>>
>> Thanks.
>> On Monday, October 20, 2014 4:16:55 PM UTC+1, Jörg Prante wrote:
>>
>>> There is no more parameter "createIndex", the documentation is outdated 
>>> - sorry for the confusion.
>>>
>>> The "_push" action does not use files. There is no need to do that, this 
>>> would be very strange,
>>>
>>> Jörg
>>>
>>>
>>> On Mon, Oct 20, 2014 at 5:12 PM,  wrote:
>>>
 Jorg,

 Not sure what you mean. There is a flag: "createIndex=false" which 
 means : 

 if the index already exists d

Shard data And Node name relationship

2014-10-20 Thread Daniel Schonfeld
Hello,

Suppose I have a cluster of 3 nodes.  Each node when it goes down and then 
comes back up, returns to the cluster with the data it had before shutdown. 
 However, the node does not necessarily join the cluster with the same node 
name.

From the perspective of reinitializing shards and rebalancing, does that 
change of name affect anything? I.e., does the node tell the cluster on 
joining "I am node XYZ", or does it just say "I have shards a, b, c for index 
'my index'"?  

In other words, would rebalancing commence due to the name change of a node, or 
due to data and other needs of ES?

Thanks!

Daniel Schonfeld

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a9a6a0db-8726-444d-b9bc-b5addbd1909f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Upper limits on indexes/shards in a cluster

2014-10-20 Thread David Ashby
The unhealthy clusters had four or five nodes. We switched to two 
two-node clusters and those have been stable.

Bigdesk reports file descriptors, memory, and CPU all have plentiful 
headroom in all cases.

On Monday, October 20, 2014 11:54:21 AM UTC-4, Jörg Prante wrote:
>
> How many nodes do you have in your cluster? 
>
> Have you checked if your nodes run out of file descriptors or heap memory?
>
> Jörg
>
> On Mon, Oct 20, 2014 at 5:52 PM, David Ashby  > wrote:
>
>> I might also note: the size of these indexes varies wildly, some being 
>> just a few documents, some being thousands, more or less following the 
>> power law.
>>
>>
>> On Monday, October 20, 2014 11:34:36 AM UTC-4, David Ashby wrote:
>>>
>>> Hi,
>>>
>>> We've been using elasticsearch on AWS for our application for two 
>>> purposes: as a search engine for user-created documents, and as a cache for 
>>> activity feeds in our application. We made a decision early-on to treat 
>>> every customer's content as a distinct index, for full logical separation 
>>> of customer data. We have about three hundred indexes in our cluster, with 
>>> the default 5-shards/1-replica setup.
>>>
>>> Recently, we've had major problems with the cluster "locking up" to 
>>> requests and losing track of its nodes. We initially responded by 
>>> attempting to remove possible CPU and memory limits, and placed all nodes 
>>> in the same AWS placement group, to maximize inter-node bandwidth, all to 
>>> no avail. We eventually lost an entire production cluster, resulting in a 
>>> decision to split the indexes across two completely independent clusters, 
>>> each cluster taking half of the indexes, with application-level logic 
>>> determining where the indexes were.
>>>
>>> All that is to say: with our setup, are we running into an undocumented 
>>> *practical* limit on the number of indexes or shards in a cluster? It 
>>> ends up being around 3000 shards with our setup. Our logs show evidence of 
>>> nodes timing out their responses to massive shard status-checks, and it 
>>> gets *worse* the more nodes there are in the cluster. It's also stable 
>>> with only *two* nodes.
>>>
>>> Thanks,
>>> -David
>>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/17720132-eb50-4d49-bae5-8970e39b79dc%40googlegroups.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/dc2d873d-28ed-40c9-94d9-eb1da37d1caa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Write-Read question

2014-10-20 Thread vineeth mohan
Hello Jack ,

What do you mean by "fetching it back" , is it a GET or a search.
GET is realtime and Search is near realtime.
You can use the refresh flag while indexing to make sure search is also
real time , but its expensive.
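
For example, a minimal sketch over the REST API (the index, type and field
names here are made up):

curl -XPUT 'localhost:9200/myindex/mytype/1?refresh=true' -d '{"field" : "value"}'
curl -XGET 'localhost:9200/myindex/_search?q=field:value&pretty'

With refresh=true the shard is refreshed before the index call returns, so the
document is immediately visible to search instead of waiting for the next
periodic refresh.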

Thanks
 Vineeth

On Mon, Oct 20, 2014 at 8:59 PM, Jack Park  wrote:

> I observe, and expect a slight delay between writing an object to ES and
> being able to fetch it back.  In ordinary circumstances, this is not an
> issue.
>
> But, in the context of many agents processing information resources,
> creating indexes for them, but also needing to refer to those indexes right
> away, what are the solutions available which allow to keep each index
> object (say, a JSONObject of the indexed document) in synch across many
> agents?
>
> Many thanks in advance.
> Jack
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAH6s0fxyFxBRVQUv2Xxn%2BGqsLL3y8Y7QkpntbgGVTLmn2dG7ew%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5mk7kMXrJLujaiAJHMZWMfidUOQFWCd%3DzkPMLcOas0p3Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Upper limits on indexes/shards in a cluster

2014-10-20 Thread joergpra...@gmail.com
How many nodes do you have in your cluster?

Have you checked if your nodes run out of file descriptors or heap memory?
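
Both are quick to check over the node stats API, e.g. (a sketch; metric names
as in the 1.x node stats API):

curl 'localhost:9200/_nodes/stats/process,jvm?pretty'

process.open_file_descriptors shows the descriptor count per node, and jvm.mem
shows the heap usage.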

Jörg

On Mon, Oct 20, 2014 at 5:52 PM, David Ashby 
wrote:

> I might also note: the size of these indexes varies wildly, some being
> just a few documents, some being thousands, more or less following the
> power law.
>
>
> On Monday, October 20, 2014 11:34:36 AM UTC-4, David Ashby wrote:
>>
>> Hi,
>>
>> We've been using elasticsearch on AWS for our application for two
>> purposes: as a search engine for user-created documents, and as a cache for
>> activity feeds in our application. We made a decision early-on to treat
>> every customer's content as a distinct index, for full logical separation
>> of customer data. We have about three hundred indexes in our cluster, with
>> the default 5-shards/1-replica setup.
>>
>> Recently, we've had major problems with the cluster "locking up" to
>> requests and losing track of its nodes. We initially responded by
>> attempting to remove possible CPU and memory limits, and placed all nodes
>> in the same AWS placement group, to maximize inter-node bandwidth, all to
>> no avail. We eventually lost an entire production cluster, resulting in a
>> decision to split the indexes across two completely independent clusters,
>> each cluster taking half of the indexes, with application-level logic
>> determining where the indexes were.
>>
>> All that is to say: with our setup, are we running into an undocumented
>> *practical* limit on the number of indexes or shards in a cluster? It
>> ends up being around 3000 shards with our setup. Our logs show evidence of
>> nodes timing out their responses to massive shard status-checks, and it
>> gets *worse* the more nodes there are in the cluster. It's also stable
>> with only *two* nodes.
>>
>> Thanks,
>> -David
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/17720132-eb50-4d49-bae5-8970e39b79dc%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGU40R%2BUfBuM11eQgKdN7NGNK663hsQ_VC%3Dg9OPgxZQgw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Upper limits on indexes/shards in a cluster

2014-10-20 Thread David Ashby
I might also note: the size of these indexes varies wildly, some being just 
a few documents, some being thousands, more or less following the power law.

On Monday, October 20, 2014 11:34:36 AM UTC-4, David Ashby wrote:
>
> Hi,
>
> We've been using elasticsearch on AWS for our application for two 
> purposes: as a search engine for user-created documents, and as a cache for 
> activity feeds in our application. We made a decision early-on to treat 
> every customer's content as a distinct index, for full logical separation 
> of customer data. We have about three hundred indexes in our cluster, with 
> the default 5-shards/1-replica setup.
>
> Recently, we've had major problems with the cluster "locking up" to 
> requests and losing track of its nodes. We initially responded by 
> attempting to remove possible CPU and memory limits, and placed all nodes 
> in the same AWS placement group, to maximize inter-node bandwidth, all to 
> no avail. We eventually lost an entire production cluster, resulting in a 
> decision to split the indexes across two completely independent clusters, 
> each cluster taking half of the indexes, with application-level logic 
> determining where the indexes were.
>
> All that is to say: with our setup, are we running into an undocumented 
> *practical* limit on the number of indexes or shards in a cluster? It 
> ends up being around 3000 shards with our setup. Our logs show evidence of 
> nodes timing out their responses to massive shard status-checks, and it 
> gets *worse* the more nodes there are in the cluster. It's also stable 
> with only *two* nodes.
>
> Thanks,
> -David
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/17720132-eb50-4d49-bae5-8970e39b79dc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES error while creating index on Solaris

2014-10-20 Thread Clinton Gormley
I think this is fixed in v1.3.5 
with https://github.com/elasticsearch/elasticsearch/pull/7468

On Monday, 20 October 2014 17:08:55 UTC+2, Jörg Prante wrote:
>
> I have added a comment to 
> https://github.com/elasticsearch/elasticsearch/issues/6962
>
> Jörg
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0b05890d-cfa4-4fe9-8e28-568d5369f52c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak? (reposted with better formatting)

2014-10-20 Thread Gavin Seng

### JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak?

** reposting because 1st one came out w/o images and all kinds of strange 
spaces.

Hi,

We're seeing issues where GC collects less and less memory over time 
leading to the need to restart our nodes.

The following is our setup and what we've tried. Please tell me if anything 
is lacking and I'll be glad to provide more details.

Also appreciate any advice on how we can improve our configurations.

### 32 GB heap

http://i.imgur.com/JNpWeTw.png



### 65 GB heap

http://i.imgur.com/qcLhC3M.png




### 65 GB heap with changed young/old ratio

http://i.imgur.com/Aa3fOMG.png



### Cluster Setup

* Tribes that link to 2 clusters
* Cluster 1
  * 3 masters (vms, master=true, data=false)
  * 2 hot nodes (physical, master=false, data=true)
* 2 hourly indices (1 for syslog, 1 for application logs)
* 1 replica
* Each index ~ 2 million docs (6gb - excl. of replica)
* Rolled to cold nodes after 48 hrs
  * 2 cold nodes (physical, master=false, data=true)
* Cluster 2
  * 3 masters (vms, master=true, data=false)
  * 2 hot nodes (physical, master=false, data=true)
* 1 hourly index
* 1 replica
* Each index ~ 8 million docs (20gb - excl. of replica)
* Rolled to cold nodes after 48 hrs
  * 2 cold nodes (physical, master=false, data=true)

Interestingly, we're actually having problems on Cluster 1's hot nodes even 
though it indexes less.

It suggests that this is a problem with searching because Cluster 1 is 
searched on a lot more.

### Machine settings (hot node)

* java
  * java version "1.7.0_11"
  * Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
  * Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
* 128gb ram
* 8 cores, 32 cpus
* ssds (raid 0)

### JVM settings

```
java
-Xms96g -Xmx96g -Xss256k
-Djava.awt.headless=true
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram 
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/elasticsearch/gc.log 
-XX:+HeapDumpOnOutOfMemoryError
-verbose:gc -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
-Xloggc:[...]
-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.local.only=[...]
-Dcom.sun.management.jmxremote.ssl=[...] 
-Dcom.sun.management.jmxremote.authenticate=[...]
-Dcom.sun.management.jmxremote.port=[...]
-Delasticsearch -Des.pidfile=[...]
-Des.path.home=/usr/share/elasticsearch -cp 
:/usr/share/elasticsearch/lib/elasticsearch-1.0.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
-Des.default.path.home=/usr/share/elasticsearch
-Des.default.path.logs=[...]
-Des.default.path.data=[...]
-Des.default.path.work=[...]
-Des.default.path.conf=/etc/elasticsearch 
org.elasticsearch.bootstrap.Elasticsearch
```

### Key elasticsearch.yml settings

* threadpool.bulk.type: fixed
* threadpool.bulk.queue_size: 1000
* indices.memory.index_buffer_size: 30%
* index.translog.flush_threshold_ops: 5
* indices.fielddata.cache.size: 30%
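
One thing worth correlating with the old-gen growth is how much heap the
fielddata and filter caches actually hold at runtime; a quick check via node
stats (look at the fielddata and filter_cache sections of the response):

curl 'localhost:9200/_nodes/stats/indices?pretty'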


### Search Load (Cluster 1)

* Mainly Kibana3 (queries ES with daily alias that expands to 24 hourly 
indices)
* Jenkins jobs that constantly run and do many faceting/aggregations for 
the last hour's of data

### Things we've tried (unsuccesfully)

* GC settings
  * young/old ratio
* Set young/old ratio to 50/50 hoping that things would get GCed 
before having the chance to move to old.
* The old grew at a slower rate but still things could not be collected.
  * survivor space ratio
* Give survivor space a higher ratio of young
* Increase the number of young-generation collections an object must survive 
before promotion to old to 10 (up from 6)
  * Lower cms occupancy ratio
* Tried 60% hoping to kick GC earlier. GC kicked in earlier but still 
could not collect.
* Limit filter/field cache
  * indices.fielddata.cache.size: 32GB
  * indices.cache.filter.size: 4GB
* Optimizing index to 1 segment on the 3rd hour
* Limit JVM to 32 gb ram
  * reference: 
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
* Limit JVM to 65 gb ram
  * This fulfils the 'leave 50% to the os' principle.
* Read 90.5/7 OOM errors-- memory leak or GC problems?
  * 
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/memory$20leak/elasticsearch/_Zve60xOh_E/N13tlXgkUAwJ
  * But we're not using term filters

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0473e6c0-72d3-43e5-bda7-03022d7bffac%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: copy index

2014-10-20 Thread joergpra...@gmail.com
The recipe is something like this

1. install knapsack

2. create new index. Example

curl -XPUT 'localhost:9200/newindex'

3. create new mappings

curl -XPUT 'localhost:9200/newindex/newmapping/_mapping' -d '{ ... }'

4. copy data

curl -XPOST
'localhost:9200/oldindex/oldmapping/_push?map=\{"oldindex/oldmapping":"newindex/newmapping"\}'
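
For step 3, the body is an ordinary mapping definition. A minimal sketch, where
"title" and its settings are just placeholders for whatever changed in the new
mapping:

curl -XPUT 'localhost:9200/newindex/newmapping/_mapping' -d '{
  "newmapping" : {
    "properties" : {
      "title" : { "type" : "string", "index" : "not_analyzed" }
    }
  }
}'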

Jörg

On Mon, Oct 20, 2014 at 5:26 PM,  wrote:

>
> So just to explain what I want:
>
>
>- I want to be able to "push" an existing index to another index which
>has new mappings
>
>
> Is this possible?
>
> Preferably it wouldn't go through an intermediate file-system file: that
> would be expensive and might not be enough disk available.
>
> Thanks.
> On Monday, October 20, 2014 4:16:55 PM UTC+1, Jörg Prante wrote:
>
>> There is no more parameter "createIndex", the documentation is outdated -
>> sorry for the confusion.
>>
>> The "_push" action does not use files. There is no need to do that, this
>> would be very strange,
>>
>> Jörg
>>
>>
>> On Mon, Oct 20, 2014 at 5:12 PM,  wrote:
>>
>>> Jorg,
>>>
>>> Not sure what you mean. There is a flag: "createIndex=false" which means
>>> :
>>>
>>> if the index already exists do not try to create it ie it is pre-created.
>>>
>>> Import will handle this. Will _push also ?
>>>
>>> I have another question which affects me:
>>> I was hoping that "_push" would write to the index without using an
>>> intermediate file. But it seems behind the scenes it uses the filesystem
>>> like export/import. Can you confirm?
>>>
>>> Regards,
>>>
>>> On Sunday, October 19, 2014 9:14:57 PM UTC+1, Jörg Prante wrote:
>>>
 I never thought about something like "pre-creation"  because it would
 just double the existing create index action...


>>>
 Jörg

 On Sun, Oct 19, 2014 at 6:00 PM,  wrote:

> OK I can try that
> But is there an option in the _push to have a pre created index?
>
> I know it's possible with import createIndex=false
>
> Would export/import be just as good?
>
> --
> You received this message because you are subscribed to the Google
> Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to elasticsearc...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/elasticsearch/627108aa-8970-474d-bf5d-7aa3c3c4be73%40goo
> glegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/430ea089-1a58-4855-a201-19c4281073fd%
>>> 40googlegroups.com
>>> 
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/87f3057e-7c89-417f-900a-3e6b2f10ffd6%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoE1a7KWm5BHGAGn3BbshKwKYL7RLzTV7unjJ%3D4RnknK%3Dw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Upper limits on indexes/shards in a cluster

2014-10-20 Thread David Ashby
Hi,

We've been using elasticsearch on AWS for our application for two purposes: 
as a search engine for user-created documents, and as a cache for activity 
feeds in our application. We made a decision early-on to treat every 
customer's content as a distinct index, for full logical separation of 
customer data. We have about three hundred indexes in our cluster, with the 
default 5-shards/1-replica setup.

Recently, we've had major problems with the cluster "locking up" to 
requests and losing track of its nodes. We initially responded by 
attempting to remove possible CPU and memory limits, and placed all nodes 
in the same AWS placement group, to maximize inter-node bandwidth, all to 
no avail. We eventually lost an entire production cluster, resulting in a 
decision to split the indexes across two completely independent clusters, 
each cluster taking half of the indexes, with application-level logic 
determining where the indexes were.

All that is to say: with our setup, are we running into an undocumented 
*practical* limit on the number of indexes or shards in a cluster? It ends 
up being around 3000 shards with our setup. Our logs show evidence of nodes 
timing out their responses to massive shard status-checks, and it gets 
*worse* the more nodes there are in the cluster. It's also stable with only 
*two* nodes.

Thanks,
-David

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6f5a8705-620f-4a41-8648-632c675d0291%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Write-Read question

2014-10-20 Thread Jack Park
I observe, and expect a slight delay between writing an object to ES and
being able to fetch it back.  In ordinary circumstances, this is not an
issue.

But, in the context of many agents processing information resources,
creating indexes for them, but also needing to refer to those indexes right
away, what are the solutions available which allow to keep each index
object (say, a JSONObject of the indexed document) in synch across many
agents?

Many thanks in advance.
Jack

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAH6s0fxyFxBRVQUv2Xxn%2BGqsLL3y8Y7QkpntbgGVTLmn2dG7ew%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: copy index

2014-10-20 Thread eunever32

So just to explain what I want: 


   - I want to be able to "push" an existing index to another index which 
   has new mappings


Is this possible? 

Preferably it wouldn't go through an intermediate file-system file: that 
would be expensive and might not be enough disk available.

Thanks.
On Monday, October 20, 2014 4:16:55 PM UTC+1, Jörg Prante wrote:

> There is no more parameter "createIndex", the documentation is outdated - 
> sorry for the confusion.
>
> The "_push" action does not use files. There is no need to do that, this 
> would be very strange,
>
> Jörg
>
>
> On Mon, Oct 20, 2014 at 5:12 PM, > wrote:
>
>> Jorg,
>>
>> Not sure what you mean. There is a flag: "createIndex=false" which means 
>> : 
>>
>> if the index already exists do not try to create it ie it is pre-created.
>>
>> Import will handle this. Will _push also ?
>>
>> I have another question which affects me: 
>> I was hoping that "_push" would write to the index without using an 
>> intermediate file. But it seems behind the scenes it uses the filesystem 
>> like export/import. Can you confirm?
>>
>> Regards,
>>
>> On Sunday, October 19, 2014 9:14:57 PM UTC+1, Jörg Prante wrote:
>>
>>> I never thought about something like "pre-creation"  because it would 
>>> just double the existing create index action...
>>>
>>>  
>>
>>> Jörg
>>>
>>> On Sun, Oct 19, 2014 at 6:00 PM,  wrote:
>>>
 OK I can try that
 But is there an option in the _push to have a pre created index?

 I know it's possible with import createIndex=false

 Would export/import be just as good?

 --
 You received this message because you are subscribed to the Google 
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send 
 an email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/627108aa-8970-474d-bf5d-7aa3c3c4be73%
 40googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/430ea089-1a58-4855-a201-19c4281073fd%40googlegroups.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/87f3057e-7c89-417f-900a-3e6b2f10ffd6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to find the thread pool size of an ElasticSearch cluster?

2014-10-20 Thread joergpra...@gmail.com
This is not the maximum number of requests you can send. It means "when
bulk indexing on a node gets too busy and must be queued, the maximum
number of actions that are allowed to queue up before a client is notified
of rejections is 50".
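
You can watch whether you are getting close to that limit with the same cat API,
e.g. (assuming the bulk.queue and bulk.rejected columns of _cat/thread_pool in
1.x):

curl 'localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.queueSize,bulk.rejected'

bulk.queue is the current queue depth and bulk.rejected counts requests that
have already been rejected.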

Jörg


On Mon, Oct 20, 2014 at 3:57 PM, truong ha  wrote:

> So in my case, which is the maximum requests I can send: *200 or 50*?
>
> On Monday, 20 October 2014 18:09:28 UTC+7, Jörg Prante wrote:
>>
>> bulk.queueSize is the maximum size before requests are rejected.
>>
>> Jörg
>>
>> On Mon, Oct 20, 2014 at 12:09 PM, truong ha  wrote:
>>
>>> I'm writing the concurrent code to send bulk index to ElasticSearch, and
>>> sending this query to get the thread pool size:
>>>
>>> GET /_cat/thread_pool?v&h=host,bulk.active,bulk.queueSize
>>>
>>> The response is
>>>
>>> hostbulk.active bulk.queueSize
>>> 1D4HPY1   0 50
>>> 1D4HPY2   0 50
>>> 1D4HPY3   0 50
>>> 1D4HPY4   0 50
>>>
>>> So how can I calculate the actual pool size of that cluster? Is it the
>>> sum of all hosts which means 200?
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/e6e48fe3-6269-493c-9258-fdc97baeb27e%
>>> 40googlegroups.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/29373a56-dbe7-4afb-9733-78f3fbbf5c2d%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoF%3D0RY90k%2BS7-Y-RNhcjshdSptWZUoPQx7GC5r0AMXJkg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Search Plugin to intercept search response

2014-10-20 Thread joergpra...@gmail.com
If you use TransportClient, implementing custom actions like SimpleAction
gets a bit tricky: you must install the plugin both on the cluster on all
nodes *and* on the TransportClient classpath.

The reason is, the TransportClient initiates a proxy action to ask the
cluster to execute the action for him. A NodeClient runs in a single JVM
with a cluster node and does not need such a proxy action.

Jörg


On Mon, Oct 20, 2014 at 3:09 PM, Tomislav Poljak  wrote:

> Hi Jörg,
> thanks for response!
>
> I use default 'elasticsearch' cluster name and 'ordinary' match_all (as
> below)
>
> Client client = new TransportClient().addTransportAddress(new
> InetSocketTransportAddress("localhost", 9300));
> SearchResponse response = client.prepareSearch().execute().actionGet();
>
>  works fine (connects,executes and returns results). However, I can't
> execute SimpleAction successfully over TransportClient (even with
> cluster name set):
>
>
> Client client = new
> TransportClient(ImmutableSettings.settingsBuilder().put("cluster.name",
> "elasticsearch")).addTransportAddress(new
> InetSocketTransportAddress("localhost", 9300));
>
> SimpleRequestBuilder builder = new SimpleRequestBuilder(client);
> SearchResponse response = client.execute(SimpleAction.INSTANCE,
> builder.request()).actionGet();
>
> Both, elasticsearch server and client using TransportClient report
> 'simple action' plugin loaded (in console INFO), but on execute I get
> NPE  on the client's side with trace:
>
> org.elasticsearch.common.util.concurrent.UncategorizedExecutionException:
> Failed execution
> at
> org.elasticsearch.action.support.AdapterActionFuture.rethrowExecutionException(AdapterActionFuture.java:90)
> at
> org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:50)
> at
> org.xbib.elasticsearch.action.simple.SimpleActionTest.testSimpleAction(SimpleActionTest.java:26)
> ..
>
> Caused by: java.lang.NullPointerException
> at
> org.elasticsearch.action.search.SearchRequest.writeTo(SearchRequest.java:541)
> at
> org.xbib.elasticsearch.action.simple.SimpleRequest.writeTo(SimpleRequest.java:38)
>
> Am I missing or misunderstanding something?
>
> Tomislav
>
>
>
>
>
>
> 2014-10-18 15:28 GMT+02:00 joergpra...@gmail.com :
> > You must set up a cluster name in the settings for the TransportClient,
> > otherwise connection requests will be rejected by the cluster.
> >
> > Also, a dynmaic "new TransportClient()" for client instantiation is
> > discouraged. By doing this, you open up a new threadpool each time.
> > Recommend is to use a singleton instantiation for the whole JVM, and a
> > single close() call on the TransportClient instance when JVM exits.
> >
> > Jörg
> >
> > On Sat, Oct 18, 2014 at 12:55 PM, Tomislav Poljak 
> wrote:
> >>
> >> Hi,
> >> if I understand correctly, seems it should be possible to use
> >> SimpleRequest over TransportClient, is this correct?
> >>
> >> I couldn't get it to work using:
> >>
> >> Client client = new TransportClient().addTransportAddress(new
> >> InetSocketTransportAddress("localhost", 9300));
> >>
> >> but when switched to node client, seems to work
> >>
> >> Node node = nodeBuilder().client(true).node();
> >> Client client = node.client();
> >>
> >> I'm also interested in reducing results/aggs using some custom code,
> >> but to have it executed on elasticsearch cluster not to transfer
> >> results to client node and reduce it there, but I'm not sure if this
> >> is possible with custom transport actions when using TransportClient.
> >>
> >> Any info on this would be appreciated,
> >> Tomislav
> >>
> >>
> >> 2014-09-11 20:44 GMT+02:00 joergpra...@gmail.com  >:
> >> > Yes. I have checked in some code for a simple action plugin.
> >> >
> >> > https://github.com/jprante/elasticsearch-simple-action-plugin
> >> >
> >> > The plugin implements a simple "match_all" search action, by reusing
> >> > much of
> >> > the code of the search action.
> >> >
> >> > Best,
> >> >
> >> > Jörg
> >> >
> >> >
> >> >
> >> > On Thu, Sep 11, 2014 at 7:55 PM, Sandeep Ramesh Khanzode
> >> >  wrote:
> >> >>
> >> >> Thanks for bearing with me till now :) Please provide one final input
> >> >> on
> >> >> this issue.
> >> >>
> >> >> Is there any example for a custom search action? If not, can you
> please
> >> >> provide some details on how I can implement one?
> >> >>
> >> >> Thanks,
> >> >> Sandeep
> >> >>
> >> >>
> >> >> On Thu, Sep 11, 2014 at 4:53 PM, joergpra...@gmail.com
> >> >>  wrote:
> >> >>>
> >> >>> You can not intercept the SearchResponse on the ES server itself.
> >> >>> Instead, you must implement your custom search action.
> >> >>>
> >> >>> Jörg
> >> >>>
> >> >>> On Thu, Sep 11, 2014 at 10:00 AM, Sandeep Ramesh Khanzode
> >> >>>  wrote:
> >> 
> >>  When you say, 'receive the SearchResponse', is that in the ES
> Server
> >>  node or the TransportClient node that spawned the request? I would
> >>  want to
> >>  intercept the SearchResponse when created at the ES Server itself,

Re: copy index

2014-10-20 Thread joergpra...@gmail.com
There is no more parameter "createIndex", the documentation is outdated -
sorry for the confusion.

The "_push" action does not use files. There is no need to do that, this
would be very strange,

Jörg


On Mon, Oct 20, 2014 at 5:12 PM,  wrote:

> Jorg,
>
> Not sure what you mean. There is a flag: "createIndex=false" which means :
>
> if the index already exists do not try to create it ie it is pre-created.
>
> Import will handle this. Will _push also ?
>
> I have another question which affects me:
> I was hoping that "_push" would write to the index without using an
> intermediate file. But it seems behind the scenes it uses the filesystem
> like export/import. Can you confirm?
>
> Regards,
>
> On Sunday, October 19, 2014 9:14:57 PM UTC+1, Jörg Prante wrote:
>
>> I never thought about something like "pre-creation"  because it would
>> just double the existing create index action...
>>
>>
>
>> Jörg
>>
>> On Sun, Oct 19, 2014 at 6:00 PM,  wrote:
>>
>>> OK I can try that
>>> But is there an option in the _push to have a pre created index?
>>>
>>> I know it's possible with import createIndex=false
>>>
>>> Would export/import be just as good?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/627108aa-8970-474d-bf5d-7aa3c3c4be73%
>>> 40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/430ea089-1a58-4855-a201-19c4281073fd%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoH92bkAf%3DWaQNfG4Nua2r24HkbX3TkBedXQ5fHz6z1zjA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: copy index

2014-10-20 Thread eunever32
Jorg,

Not sure what you mean. There is a flag: "createIndex=false" which means : 

if the index already exists do not try to create it ie it is pre-created.

Import will handle this. Will _push also ?

I have another question which affects me: 
I was hoping that "_push" would write to the index without using an 
intermediate file. But it seems behind the scenes it uses the filesystem 
like export/import. Can you confirm?

Regards,

On Sunday, October 19, 2014 9:14:57 PM UTC+1, Jörg Prante wrote:

> I never thought about something like "pre-creation"  because it would just 
> double the existing create index action...
>
>  

> Jörg
>
> On Sun, Oct 19, 2014 at 6:00 PM, > wrote:
>
>> OK I can try that
>> But is there an option in the _push to have a pre created index?
>>
>> I know it's possible with import createIndex=false
>>
>> Would export/import be just as good?
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/627108aa-8970-474d-bf5d-7aa3c3c4be73%40googlegroups.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/430ea089-1a58-4855-a201-19c4281073fd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES error while creating index on Solaris

2014-10-20 Thread joergpra...@gmail.com
I have added a comment to
https://github.com/elasticsearch/elasticsearch/issues/6962

Jörg

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoF3%3D8TU2QJtVWp_RqUdHxQkScQY1MDRYfQwbPrGhDFx7Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES error while creating index on Solaris

2014-10-20 Thread joergpra...@gmail.com
When trying to reproduce, I have no luck running ES 1.3.4 on Solaris Sparc.
Just like ES 1.2.2, it crashes with SIGBUS, but this time in lzf
compression codec.

So I will open an issue.

Jörg



#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0x7e51d838, pid=26623, tid=42
#
# JRE version: Java(TM) SE Runtime Environment (8.0_11-b12) (build
1.8.0_11-b12)

JavaThread "elasticsearch[Taj Nital][clusterService#updateTask][T#1]"
daemon [_thread_in_vm, id=42, stack(0xfffedfd0,0xfffedfd4)]

Stack: [0xfffedfd0,0xfffedfd4],  sp=0xfffedfd3e4f0,
 free space=249k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native
code)
V  [libjvm.so+0xd1d838]  Unsafe_GetInt+0x174
J 2115  sun.misc.Unsafe.getInt(Ljava/lang/Object;J)I (0 bytes) @
0x6c51c0e8 [0x6c51bfa0+0x148]
j
 
org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt([BI)I+10
j
 
org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE.tryCompress([BII[BI)I+104
j
 
org.elasticsearch.common.compress.lzf.ChunkEncoder.encodeChunk([BII)Lorg/elasticsearch/common/compress/lzf/LZFChunk;+17
j
 
org.elasticsearch.common.compress.lzf.LZFEncoder.encode(Lorg/elasticsearch/common/compress/lzf/ChunkEncoder;[BII)[B+17
j  org.elasticsearch.common.compress.lzf.LZFEncoder.encode([BII)[B+9
j  org.elasticsearch.common.compress.lzf.LZFCompressor.compress([BII)[B+3
j
 
org.elasticsearch.common.compress.CompressedString.(Ljava/lang/String;)V+29
j
 
org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(Lorg/elasticsearch/cluster/ClusterState;)Lorg/elasticsearch/cluster/ClusterState;+1536
j
 org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run()V+77
j
 
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run()V+4
j
 
java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub
V  [libjvm.so+0x6fd670]  void
JavaCalls::call_helper(JavaValue*,methodHandle*,JavaCallArguments*,Thread*)+0xa58
V  [libjvm.so+0x6fba9c]  void
JavaCalls::call_virtual(JavaValue*,KlassHandle,Symbol*,Symbol*,JavaCallArguments*,Thread*)+0x370
V  [libjvm.so+0x6fbd60]  void
JavaCalls::call_virtual(JavaValue*,Handle,KlassHandle,Symbol*,Symbol*,Thread*)+0x50
V  [libjvm.so+0x825924]  void thread_entry(JavaThread*,Thread*)+0xdc
V  [libjvm.so+0xce4894]  void JavaThread::thread_main_inner()+0x94
V  [libjvm.so+0xce47e0]  void JavaThread::run()+0x408
V  [libjvm.so+0xb3abc4]  java_start+0x364


On Mon, Oct 20, 2014 at 2:00 PM, Clinton Gormley 
wrote:

> Hi Abhinav
>
> It would be good to know exactly where this problem is coming from. Is it
> the way that Logstash adds the template, or is it in the Elasticsearch
> layer.  Please could you try something:
>
> * Delete the existing template and index in Elasticsearch
> * Take the Logstash template and create it yourself, not using Logstash
> * Then index a document into an index that matches the template pattern
>
> If this works, then we know the problem is in the Logstash layer, rather
> than in the Elasticsearch layer
>
> thanks
>
> On 19 October 2014 16:38, Abhinav Sonkar  wrote:
>
>> Thanks Jorg.
>>
>> Some more information:
>>
>> The error message keeps repeating until ES is restarted. After restarting
>> ES, index gets created with some bulk shards message (I am not in VPN right
>> now so can't show the exact message). When the time comes to create new
>> logstash index, error starts coming again.
>>
>> Abhinav
>>
>>
>> On Sunday, 19 October 2014 16:31:07 UTC+2, Jörg Prante wrote:
>>>
>>> Sorry, I overlooked this - yes it is SPARC. I will investigate.
>>>
>>> Jörg
>>>
>>> On Sun, Oct 19, 2014 at 4:30 PM, joerg...@gmail.com 
>>> wrote:
>>>
 Is this Solaris SPARC?

 Looks like a compression / Jackson issue, SPARC is 64bit big-endian, so
 this is interesting.

 Jörg

 On Sun, Oct 19, 2014 at 12:25 PM, Abhinav Sonkar 
 wrote:

> Hello,
>
> I am using ES 1.3.4 with Logstash 1.4.2 on Solaris sparc. Everyday
> when Logstash tries to create a new index ES stops responding (process is
> active) with below error:
>
> [2014-10-19 12:23:02,264][DEBUG][action.admin.indices.create] [SZ1248
> Morpheus] [logstash-2014.10.19] failed to create
> org.elasticsearch.common.jackson.core.JsonParseException: Unexpected
> character ('_' (code 95)): was expecting a colon to separate field name 
> and
> value
>  at [Source: {"_default_":{"_defa:{"_defled":true},"dynamic_
> templates":[{"string_fields":[{"stch":"*",""stch":"*ping_
> fiel":"*","ng",""stch"ga:{"_def":"*","ng","index":"*","
> yzed":truit_norms":[{"s,"fields":[{"stc:{"_def":"*",""
> ga:{"_dex":"*","an,"yzed":truitre_abovf":"*",
> ropertie

Kibana histogram doesn't show data on some daily indices

2014-10-20 Thread Drew Gassaway
This is a weird one, but hopefully it will make more sense to someone else. 
 I have the following setup:

Happy 4-node ES cluster.  Data streaming via logstash, several daily 
rolling indices.  In particular I have one index for CPU/mem/disk data 
called "hardware-.MM.DD.  I have a Kibana panel that plots the disk 
usage % figure from this index for certain hosts over time.

The problem is, on seemingly random indices, Kibana displays nothing, or ~0 
values for a certain day's index. The data drop-off always occurs at index 
rollover time, and querying any amount of time within the "bad" index gets 
no results (in kibana).  The data is the same, and the mappings for the 
good and bad indices are the same.  Not sure what my/Kibana's problem is.

Here's an image of a single host's disk over time as a simplified version 
of the issue.  This is over a 5d period.  The plotted value is the max 
value of a "usage" field, which is mapped as a long in ES in all indices. 
 You can see an overall trend of slowly increasing disk space (~30-36%). 
 The indices for 2014.10.15, 2014.10.17, and 2014.10.20 all are affected. 
 I have tried all kinds of panel settings and drilling down to a time range 
with a single data point on either side of a rollover of a good and bad 
index, and it doesn't make a lot of sense.

Actually, just going to go ahead and post a sample query for a very narrow 
window encompassing a few data points in good and bad indices (I removed 
some trailing '}' but this is from an inspect on the kibana panel):

http://~~~host~~~:9200/hardware-2014.10.19,hardware-2014.10.20/_search
{
  "facets": {
"83": {
  "date_histogram": {
"key_field": "@timestamp",
"value_field": "usage",
"interval": "5m"
  },
  "global": true,
  "facet_filter": {
"fquery": {
  "query": {
"filtered": {
  "query": {
"query_string": {
  "query": "log_type:hardware AND type:diskdata AND 
host:~host~"
}
  },
  "filter": {
"bool": {
  "must": [
{
  "range": {
"@timestamp": {
  "from": 1413762778069,
  "to": 1413763441563
}

  },
  "size": 0
}

The results for this are:
{"took":16,"timed_out":false,"_shards":{"total":10,"successful":10,"failed":0},"hits":{"total":91224,"max_score":0.0,"hits":[]},"facets":{"83":{"_type":"date_histogram","entries":[

{"time":141376260,"count":1,"min":35.0,"max":35.0,"total":35.0,"total_count":1,"mean":35.0},

{"time":141376290,"count":1,"min":35.0,"max":35.0,"total":35.0,"total_count":1,"mean":35.0},

{"time":141376320,"count":1,"min":1.73E-322,"max":1.73E-322,"total":1.73E-322,"total_count":1,"mean":1.73E-322}

]}}}

Note the last one - values very close to zero.  If I drill into the actual 
document in ES, though, there is no difference between the good and bad 
indices.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5da7fa35-949e-4625-b492-514b760377b9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How does the date_histogram aggregation choose its buckets? Is this tunable?

2014-10-20 Thread Michael Herold
Hi Adrien,

Thanks! The fact that the buckets start calculating from the UNIX epoch is
what I didn't understand. The fact that it always landed on October 7th --
which seems like an arbitrary date -- confused me. I did some quick
calculations and you're right; midnight on October 7th, 2014, is 545
30-day-buckets from the UNIX epoch. Huzzah!
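
For reference, a quick way to double-check that arithmetic (plain Java,
nothing ES-specific; the interval and the bounds are hard-coded from the
query in my original post):

    public class BucketCheck {
        public static void main(String[] args) {
            long interval = 30L * 24 * 60 * 60 * 1000;   // "30d" = 2592000000 ms
            long oct7 = 1412640000000L;                  // 2014-10-07 00:00:00 +0000
            System.out.println(oct7 % interval);         // 0   -> exactly on a bucket boundary
            System.out.println(oct7 / interval);         // 545 -> buckets since the UNIX epoch
            long max = 1413503999000L;                   // 2014-10-16 23:59:59 +0000 (extended_bounds.max)
            System.out.println(max - (max % interval));  // 1412640000000 -> the bucket holding "max"
                                                         //   starts on 2014-10-07 00:00:00 +0000
        }
    }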

I think you're right about the pre_offset and post_offset. I should be able
to calculate the needed offset(s) to get the effect that I want.

Thank you for taking the time to explain this to me. I appreciate it!

Best,
Michael

On Sun, Oct 19, 2014 at 4:44 PM, Adrien Grand <
adrien.gr...@elasticsearch.com> wrote:

> Hi Michael,
>
> The thing is that buckets are not computed based on the current date and
> going backwards, but based on January 1st 1970 (called Epoch) which is a
> common origin of time for computers. So the first bucket would start on
> January 1st 1970, then the second on January 31st, ... and if you keep on
> doing it until October 2014, the bucket would start on the 7th (I think?).
>
> I believe you could make it work the way that you expect by using the
> pre_offset and post_offset options of the date histogram aggregation:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html#_pre_post_offset
>
>
> On Fri, Oct 17, 2014 at 2:20 AM, Michael Herold <
> michael.j.her...@gmail.com> wrote:
>
>> Hi Adrien,
>>
>> Thank you for the reply. I actually want 30 day buckets, not one month
>> buckets, for the calculation I'm doing. I would understand the weird offset
>> if I was using months as a unit since they are of variable length. However,
>> a day is always 1000 * 60 * 60 * 24 milliseconds, so why would that cause
>> an offset that is the 7th of the month?
>>
>> Thank you,
>> Michael
>>
>> On Thursday, October 16, 2014 6:56:39 PM UTC-5, Adrien Grand wrote:
>>>
>>> Hi Michael,
>>>
>>> Histogram aggregations return buckets that are a multiple of the
>>> interval, you are getting this weird offset because not all months have
>>> exactly 30 days. Setting "interval" to "month" should fix the issue?
>>>
>>> On Thu, Oct 16, 2014 at 6:03 PM, Michael Herold 
>>> wrote:
>>>
 I'm trying to use elasticsearch to give me 30-day statistics for a
 given collection of models (pertinent fields are a date in *created_at*
 and an integer in *value*). Currently, I have this query/aggregation:

 {
   "query": {
 "match_all": {}
   },
   "aggregations": {
 "date_histogram": {
   "field": "created_at",
   "interval": "30d",
   "min_doc_count": 0,
   "extended_bounds": {
 "min": 138188160,  // Dynamically generated for 365 days
 ago (This is 2013-10-16 00:00:00 +)
 "max": 1413503999000   // Dynamically generated for end of
 today (This is 2014-10-16 23:59:59 +)
   }
 },
 "aggregations": {
   "stats": {
 "extended_stats": {
   "field": "value"
 }
   }
 }
   }
 }

 It's working as expected, except for one thing: the buckets don't line
 up as expected. For some reason, *the last bucket always starts on
 2014-10-07 00:00:00 +0000, regardless of what data is in elasticsearch*.
 I have tried this aggregation on a bunch of different date ranges,
 including:

 - 1 model instance per day for the past 30 days
 - 1 model instance per day for the past 365 days
 - 1 model instance total, for a *created_at* of 2014-09-30
 - 1 model instance total, for a *created_at* of 2014-10-15
 - 1 model instance total, for a *created_at* of 2014-10-16
 - 1 model instance total, for a *created_at* of 2014-10-31

 I have also tried to adjust the *extended bounds*, which doesn't shift
 the bucket dates at all.

 The result is that the last bucket is always giving a date of
 2014-10-07. This throws off the statistics because the last bucket isn't a
 full 30 days of material, whereas the rest of buckets are.

 *My questions:*

 *- Why are the buckets always pivoting around October 7th? *My
 expectation is that it pivots around 30 days prior to
 *extend_bounds["max"]*.
 *- Is there a way to tune this?*

 Thank you in advance for any help you can give.

 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/87fe5659-50c5-4870-8139-12a680b94c9e%
 40googlegroups.com
 
 .
 For mor

how to filter range on cardinality aggregations ES v1.3.4 please ?

2014-10-20 Thread rmkml
Hello,

First, thank you for the very good ELK!

Can anyone help me with how to filter on a range over a cardinality aggregation, please?

OK, please look at my working query example:

curl -XGET 
'http://localhost:9200/logstash-2014.10.20/_search?search_type=count&pretty=true'
 -d
 '{ "size":999, "aggs": { "distinct_ip_src": { "cardinality": { "field": 
"IP_SRC"'
response:
{
  "took" : 34,
  "timed_out" : false,
  "_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
  },
  "hits" : {
"total" : 12975,
"max_score" : 0.0,
"hits" : [ ]
  },
  "aggregations" : {
"distinct_ip_src" : {
  "value" : 10
}
  }
}


But now, how do I filter with a range on distinct_ip_src.value (10 here), please?


query not working: (no hits reply)
curl -XGET 
'http://localhost:9200/logstash-2014.10.20/_search?search_type=count&pretty=true'
 -d
 '{ "size":999, "aggs": { "distinct_ip_src": { "cardinality": { "field": 
"IP_SRC"}}},"post_filter",{"query": 
{"range":{"distinct_ip_src.value":{"gt":9}'
Tryed only "value": not work
Tryed only "_value": not work
Tryed only "term": not work
size:0 : not better work
Tryed removing "search_type=count": not work

Best Regards
@Rmkml

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/75edbbd1-7136-445d-889a-3f4d13d02e84%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Using different mtu value in elasticsearch

2014-10-20 Thread P.A. M
Hi all,

I am setting up an ElasticSearch 1.3.4 cluster with multihomed servers (a 
corporate network x.x.x.x and a server-only network at 10Gb/s on z.z.z.z, 
each server having two NICs). I ran into some problems as I tried to use the 
server-only network for all the cluster traffic and the corporate network 
to access the REST interface. I used:
- transport.host: z.z.z.z
- bind.host: 0.0.0.0 (as I use the Java transport client on the corporate 
network).
- http.host: x.x.x.x

With these settings, I could create indices, connect with the Java client, 
manage with the HQ/head plugin, etc. The problem is that when I tried to pump data 
into the cluster (using bulk insert with the Java client, which worked ok on v0.92; 
I updated the jar version to 1.3.4), it started to fill the shard on the 
server to which the client was directly connected, but when it tried to 
fill shards on other servers... it just stopped, or halted. The Java client 
waited with no error, nothing in the servers' logs... I stopped it after 15 
minutes and still had no log or feedback (I use the default log settings). 

To test further, I started a cluster with only one server with an index 
with 3 shards and pumped some data into it. All went well. Then I joined a 
second server and checked on the head plugin. I could see the state of one 
shard flip to "relocating", but it didn't change location... I waited up to 
half an hour (it contained 5,000 documents with only one field with one 
word in it).

After banging my head a few times, I saw that the NIC on the z.z.z.z 
network was set up with an MTU of 9000. I changed it to a more conventional 
value of 1500 and then all went well...

So is elasticsearch restricted to an MTU of 1500 for the transport client? Is 
there a way to change it? I saw nothing in the config files. I didn't 
test it with other in-between MTU values, so maybe it's just a bad 
value for the network, but other network applications ran smoothly with an MTU of 
9000.

Thanks in advance for your time,

P.A.


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/65126ea6-950c-4ff8-a823-6abe76dad816%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: error: elasticsearch java : org.elasticsearch.ElasticsearchIllegalArgumentException: unknown property [content]

2014-10-20 Thread David Pilato
Did you set any mapping, any setting?

Actually, it would be better to give a full recreation so we can see 
immediately what is happening.

Have a look at http://www.elasticsearch.org/help/

-- 
David Pilato | Technical Advocate | elasticsearch.com
david.pil...@elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs



On 20 October 2014 at 15:32:50, ALI BEN ABDALLAH (ali.benab...@gmail.com) 
wrote:

hello, i need help please.
i'm trying to index an xml document after convert it to json but i have this 
error.
do you have any idea please?
Thanks in advance.

---
0    [main] INFO  org.elasticsearch.plugins  - [Base] loaded [], sites []
Exception in thread "main" 
org.elasticsearch.index.mapper.MapperParsingException: failed to parse 
[documents.document.body.lead.p.TC]
    at 
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:418)
    at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
    at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:594)
    at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:461)
    at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:549)
    at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
    at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:555)
    at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
    at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
    at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
    at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
    at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
    at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
    at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
    at 
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
    at 
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
    at 
org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:384)
    at 
org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:203)
    at 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
    at 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: unknown 
property [content]
    at 
org.elasticsearch.index.mapper.core.LongFieldMapper.innerParseCreateField(LongFieldMapper.java:287)
    at 
org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:215)
    at 
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:408)
    ... 22 more
-
 this's my json document:

-
{"documents":{"document":{"body":{"title":"Non 
Stop","duration":"0:30:0","timezone":"+0200","air.time":"11:30 (UTC 
+0200)","epochtime":1410946200,"timecode.unit":"sec","chan_id":"tv","lead":{"p":{"UTC":20140917093000,"offset":0,"TC":[20140917113000,{"content":"ministres","utcstarttime":20140917093001,"tcendtime":20140917113001,"offset":1,"tcstarttime":20140917113001,"utcendtime":20140917093001}]}},"air.startTime":20140917113000,"media":{"media-reference":{"content":"non-stop.mp4","nodoc":"VIDEO·20140917·TZB_V·20140917113000","format":"tv"},"media-type":"video"}},"doc.management":{"nodoc":{"pub.code":"TZB","date":20140917,"type":"TV"},"status":"DIFFUSE","initial.sequence":"20140917113000non-stop","language":"FRA"},"signaletic":{"publication":{"category":"Journal","name.norm":"TV","date.iso8601":"2014-09-17"}}},"schemaLocation":"http://google.com/"}}
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3e41a927-6982-400d-9c99-ab44c464bea1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this grou

Re: Indexed documents not showing up in search results

2014-10-20 Thread Pieter Agenbag
The elasticsearch log file was not showing any activity during the time 
when this was happening, but when I restarted the elasticsearch service 
I got some warnings (see attached log file); it then seemed to proceed 
normally and has not complained any more. Note: my process is still 
indexing new documents.

 

On Monday, October 20, 2014 3:52:47 PM UTC+2, Pieter Agenbag wrote:
>
> Good afternoon. 
>
> The subject might be a little misleading as to the true nature of my 
> problem - which I'll try to explain here in as much detail as possible.
>
> First of all , I am rather new to Elasticsearch.
>
> Secondly , this problem has happened more than once (after dumping all 
> indexes and starting over).
>
> Ok ,here goes:
>
> I have a single elasticsearch node running on a dedicated Dell PowerEdge 
> R720 with dual 6 core cpus and 96GB ram. (of which 32GB is assigned to the 
> HEAP)
> The machine is connected back to back with a 10GB fiber to another Dell 
> R620.
>
> My processing happens on the R620 which then uses the bulk api to index 
> (currently testing at 8000 documents per second) into the ES on the R720.
> The documents are all of the same type and indexed into daily "partitions" 
> .. events-2014-10-18 , events-2014-10-19 , events-2014-10-20 etc. 
> I then have a Kibana dashboard on top of that.
>
> All of this works perfectly for several days and then seemingly stops. I 
> noticed that my Kibana dashboard (defaulting to the last hours data) 
> stopped plotting. Further investigation showed that my application is still 
> processing and indexing documents, 
> but a search on todays index , with descending timestamp ordering shows 
> the last document as 3:59 this morning.
>
> POST /events-2014-10-20/event/_search
> {
> "size":20
> ,"from": 0
> ,"query": 
> {
> "filtered": 
> {
> "query": 
> {
> "match_all": {}
> }
> }
> }
> ,"sort": 
> [
> {
>   "timestamp": 
>   {
>  "order": "desc"
>   }
>}
> ]
> }
>
>  result
> {
>"took": 22836,
>"timed_out": false,
>"_shards": {
>   "total": 5,
>   "successful": 5,
>   "failed": 0
>},
>"hits": {
>   "total": 385153635,
>   "max_score": null,
>   "hits": [
>  {
> "_index": "events-2014-10-20",
> "_type": "event",
> "_id": "HYOy53Q3TM-TwBz6SLRB5w",
> "_score": null,
> "_source": {
>"timestamp": "2014-10-20 03:59:14", 
> .. etc
>
> Now, when I do a bulk index , the API tells me that the documents were 
> created and specifies the autogenerated ids.
> So , taking one of those IDs and querying it directly , I DO get the 
> document back. 
> In the correct index , with the correct timestamp at 10:12 (from earlier 
> when I was investigating)
>
> GET /events-2014-10-20/event/kY6MaTgCThizVGgb3iSGrw
>
> {
>"_index": "events-2014-10-20",
>"_type": "event",
>"_id": "kY6MaTgCThizVGgb3iSGrw",
>"_version": 1,
>"found": true,
>"_source": {
>   "timestamp": "2014-10-20 10:12:35",
>
> 
> Some system stats :
> 
> The machine's cpu spikes every now and then (as I issue bulk indexes) , 
> but drops down to idle again afterwards. 
> There's plenty left on the heap and the disks is only 33% used.
>
>
> curl -XGET 'localhost:9200/_cat/health?v'
> epoch  timestamp cluster   status node.total node.data shards pri 
> relo init unassign 
> 1413812685 15:44:45  elasticsearch yellow  1 1 85  85 
>00   15 
>
> curl -XGET 'localhost:9200/_cat/count?v'
> epoch  timestamp count  
> 1413812740 15:45:40  4171812520
>
> curl -XGET '10.15.3.19:9200/_cat/indices/events*?v'
> health index pri rep docs.count docs.deleted store.size 
> pri.store.size 
> green  events-2014-10-19   5   0  7024226960230.5gb   
>  230.5gb 
> green  events-2014-10-17   5   0  7023413950230.4gb   
>  230.4gb 
> green  events-2014-10-18   5   0  7023529750230.3gb   
>  230.3gb 
> green  events-2014-10-15   5   0  7032296360230.8gb   
>  230.8gb 
> green  events-2014-10-16   5   0  7015479920230.2gb   
>  230.2gb 
> green  events-2014-10-14   5   0  2523949230 83.1gb   
>   83.1gb 
> yellow events-2014-10-20   5   1  4073310750127.9gb   
>  127.9gb 
>
> curl -XGET 'localhost:9200/_cat/pending_tasks?v'
> insertOrder timeInQueue priority source
>
> curl -XGET '10.15.3.19:9200/_cat/shards/events-2014-10-20?v'
> index shard prirep state  docs  store ip node  
> events-2014-10-20 4 p  STARTED81862132 27.8gb 10.15.3.19 X-Ray 
> events-2014-10-20 4 r  UNASSIGNED  
> events-2014-10-20 0 p  STARTED8186 25.4gb 10.15.3.19 X-Ray 
> 

Re: How to find the thread pool size of an ElasticSearch cluster?

2014-10-20 Thread truong ha
So in my case, which is the maximum requests I can send: *200 or 50*?

On Monday, 20 October 2014 18:09:28 UTC+7, Jörg Prante wrote:
>
> bulk.queueSize is the maximum size before requests are rejected.
>
> Jörg
>
> On Mon, Oct 20, 2014 at 12:09 PM, truong ha  > wrote:
>
>> I'm writing the concurrent code to send bulk index to ElasticSearch, and 
>> sending this query to get the thread pool size:
>>
>> GET /_cat/thread_pool?v&h=host,bulk.active,bulk.queueSize
>>
>> The response is
>>
>> hostbulk.active bulk.queueSize 
>> 1D4HPY1   0 50 
>> 1D4HPY2   0 50
>> 1D4HPY3   0 50 
>> 1D4HPY4   0 50
>>
>> So how can I calculate the actual pool size of that cluster? Is it the 
>> sum of all hosts which means 200?
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/e6e48fe3-6269-493c-9258-fdc97baeb27e%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/29373a56-dbe7-4afb-9733-78f3fbbf5c2d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Indexed documents not showing up in search results

2014-10-20 Thread Pieter Agenbag
Good afternoon. 

The subject might be a little misleading as to the true nature of my 
problem - which I'll try to explain here in as much detail as possible.

First of all , I am rather new to Elasticsearch.

Secondly , this problem has happened more than once (after dumping all 
indexes and starting over).

Ok ,here goes:

I have a single elasticsearch node running on a dedicated Dell PowerEdge 
R720 with dual 6 core cpus and 96GB ram. (of which 32GB is assigned to the 
HEAP)
The machine is connected back to back with a 10GB fiber to another Dell 
R620.

My processing happens on the R620 which then uses the bulk api to index 
(currently testing at 8000 documents per second) into the ES on the R720.
The documents are all of the same type and indexed into daily "partitions" 
.. events-2014-10-18 , events-2014-10-19 , events-2014-10-20 etc. 
I then have a Kibana dashboard on top of that.

All of this works perfectly for several days and then seemingly stops. I 
noticed that my Kibana dashboard (defaulting to the last hours data) 
stopped plotting. Further investigation showed that my application is still 
processing and indexing documents, 
but a search on todays index , with descending timestamp ordering shows the 
last document as 3:59 this morning.

POST /events-2014-10-20/event/_search
{
"size":20
,"from": 0
,"query": 
{
"filtered": 
{
"query": 
{
"match_all": {}
}
}
}
,"sort": 
[
{
  "timestamp": 
  {
 "order": "desc"
  }
   }
]
}

 result
{
   "took": 22836,
   "timed_out": false,
   "_shards": {
  "total": 5,
  "successful": 5,
  "failed": 0
   },
   "hits": {
  "total": 385153635,
  "max_score": null,
  "hits": [
 {
"_index": "events-2014-10-20",
"_type": "event",
"_id": "HYOy53Q3TM-TwBz6SLRB5w",
"_score": null,
"_source": {
   "timestamp": "2014-10-20 03:59:14", 
.. etc

Now, when I do a bulk index , the API tells me that the documents were 
created and specifies the autogenerated ids.
So , taking one of those IDs and querying it directly , I DO get the 
document back. 
In the correct index , with the correct timestamp at 10:12 (from earlier 
when I was investigating)

GET /events-2014-10-20/event/kY6MaTgCThizVGgb3iSGrw

{
   "_index": "events-2014-10-20",
   "_type": "event",
   "_id": "kY6MaTgCThizVGgb3iSGrw",
   "_version": 1,
   "found": true,
   "_source": {
  "timestamp": "2014-10-20 10:12:35",


Some system stats :

The machine's cpu spikes every now and then (as I issue bulk indexes) , but 
drops down to idle again afterwards. 
There's plenty left on the heap and the disks is only 33% used.


curl -XGET 'localhost:9200/_cat/health?v'
epoch  timestamp cluster   status node.total node.data shards pri 
relo init unassign 
1413812685 15:44:45  elasticsearch yellow  1 1 85  85   
 00   15 

curl -XGET 'localhost:9200/_cat/count?v'
epoch  timestamp count  
1413812740 15:45:40  4171812520

curl -XGET '10.15.3.19:9200/_cat/indices/events*?v'
health index pri rep docs.count docs.deleted store.size 
pri.store.size 
green  events-2014-10-19   5   0  7024226960230.5gb   
 230.5gb 
green  events-2014-10-17   5   0  7023413950230.4gb   
 230.4gb 
green  events-2014-10-18   5   0  7023529750230.3gb   
 230.3gb 
green  events-2014-10-15   5   0  7032296360230.8gb   
 230.8gb 
green  events-2014-10-16   5   0  7015479920230.2gb   
 230.2gb 
green  events-2014-10-14   5   0  2523949230 83.1gb 
83.1gb 
yellow events-2014-10-20   5   1  4073310750127.9gb   
 127.9gb 

curl -XGET 'localhost:9200/_cat/pending_tasks?v'
insertOrder timeInQueue priority source

curl -XGET '10.15.3.19:9200/_cat/shards/events-2014-10-20?v'
index shard prirep state  docs  store ip node  
events-2014-10-20 4 p  STARTED81862132 27.8gb 10.15.3.19 X-Ray 
events-2014-10-20 4 r  UNASSIGNED  
events-2014-10-20 0 p  STARTED8186 25.4gb 10.15.3.19 X-Ray 
events-2014-10-20 0 r  UNASSIGNED  
events-2014-10-20 3 p  STARTED81885822 25.5gb 10.15.3.19 X-Ray 
events-2014-10-20 3 r  UNASSIGNED  
events-2014-10-20 1 p  STARTED81868103 25.5gb 10.15.3.19 X-Ray 
events-2014-10-20 1 r  UNASSIGNED  
events-2014-10-20 2 p  STARTED81871297 26.5gb 10.15.3.19 X-Ray 
events-2014-10-20 2 r  UNASSIGNED 

I can obviously not put this system into production and my deadline is fast 
approaching - so , any help will be greatly

error: elasticsearch java : org.elasticsearch.ElasticsearchIllegalArgumentException: unknown property [content]

2014-10-20 Thread ALI BEN ABDALLAH
Hello, I need help please.
I'm trying to index an XML document after converting it to JSON, but I get 
this error.
Do you have any idea, please?
Thanks in advance.

---
0[main] INFO  org.elasticsearch.plugins  - [Base] loaded [], sites []
Exception in thread "main" 
org.elasticsearch.index.mapper.MapperParsingException: failed to parse 
[documents.document.body.lead.p.TC]
at 
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:418)
at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:594)
at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:461)
at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:549)
at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:555)
at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
at 
org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
at 
org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
at 
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
at 
org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
at 
org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:384)
at 
org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:203)
at 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
at 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: unknown 
property [content]
at 
org.elasticsearch.index.mapper.core.LongFieldMapper.innerParseCreateField(LongFieldMapper.java:287)
at 
org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:215)
at 
org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:408)
... 22 more
-
 this is my JSON document:

-
{"documents":{"document":{"body":{"title":"Non 
Stop","duration":"0:30:0","timezone":"+0200","air.time":"11:30 (UTC 
+0200)","epochtime":1410946200,"timecode.unit":"sec","chan_id":"tv","lead":{"p":{"UTC":20140917093000,"offset":0,"TC":[20140917113000,{"content":"ministres","utcstarttime":20140917093001,"tcendtime":20140917113001,"offset":1,"tcstarttime":20140917113001,"utcendtime":20140917093001}]}},"air.startTime":20140917113000,"media":{"media-reference":{"content":"non-stop.mp4","nodoc":"VIDEO·20140917·TZB_V·20140917113000","format":"tv"},"media-type":"video"}},"doc.management":{"nodoc":{"pub.code":"TZB","date":20140917,"type":"TV"},"status":"DIFFUSE","initial.sequence":"20140917113000non-stop","language":"FRA"},"signaletic":{"publication":{"category":"Journal","name.norm":"TV","date.iso8601":"2014-09-17"}}},"schemaLocation":"http://google.com/"}}

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3e41a927-6982-400d-9c99-ab44c464bea1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak?

2014-10-20 Thread Gavin Seng


  
### JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak?

Hi,

We're seeing issues where GC collects less and less memory over time
leading to the need to restart our nodes.

The following is our setup and what we've tried. Please tell me if anything
is lacking and I'll be glad to provide more details.

Also appreciate any advice on how we can improve our configurations.

Thank you for any help!

Gavin

### Cluster Setup

* Tribes that link to 2 clusters

* Cluster 1
  * 3 masters (vms, master=true, data=false)
  * 2 hot nodes (physical, master=false, data=true)
    * 2 hourly indices (1 for syslog, 1 for application logs)
    * 1 replica
    * Each index ~ 2 million docs (6gb - excl. of replica)
    * Rolled to cold nodes after 48 hrs
  * 2 cold nodes (physical, master=false, data=true)

* Cluster 2
  * 3 masters (vms, master=true, data=false)
  * 2 hot nodes (physical, master=false, data=true)
    * 1 hourly index
    * 1 replica
    * Each index ~ 8 million docs (20gb - excl. of replica)
    * Rolled to cold nodes after 48 hrs
  * 2 cold nodes (physical, master=false, data=true)
 

Re: Search Plugin to intercept search response

2014-10-20 Thread Tomislav Poljak
Hi Jörg,
thanks for the response!

I use the default 'elasticsearch' cluster name and an 'ordinary' match_all (as below):

Client client = new TransportClient().addTransportAddress(new
InetSocketTransportAddress("localhost", 9300));
SearchResponse response = client.prepareSearch().execute().actionGet();

It works fine (connects, executes and returns results). However, I can't
execute SimpleAction successfully over TransportClient (even with the
cluster name set):


Client client = new
TransportClient(ImmutableSettings.settingsBuilder().put("cluster.name",
"elasticsearch")).addTransportAddress(new
InetSocketTransportAddress("localhost", 9300));

SimpleRequestBuilder builder = new SimpleRequestBuilder(client);
SearchResponse response = client.execute(SimpleAction.INSTANCE,
builder.request()).actionGet();

Both the elasticsearch server and the client using TransportClient report the
'simple action' plugin loaded (in the console INFO), but on execute I get an
NPE on the client's side with this trace:

org.elasticsearch.common.util.concurrent.UncategorizedExecutionException:
Failed execution
at 
org.elasticsearch.action.support.AdapterActionFuture.rethrowExecutionException(AdapterActionFuture.java:90)
at 
org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:50)
at 
org.xbib.elasticsearch.action.simple.SimpleActionTest.testSimpleAction(SimpleActionTest.java:26)
..

Caused by: java.lang.NullPointerException
at org.elasticsearch.action.search.SearchRequest.writeTo(SearchRequest.java:541)
at 
org.xbib.elasticsearch.action.simple.SimpleRequest.writeTo(SimpleRequest.java:38)

Am I missing or misunderstanding something?

Tomislav






2014-10-18 15:28 GMT+02:00 joergpra...@gmail.com :
> You must set up a cluster name in the settings for the TransportClient,
> otherwise connection requests will be rejected by the cluster.
>
> Also, a dynmaic "new TransportClient()" for client instantiation is
> discouraged. By doing this, you open up a new threadpool each time.
> Recommend is to use a singleton instantiation for the whole JVM, and a
> single close() call on the TransportClient instance when JVM exits.
>
> Jörg
>
> On Sat, Oct 18, 2014 at 12:55 PM, Tomislav Poljak  wrote:
>>
>> Hi,
>> if I understand correctly, seems it should be possible to use
>> SimpleRequest over TransportClient, is this correct?
>>
>> I couldn't get it to work using:
>>
>> Client client = new TransportClient().addTransportAddress(new
>> InetSocketTransportAddress("localhost", 9300));
>>
>> but when switched to node client, seems to work
>>
>> Node node = nodeBuilder().client(true).node();
>> Client client = node.client();
>>
>> I'm also interested in reducing results/aggs using some custom code,
>> but to have it executed on elasticsearch cluster not to transfer
>> results to client node and reduce it there, but I'm not sure if this
>> is possible with custom transport actions when using TransportClient.
>>
>> Any info on this would be appreciated,
>> Tomislav
>>
>>
>> 2014-09-11 20:44 GMT+02:00 joergpra...@gmail.com :
>> > Yes. I have checked in some code for a simple action plugin.
>> >
>> > https://github.com/jprante/elasticsearch-simple-action-plugin
>> >
>> > The plugin implements a simple "match_all" search action, by reusing
>> > much of
>> > the code of the search action.
>> >
>> > Best,
>> >
>> > Jörg
>> >
>> >
>> >
>> > On Thu, Sep 11, 2014 at 7:55 PM, Sandeep Ramesh Khanzode
>> >  wrote:
>> >>
>> >> Thanks for bearing with me till now :) Please provide one final input
>> >> on
>> >> this issue.
>> >>
>> >> Is there any example for a custom search action? If not, can you please
>> >> provide some details on how I can implement one?
>> >>
>> >> Thanks,
>> >> Sandeep
>> >>
>> >>
>> >> On Thu, Sep 11, 2014 at 4:53 PM, joergpra...@gmail.com
>> >>  wrote:
>> >>>
>> >>> You can not intercept the SearchResponse on the ES server itself.
>> >>> Instead, you must implement your custom search action.
>> >>>
>> >>> Jörg
>> >>>
>> >>> On Thu, Sep 11, 2014 at 10:00 AM, Sandeep Ramesh Khanzode
>> >>>  wrote:
>> 
>>  When you say, 'receive the SearchResponse', is that in the ES Server
>>  node or the TransportClient node that spawned the request? I would
>>  want to
>>  intercept the SearchResponse when created at the ES Server itself,
>>  since I
>>  want to send the subset of Response to another process on the same
>>  node, and
>>  it would not be very efficient to have the response sent back to the
>>  client
>>  node only to be sent back again.
>> 
>>  Thanks,
>>  Sandeep
>> 
>>  On Thu, Sep 11, 2014 at 12:43 PM, joergpra...@gmail.com
>>   wrote:
>> >
>> > You can receive the SearchResponse, process the response, and return
>> > the response with whatever format you want.
>> >
>> > Jörg
>> >
>> > On Wed, Sep 10, 2014 at 11:59 AM, Sandeep Ramesh Khanzode
>> >  wrote:
>> >>
>> >> Hi Jorg,
>> >>
>> >> Thanks for the links. I was checking the s

Re: Term query on long type failing

2014-10-20 Thread Adrien Grand
Your retweet ids are larger than the largest integer that a double can
represent exactly, so anything that parses them into doubles will silently
round them. Can you check that your retweet ids are actually mapped as longs
(not doubles) and try to reproduce from the command line with curl (instead
of, say, from a browser or a programming language that stores all numbers as
doubles)?
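
For illustration, a minimal standalone check of that rounding (plain Java,
using the id from the facet output earlier in this thread):

    public class RetweetIdPrecision {
        public static void main(String[] args) {
            long id = 524145031945877919L;
            double d = id;                       // what a JSON parser that only has doubles keeps
            System.out.println((long) d);        // 524145031945877888 -> nearest double, not the original id
            System.out.println((long) d == id);  // false
        }
    }

So if the client that sent the match query went through a double at any
point, the value it actually searched for was already a different number.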

On Mon, Oct 20, 2014 at 2:54 PM, vineeth mohan 
wrote:

> Hello Adrien ,
>
> Thanks for your reply.
>
> But match query is also not working for me -
>
> {
>   "query": {
> "match": {
>   "retweet.id": 524120494964535300
> }
>   }
> }
>
> Gives 0 results.
>
> Thanks
>   Vineeth
>
> On Mon, Oct 20, 2014 at 5:50 PM, Adrien Grand <
> adrien.gr...@elasticsearch.com> wrote:
>
>> Hi,
>>
>> The term query aims at querying documents based on the raw bytes of a
>> term. It is not aware of your field mappings while numeric terms are not
>> encoded verbatim: they use a special binary encoding that allows the binary
>> representation of the numbers to be sortable lexicographically. Switching
>> to the match[1] query instead of term should fix the issue.
>>
>> [1]
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
>>
>> On Mon, Oct 20, 2014 at 1:40 PM, vineeth mohan > > wrote:
>>
>>> Hi ,
>>>
>>> I am extracting tweets from twitter and i found the following issue.
>>> On doing a terms facet on field retweet.id , i received some user ID's.
>>> Now on doing a term query on one of the value obtained , I am not
>>> getting any result.
>>>
>>> The facet is as following  -
>>> {
>>>   "facets": {
>>> "terms": {
>>>   "terms": {
>>> "field": "retweet.id",
>>> "size": 10,
>>> "order": "count",
>>> "exclude": []
>>>   }
>>> }
>>> I received value 524145031945877919 using the faceting over field
>>> retweet.id as the top first.
>>>
>>> facets: {
>>>
>>>- terms: {
>>>   - _type: terms
>>>   - missing: 1251
>>>   - total: 1213
>>>   - other: 984
>>>   - terms: [
>>>  - {
>>> - term: 524145031945877919
>>> - count: 53
>>>  }
>>>  - {
>>>
>>>
>>> Now i executed the following -
>>>
>>> {
>>>"query" : {
>>>"term" : {
>>>"retweet.id" : 524145031945877919
>>>}
>>>  }
>>> }
>>> Its giving me 0 results.
>>>
>>> Kindly point out , what is the issue.
>>>
>>> Thanks
>>>  Vineeth
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/CAGdPd5nnjxV-_DpVKg1Ewcuq7wxrDsg48fkdo5OA92jENy9d2Q%40mail.gmail.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>>
>> --
>> Adrien Grand
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j5KNUhw%2BGSP_N1ZSWZWisGFHbmgTNz2YE10QMJDBXtq4Q%40mail.gmail.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAGdPd5nf6rSd5AJKDnhfS2mGu0tRQJUzQ__QWhf3AoRBotWrUg%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adrien Grand

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j5hQ984sr4q%2B0cPi5HkMfHZcZ1r%2BuBXO3wJ%2BiXQ-J854A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: update by query and refresh

2014-10-20 Thread Adrien Grand
Hi Igor,

It really depends on your indexing rate. If you plan on performing no more
than one refresh per second, things will be fine (this is what
elasticsearch does by default). However, running refresh much more often
could cause a lot more flush/merge activity, and this will hurt not only
your index rate but also your search rate because of all these new segments
that will keep on being published. I don't really have a solution to this
issue, this is a hard problem.
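
If you do go with option (2), the blocking call from the Java API is a one-liner
(a sketch only, assuming an existing Client instance named "client" and an index
named "my_index"):

    // returns once the refresh has run on the index's shards
    client.admin().indices().prepareRefresh("my_index").execute().actionGet();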

On Mon, Oct 20, 2014 at 11:58 AM, Igor Kupczyński 
wrote:

> Hello,
>
> We use _update_by_query plugin to bulk update the documents. In the tests
> we've hit an issue where not all the documents are updated because the
> index may not be refreshed before we do _update_by_query.
>
> We have refresh interval set to 1 sec and this issue won't happen very
> often in the real life, as usually there is a longer timeframe between
> adding and updating a document.
>
> Nevertheless we want to solve the issue. Right now I can see two solutions:
>
> 1) Migrate _update_by_query to update by _id where possible (this works as
> documents are *gettable *by id right after they are indexed
> 2) Issue refresh before all _update_by_query operations
>
> The latter solution will make us safe (_refresh is blocking and we'll wait
> for confirmation before issuing update by's), but what is the performance
> cost? Is it a major one? For 99% for update_by_queries the refresh is not
> needed but we have no way to tell upfront.
>
> Thanks,
> Igor
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/57e7df7c-b6ec-4af5-bc83-37880df974c9%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adrien Grand

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j4cbQ4T-pb6wUoqQdytGsoutVn-LQMiQQh0CaKxVTCPWw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Term query on long type failing

2014-10-20 Thread vineeth mohan
Hello Adrien ,

Thanks for your reply.

But match query is also not working for me -

{
  "query": {
"match": {
  "retweet.id": 524120494964535300
}
  }
}

Gives 0 results.

Thanks
  Vineeth

On Mon, Oct 20, 2014 at 5:50 PM, Adrien Grand <
adrien.gr...@elasticsearch.com> wrote:

> Hi,
>
> The term query aims at querying documents based on the raw bytes of a
> term. It is not aware of your field mappings while numeric terms are not
> encoded verbatim: they use a special binary encoding that allows the binary
> representation of the numbers to be sortable lexicographically. Switching
> to the match[1] query instead of term should fix the issue.
>
> [1]
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
>
> On Mon, Oct 20, 2014 at 1:40 PM, vineeth mohan 
> wrote:
>
>> Hi ,
>>
>> I am extracting tweets from twitter and i found the following issue.
>> On doing a terms facet on field retweet.id , i received some user ID's.
>> Now on doing a term query on one of the value obtained , I am not getting
>> any result.
>>
>> The facet is as following  -
>> {
>>   "facets": {
>> "terms": {
>>   "terms": {
>> "field": "retweet.id",
>> "size": 10,
>> "order": "count",
>> "exclude": []
>>   }
>> }
>> I received value 524145031945877919 using the faceting over field
>> retweet.id as the top first.
>>
>> facets: {
>>
>>- terms: {
>>   - _type: terms
>>   - missing: 1251
>>   - total: 1213
>>   - other: 984
>>   - terms: [
>>  - {
>> - term: 524145031945877919
>> - count: 53
>>  }
>>  - {
>>
>>
>> Now i executed the following -
>>
>> {
>>"query" : {
>>"term" : {
>>"retweet.id" : 524145031945877919
>>}
>>  }
>> }
>> Its giving me 0 results.
>>
>> Kindly point out , what is the issue.
>>
>> Thanks
>>  Vineeth
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAGdPd5nnjxV-_DpVKg1Ewcuq7wxrDsg48fkdo5OA92jENy9d2Q%40mail.gmail.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Adrien Grand
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j5KNUhw%2BGSP_N1ZSWZWisGFHbmgTNz2YE10QMJDBXtq4Q%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5nf6rSd5AJKDnhfS2mGu0tRQJUzQ__QWhf3AoRBotWrUg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Term query on long type failing

2014-10-20 Thread Adrien Grand
Hi,

The term query aims at querying documents based on the raw bytes of a term.
It is not aware of your field mappings while numeric terms are not encoded
verbatim: they use a special binary encoding that allows the binary
representation of the numbers to be sortable lexicographically. Switching
to the match[1] query instead of term should fix the issue.

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
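
From the Java API the same switch is a one-line change (a sketch only, assuming a
Client named "client", an index named "tweets", and
org.elasticsearch.index.query.QueryBuilders):

    // term: compares the raw bytes of the value, returns 0 hits here
    client.prepareSearch("tweets")
          .setQuery(QueryBuilders.termQuery("retweet.id", 524145031945877919L))
          .execute().actionGet();

    // match: goes through the field mapping, so the long is encoded correctly
    client.prepareSearch("tweets")
          .setQuery(QueryBuilders.matchQuery("retweet.id", 524145031945877919L))
          .execute().actionGet();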

On Mon, Oct 20, 2014 at 1:40 PM, vineeth mohan 
wrote:

> Hi ,
>
> I am extracting tweets from twitter and i found the following issue.
> On doing a terms facet on field retweet.id , i received some user ID's.
> Now on doing a term query on one of the value obtained , I am not getting
> any result.
>
> The facet is as following  -
> {
>   "facets": {
> "terms": {
>   "terms": {
> "field": "retweet.id",
> "size": 10,
> "order": "count",
> "exclude": []
>   }
> }
> I received value 524145031945877919 using the faceting over field
> retweet.id as the top first.
>
> facets: {
>
>- terms: {
>   - _type: terms
>   - missing: 1251
>   - total: 1213
>   - other: 984
>   - terms: [
>  - {
> - term: 524145031945877919
> - count: 53
>  }
>  - {
>
>
> Now i executed the following -
>
> {
>"query" : {
>"term" : {
>"retweet.id" : 524145031945877919
>}
>  }
> }
> Its giving me 0 results.
>
> Kindly point out , what is the issue.
>
> Thanks
>  Vineeth
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAGdPd5nnjxV-_DpVKg1Ewcuq7wxrDsg48fkdo5OA92jENy9d2Q%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adrien Grand

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j5KNUhw%2BGSP_N1ZSWZWisGFHbmgTNz2YE10QMJDBXtq4Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Question: shard awareness allocation and shard allocation filtering

2014-10-20 Thread Boaz Leskes
Hi Grégoire

A couple of comments:

> 2. at some point (disk on ssds is above 65%), one copy is moved to larger 
boxes (1 copy is still on ssd to help search, 1 copy on large box)

Allocation awareness causes elasticsearch to spread the shard copies 
across the different values of the attribute. However, it also changes the 
search behavior in the sense that it tries to execute searches on nodes 
that have the same attributes as the one that initially got the search. In 
your case it means that if an ssd node gets the search, it will run on the ssd 
nodes; otherwise it will run on the iodisk nodes. I'm not sure this is what you want.

> 2. At some point, I drop the requirement (effectively 
> *routing.allocation.require: 
**). I expect flavor awareness to move one copy to large (iodisk) boxes.

ES tries to balance shards from the cluster perspective. It gives some 
weight to spreading out the shards of an index, but this is just one 
parameter. In your case I suspect you have way more shards on the iodisk 
nodes than on the ssds, which means that balancing will try to move shards 
from iodisks to ssds if it can, but not the other way around (as you expect).


> are awareness and filtering supposed to cooperate?

I think they should, but I'm not sure it will achieve what you want to do - 
see the comment above. That said, I can confirm that shard allocation awareness 
and filtering on the same attribute may get in each other's way. I would 
suggest you open an issue on github indicating that shard allocation 
awareness causes unassigned shards when one of the attribute values is 
blocked by an allocation filter (it doesn't matter which filter is being 
used). You would expect it to behave the same as if those nodes were down (in 
which case the shards would be assigned). Try to give a concise reproduction 
using two different attributes for filtering and awareness.

Cheers,
Boaz





On Saturday, October 18, 2014 8:37:29 PM UTC+2, Grégoire Seux wrote:
>
> On Thu, Oct 16, 2014 at 11:42 AM, Grégoire Seux 
>  wrote: 
> >- are awareness and filtering supposed to cooperate? 
>
> A quick look at the code confirms that allocation deciders are fully 
> orthogonal. 
> Should I open a github issue to discuss adding support for cooperating 
> deciders ? 
>
> -- 
> Grégoire Seux 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b6cc56d4-f2aa-403c-a46e-54c34b3a41a9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES error while creating index on Solaris

2014-10-20 Thread Clinton Gormley
Hi Abhinav

It would be good to know exactly where this problem is coming from. Is it
the way that Logstash adds the template, or is it in the Elasticsearch
layer.  Please could you try something:

* Delete the existing template and index in Elasticsearch
* Take the Logstash template and create it yourself, not using Logstash
* Then index a document into an index that matches the template pattern

If this works, then we know the problem is in the Logstash layer, rather
than in the Elasticsearch layer

thanks

On 19 October 2014 16:38, Abhinav Sonkar  wrote:

> Thanks Jorg.
>
> Some more information:
>
> The error message keeps repeating until ES is restarted. After restarting
> ES, index gets created with some bulk shards message (I am not in VPN right
> now so can't show the exact message). When the time comes to create new
> logstash index, error starts coming again.
>
> Abhinav
>
>
> On Sunday, 19 October 2014 16:31:07 UTC+2, Jörg Prante wrote:
>>
>> Sorry, I overlooked this - yes it is SPARC. I will investigate.
>>
>> Jörg
>>
>> On Sun, Oct 19, 2014 at 4:30 PM, joerg...@gmail.com 
>> wrote:
>>
>>> Is this Solaris SPARC?
>>>
>>> Looks like a compression / Jackson issue, SPARC is 64bit big-endian, so
>>> this is interesting.
>>>
>>> Jörg
>>>
>>> On Sun, Oct 19, 2014 at 12:25 PM, Abhinav Sonkar 
>>> wrote:
>>>
 Hello,

 I am using ES 1.3.4 with Logstash 1.4.2 on Solaris sparc. Everyday when
 Logstash tries to create a new index ES stops responding (process is
 active) with below error:

 [2014-10-19 12:23:02,264][DEBUG][action.admin.indices.create] [SZ1248
 Morpheus] [logstash-2014.10.19] failed to create
 org.elasticsearch.common.jackson.core.JsonParseException: Unexpected
 character ('_' (code 95)): was expecting a colon to separate field name and
 value
  at [Source: {"_default_":{"_defa:{"_defled":true},"dynamic_
 templates":[{"string_fields":[{"stch":"*",""stch":"*ping_
 fiel":"*","ng",""stch"ga:{"_def":"*","ng","index":"*","
 yzed":truit_norms":[{"s,"fields":[{"stc:{"_def":"*",""
 ga:{"_dex":"*","an,"yzed":truitre_abovf":"*",
 roperties":[{"sersionc:{"_def":"*",""ga:{"_dex":"*","an,"
 yzed"},"geoipc:{"_def":"object",""stcmic":true},"dth":
 "*","",""stcerties":[{"sertionc:{"_def":"geoipc:nt"}}; line: 1,
 column: 25]
 at org.elasticsearch.common.jackson.core.JsonParser._
 constructError(JsonParser.java:1419)
 at org.elasticsearch.common.jackson.core.base.
 ParserMinimalBase._reportError(ParserMinimalBase.java:508)
 at org.elasticsearch.common.jackson.core.base.
 ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:437)
 at org.elasticsearch.common.jackson.core.json.
 ReaderBasedJsonParser._skipColon2(ReaderBasedJsonParser.java:1773)
 at org.elasticsearch.common.jackson.core.json.
 ReaderBasedJsonParser._skipColon(ReaderBasedJsonParser.java:1746)
 at org.elasticsearch.common.jackson.core.json.
 ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:621)
 at org.elasticsearch.common.xcontent.json.
 JsonXContentParser.nextToken(JsonXContentParser.java:50)
 at org.elasticsearch.common.xcontent.support.
 AbstractXContentParser.readMap(AbstractXContentParser.java:268)
 at org.elasticsearch.common.xcontent.support.
 AbstractXContentParser.readValue(AbstractXContentParser.java:308)
 at org.elasticsearch.common.xcontent.support.
 AbstractXContentParser.readMap(AbstractXContentParser.java:275)
 at org.elasticsearch.common.xcontent.support.
 AbstractXContentParser.readMap(AbstractXContentParser.java:254)
 at org.elasticsearch.common.xcontent.support.
 AbstractXContentParser.map(AbstractXContentParser.java:208)
 at org.elasticsearch.common.xcontent.support.
 AbstractXContentParser.mapAndClose(AbstractXContentParser.java:219)
 at org.elasticsearch.cluster.metadata.
 MetaDataCreateIndexService.parseMapping(MetaDataCreateIndexService.
 java:473)
 at org.elasticsearch.cluster.metadata.
 MetaDataCreateIndexService.access$400(MetaDataCreateIndexService.
 java:89)
 at org.elasticsearch.cluster.metadata.
 MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.
 java:260)
 at org.elasticsearch.cluster.service.InternalClusterService$
 UpdateTask.run(InternalClusterService.java:328)
 at org.elasticsearch.common.util.concurrent.
 PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(
 PrioritizedEsThreadPoolExecutor.java:153)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(
 ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(
 ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

 I tried with

Term query on long type failing

2014-10-20 Thread vineeth mohan
Hi,

I am extracting tweets from Twitter and found the following issue.
On doing a terms facet on the field retweet.id, I received some IDs.
Now, on doing a term query on one of the values obtained, I am not getting
any results.

The facet is as follows -
{
  "facets": {
    "terms": {
      "terms": {
        "field": "retweet.id",
        "size": 10,
        "order": "count",
        "exclude": []
      }
    }
  }
}
I received the value 524145031945877919 as the top term when faceting over
the field retweet.id.

facets: {
  terms: {
    _type: terms,
    missing: 1251,
    total: 1213,
    other: 984,
    terms: [
      {
        term: 524145031945877919,
        count: 53
      },
      {
      ...

Now I executed the following query:

{
   "query" : {
   "term" : {
   "retweet.id" : 524145031945877919
   }
 }
}
It's giving me 0 results.

Kindly point out what the issue is.
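
A sketch of the two checks that seem most relevant here (the index name
tweets is a placeholder): the actual mapping of retweet.id, and the same term
query with the value passed as a string rather than a JSON number:

GET /tweets/_mapping

GET /tweets/_search
{
  "query": {
    "term": {
      "retweet.id": "524145031945877919"
    }
  }
}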

Thanks
 Vineeth



Re: How to find the thread pool size of an ElasticSearch cluster?

2014-10-20 Thread joergpra...@gmail.com
bulk.queueSize is the maximum size before requests are rejected.
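
The queue is per node, not per cluster. To see how full each node's queue is
and whether anything has been rejected, something like this should work
(column names as I remember them; /_cat/thread_pool?help lists what is
available):

GET /_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.queueSize,bulk.rejected

A steadily growing bulk.rejected count is the sign that the client should back
off.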

Jörg

On Mon, Oct 20, 2014 at 12:09 PM, truong ha  wrote:

> I'm writing the concurrent code to send bulk index to ElasticSearch, and
> sending this query to get the thread pool size:
>
> GET /_cat/thread_pool?v&h=host,bulk.active,bulk.queueSize
>
> The response is
>
> hostbulk.active bulk.queueSize
> 1D4HPY1   0 50
> 1D4HPY2   0 50
> 1D4HPY3   0 50
> 1D4HPY4   0 50
>
> So how can I calculate the actual pool size of that cluster? Is it the sum
> of all hosts which means 200?
>


How to find the thread pool size of an ElasticSearch cluster?

2014-10-20 Thread truong ha
I'm writing concurrent code to send bulk index requests to Elasticsearch, and 
sending this query to get the thread pool size:

GET /_cat/thread_pool?v&h=host,bulk.active,bulk.queueSize

The response is

hostbulk.active bulk.queueSize 
1D4HPY1   0 50 
1D4HPY2   0 50
1D4HPY3   0 50 
1D4HPY4   0 50

So how can I calculate the actual pool size of that cluster? Is it the sum 
of all hosts which means 200?



update by query and refresh

2014-10-20 Thread Igor Kupczyński
Hello,

We use the _update_by_query plugin to bulk update documents. In the tests 
we've hit an issue where not all the documents are updated, because the 
index may not be refreshed before we do _update_by_query.

We have the refresh interval set to 1 sec and this issue won't happen very 
often in real life, as there is usually a longer timeframe between 
adding and updating a document.

Nevertheless we want to solve the issue. Right now I can see two solutions:

1) Migrate _update_by_query to update by _id where possible (this works, as 
documents are gettable by id right after they are indexed)
2) Issue a refresh before all _update_by_query operations

The latter solution will make us safe (_refresh is blocking and we'll wait 
for confirmation before issuing update by's), but what is the performance 
cost? Is it a major one? For 99% of update_by_queries the refresh is not 
needed, but we have no way to tell upfront.
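
Concretely, option 2 would just be a blocking refresh on the index before the
update (the index name here is a placeholder), followed by the usual
_update_by_query request:

POST /myindex/_refresh

A refresh creates little work if nothing new has been indexed since the last
one, but whether that holds at our volume is something we would have to
measure rather than assume.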

Thanks,
Igor



Return score in multiple fields

2014-10-20 Thread Kruti Shukla
Hello All,

I'm trying to achieve one piece of functionality in Elasticsearch but I'm not 
able to do it.

In SQL it would look something like --> SELECT score_1 = _score FROM sometable

I am trying to assign the value of the score to another field. That means 
Elasticsearch would return two columns holding the same value: _score and score_1.

I have already tried custom score, but it changes the value of the _score column 
itself. I DO NOT WANT TO CHANGE IT.
I'm already happy with the score returned in the "_score" field.
I just want the same value of the "_score" column in another column, for example 
"score_1".

I want to do the same in Elasticsearch.

Is it possible? Is there any functionality provided in elasticsearch?
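
For context, the kind of thing I am hoping for would look roughly like this
(just a sketch, with placeholder index and field names, and I have not verified
that _score is accessible from script fields on 1.x):

GET /myindex/_search
{
  "query": { "match": { "title": "something" } },
  "script_fields": {
    "score_1": { "script": "_score" }
  }
}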



Re: another EC2 publish_host public ip problem

2014-10-20 Thread David Pilato
Hi Martin,

Did you check your firewall settings? Did you open port 9200 so it can be 
accessed from your local machine?

BTW, those settings are not used. You can comment/remove them:

discovery.zen.ping.multicast.enabled: false
network.publish_host: "54.31.403.195"
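
A quick check from your local machine would be something like (with the real
public IP substituted):

curl -m 5 http://54.31.403.195:9200/

If that times out while curl against the private IP from inside the instance
works, it is almost certainly the security group / firewall rather than the
Elasticsearch configuration.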


-- 
David Pilato | Technical Advocate | elasticsearch.com
david.pil...@elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs



Le 19 octobre 2014 à 07:40:46, martin.enzin...@gmail.com 
(martin.enzin...@gmail.com) a écrit:

Hi,

I can't figure out why Elasticsearch still can't be reached from the public ip 
of the ec2 instance. This is my config file


cloud.aws.access_key: "AKIAXPWEW2A"
cloud.aws.secret_key: "EvU0I5Xx+b+FlRXXXSSDFfM2Z"

plugin.mandatory: "cloud-aws"

cluster.name: "escluster"

node.name: "Iron Fist"

discovery.type: "ec2"
discovery.ec2.groups: "launch-wizard-4"
discovery.ec2.host_type: "public_ip"
discovery.ec2.ping_timeout: "30s"
discovery.ec2.availability_zones: "us-east-1a"
cloud.aws.region: "us-east"

discovery.zen.ping.multicast.enabled: false

network.publish_host: "54.31.403.195"



Do you have a hint for me? Curl-ing the private ip works.
Your help is much appreciated.

Thank you, best regards


nested filter query issue

2014-10-20 Thread shekhar chauhan
Hi All,
  I am working on a nested filter query. But this query only works on 
nested object properties of type int, double, and date; it does not work when 
the property is of string type.
My document structure is given below -

"sectionResults": [
   {
   "SectionName": "Project Details",
   "itemResults": [
   {
   "InspectionItem": "Project Id",

   "Value": "asd",
   "NumericValue": 0,
   "DoubleValue": 1,
   "LongValue": 0,
   "DateValue": "2014-10-08T00:00:00"
   }

]

   }

]



and my query is -


{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "nested": {
          "path": "sectionResults.itemResults",
          "filter": {
            "bool": {
              "must": [
                {
                  "range": {
                    "sectionResults.itemResults.DateValue": {
                      "from": "2014-10-01",
                      "to": "2014-10-08"
                    }
                  }
                },
                {
                  "term": {
                    "sectionResults.itemResults.InspectionItem": "Project Id"
                  }
                }
              ]
            }
          },
          "_cache": true
        }
      }
    }
  }
}


For example - if I search on the InspectionItem property of itemResults, then 
0 results are found. But it gives results for the DateValue and DoubleValue properties 
of the itemResults object. Please help.
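
A guess, offered only as a sketch: if InspectionItem is an analyzed string, the
standard analyzer will have indexed it as the lowercase tokens project and id,
so a term filter for the exact value "Project Id" can never match. Either map
the field as not_analyzed, or replace the term clause with a match query
wrapped in a query filter, e.g.:

{
  "query": {
    "match": {
      "sectionResults.itemResults.InspectionItem": "Project Id"
    }
  }
}

used in place of the term clause inside the must array.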

Thanks in advance,









Re: Joda problem!!!!!!

2014-10-20 Thread Peter Litsegård
Strange. I decided to drop this for now due to a presentation tomorrow. 
Will pick it up afterwards... :)

Cheers

On Monday, October 20, 2014 9:06:44 AM UTC+2, Peter Litsegård wrote:
>
> Hi!
>
> I've been struggling with two date formats. I'll give you a sample and the 
> corresponding date format I've used:
>
> 1. "Oct 20, 2014 8:42:41 AM" : MMM d  hh:mm:ss aa
> 2. "Mon Oct 20 06:42:41 + 2014" : EEE MMM d HH:mm:ss Z 
>
> I've used the following mapping:
>
> {
> "settings" : {
> "index" : {
> "number_of_shards" : 1,
> "number_of_replicas" : 1
> }
> },
> "mappings" : {
> "envelope_v1" : {
> "_all" : { "enabled" : true },
> "_id" : { "index" : "not_analyzed", "store" : false, "path" : 
> "id" },
> "_timestamp" : { "enabled" : true, "path" : "createdAt" },
> "_ttl" : { "enabled" : true, "default" : "30d" },
> "properties" : {
> "id" : { "type" : "string", "store" : false, "index" : 
> "no" },
> "createdAt" : { "type" : "date", "format" : "MMM d  
> hh:mm:ss aa", "store" : false, "index" : "no" },
> "ref" : { "type" : "string", "store" : false, "index" : 
> "no" },
> "sourceType" : { "type" : "string", "store" : false, 
> "index" : "no" },
> "search" : { "type" : "string", "store" : false, "index" : 
> "no" },
> "source.created_at" : { "type" : "date", "format" : "EEE 
> MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
> "source.timestamp_ms" : { "type" : "date", "store" : 
> false, "index" : "no" },
> "source.user.created_at" : { "type" : "date", "format" : 
> "EEE MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
> "source.analysis.nlp" : { "type" : "object", "store" : 
> false, "index" : "no" }
> }
> }
> }
> }
>
> While adding data I get the following exception:
>
> java.lang.IllegalArgumentException: Invalid format: "Mon Oct 20 06:30:52 
> + 2014"
> at 
> org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:754)
>
> What am I doing wrong here?
>
> Cheers
>



Re: Joda problem!!!!!!

2014-10-20 Thread joergpra...@gmail.com
It works here


PUT /myindex

PUT /myindex/test/_mapping
{
   "test": {
 "properties": {
   "created": {
"type" : "date",
"format" : "MMM d','  KK:mm:ss aa",
"store" : true,
"index" : "no"
   }
}
   }
}


PUT /myindex/test/1
{
"created" : "Oct 20, 2014 9:59:25 AM"
}


GET /myindex/test/1

{
   "_index": "myindex",
   "_type": "test",
   "_id": "1",
   "_version": 1,
   "found": true,
   "_source": {
  "created": "Oct 20, 2014 9:59:25 AM"
   }
}
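
The second sample format should work the same way, and if one field can
contain either form, the two patterns can be combined with || (a sketch, not
tested here; the field name is arbitrary):

PUT /myindex/test/_mapping
{
   "test": {
     "properties": {
       "source_created": {
         "type" : "date",
         "format" : "MMM d',' yyyy KK:mm:ss aa||EEE MMM d HH:mm:ss Z yyyy"
       }
     }
   }
}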

Jörg

On Mon, Oct 20, 2014 at 10:50 AM, Peter Litsegård <
peter.litsega...@gmail.com> wrote:

> I decided to drop this for now and provide the timestamp field (createdAt)
> in milliseconds...
>
> Thanks for your help though...
>
> /Peter
>
> On Monday, October 20, 2014 9:06:44 AM UTC+2, Peter Litsegård wrote:
>
>> Hi!
>>
>> I've been struggling with two date formats. I'll give you a sample and
>> the corresponding date format I've used:
>>
>> 1. "Oct 20, 2014 8:42:41 AM" : MMM d  hh:mm:ss aa
>> 2. "Mon Oct 20 06:42:41 + 2014" : EEE MMM d HH:mm:ss Z 
>>
>> I've used the following mapping:
>>
>> {
>> "settings" : {
>> "index" : {
>> "number_of_shards" : 1,
>> "number_of_replicas" : 1
>> }
>> },
>> "mappings" : {
>> "envelope_v1" : {
>> "_all" : { "enabled" : true },
>> "_id" : { "index" : "not_analyzed", "store" : false, "path" :
>> "id" },
>> "_timestamp" : { "enabled" : true, "path" : "createdAt" },
>> "_ttl" : { "enabled" : true, "default" : "30d" },
>> "properties" : {
>> "id" : { "type" : "string", "store" : false, "index" :
>> "no" },
>> "createdAt" : { "type" : "date", "format" : "MMM d 
>> hh:mm:ss aa", "store" : false, "index" : "no" },
>> "ref" : { "type" : "string", "store" : false, "index" :
>> "no" },
>> "sourceType" : { "type" : "string", "store" : false,
>> "index" : "no" },
>> "search" : { "type" : "string", "store" : false, "index"
>> : "no" },
>> "source.created_at" : { "type" : "date", "format" : "EEE
>> MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
>> "source.timestamp_ms" : { "type" : "date", "store" :
>> false, "index" : "no" },
>> "source.user.created_at" : { "type" : "date", "format" :
>> "EEE MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
>> "source.analysis.nlp" : { "type" : "object", "store" :
>> false, "index" : "no" }
>> }
>> }
>> }
>> }
>>
>> While adding data I get the following exception:
>>
>> java.lang.IllegalArgumentException: Invalid format: "Mon Oct 20 06:30:52
>> + 2014"
>> at org.elasticsearch.common.joda.time.format.DateTimeFormatter.
>> parseMillis(DateTimeFormatter.java:754)
>>
>> What am I doing wrong here?
>>
>> Cheers
>>


Re: Joda problem!!!!!!

2014-10-20 Thread Peter Litsegård
I decided to drop this for now and provide the timestamp field (createdAt) 
in milliseconds...

Thanks for your help though...

/Peter

On Monday, October 20, 2014 9:06:44 AM UTC+2, Peter Litsegård wrote:
>
> Hi!
>
> I've been struggling with two date formats. I'll give you a sample and the 
> corresponding date format I've used:
>
> 1. "Oct 20, 2014 8:42:41 AM" : MMM d  hh:mm:ss aa
> 2. "Mon Oct 20 06:42:41 + 2014" : EEE MMM d HH:mm:ss Z 
>
> I've used the following mapping:
>
> {
> "settings" : {
> "index" : {
> "number_of_shards" : 1,
> "number_of_replicas" : 1
> }
> },
> "mappings" : {
> "envelope_v1" : {
> "_all" : { "enabled" : true },
> "_id" : { "index" : "not_analyzed", "store" : false, "path" : 
> "id" },
> "_timestamp" : { "enabled" : true, "path" : "createdAt" },
> "_ttl" : { "enabled" : true, "default" : "30d" },
> "properties" : {
> "id" : { "type" : "string", "store" : false, "index" : 
> "no" },
> "createdAt" : { "type" : "date", "format" : "MMM d  
> hh:mm:ss aa", "store" : false, "index" : "no" },
> "ref" : { "type" : "string", "store" : false, "index" : 
> "no" },
> "sourceType" : { "type" : "string", "store" : false, 
> "index" : "no" },
> "search" : { "type" : "string", "store" : false, "index" : 
> "no" },
> "source.created_at" : { "type" : "date", "format" : "EEE 
> MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
> "source.timestamp_ms" : { "type" : "date", "store" : 
> false, "index" : "no" },
> "source.user.created_at" : { "type" : "date", "format" : 
> "EEE MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
> "source.analysis.nlp" : { "type" : "object", "store" : 
> false, "index" : "no" }
> }
> }
> }
> }
>
> While adding data I get the following exception:
>
> java.lang.IllegalArgumentException: Invalid format: "Mon Oct 20 06:30:52 
> + 2014"
> at 
> org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:754)
>
> What am I doing wrong here?
>
> Cheers
>



Re: Joda problem!!!!!!

2014-10-20 Thread Peter Litsegård
Just to give you a sample.

Partial doc JSON:

{
  "id": "524107608429699072",
  "createdAt": "Oct 20, 2014 9:59:25 AM",
  "source": {
"created_at": "Mon Oct 20 07:59:25 + 2014",
"source": "http://ElwynRoad.com\"; rel=\"nofollow\">Elwyn 
Road",
"retweet_count": 0,
"retweeted": false,
"filter_level": "medium",
"id_str": "524107608429699072",
...

As you can see "createdAt" conforms nicely to the following mapping:

{
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1
}
},
"mappings" : {
"envelope_v1" : {
"_all" : { "enabled" : true },
"_id" : { "index" : "not_analyzed", "store" : false, "path" : 
"id" },
"_timestamp" : { "enabled" : true, "path" : "createdAt" },
"_ttl" : { "enabled" : true, "default" : "30d" },
"properties" : {
"id" : { "type" : "string", "store" : false, "index" : "no" 
},
"createdAt" : { "type" : "date", "format" : "MMM d','  
KK:mm:ss aa", "store" : true, "index" : "no" },
"ref" : { "type" : "string", "store" : false, "index" : 
"no" },
"sourceType" : { "type" : "string", "store" : false, 
"index" : "no" },
"search" : { "type" : "string", "store" : false, "index" : 
"no" },
"source.created_at" : { "type" : "date", "format" : "EEE 
MMM d HH:mm:ss Z yyyy", "store" : false, "index" : "no" },
"source.timestamp_ms" : { "type" : "date", "store" : false, 
"index" : "no" },
"source.user.created_at" : { "type" : "date", "format" : 
"EEE MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
"source.analysis.nlp" : { "type" : "object", "store" : 
false, "index" : "no" }
}
}
}
}

Still ES throws:

ElasticsearchParseException[failed to parse doc to extract 
routing/timestamp/id]; nested: TimestampParsingException[failed to parse 
timestamp [Oct 20, 2014 9:59:25 AM]]; 


Sigh...
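
One guess, not verified: the exception comes from the _timestamp extraction,
and _timestamp is parsed with its own format (dateOptionalTime by default)
rather than with the format of the property it points at. If so, giving
_timestamp an explicit format might be worth a try:

"_timestamp" : { "enabled" : true, "path" : "createdAt", "format" : "MMM d',' yyyy KK:mm:ss aa" },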



On Monday, October 20, 2014 9:06:44 AM UTC+2, Peter Litsegård wrote:
>
> Hi!
>
> I've been struggling with two date formats. I'll give you a sample and the 
> corresponding date format I've used:
>
> 1. "Oct 20, 2014 8:42:41 AM" : MMM d  hh:mm:ss aa
> 2. "Mon Oct 20 06:42:41 + 2014" : EEE MMM d HH:mm:ss Z 
>
> I've used the following mapping:
>
> {
> "settings" : {
> "index" : {
> "number_of_shards" : 1,
> "number_of_replicas" : 1
> }
> },
> "mappings" : {
> "envelope_v1" : {
> "_all" : { "enabled" : true },
> "_id" : { "index" : "not_analyzed", "store" : false, "path" : 
> "id" },
> "_timestamp" : { "enabled" : true, "path" : "createdAt" },
> "_ttl" : { "enabled" : true, "default" : "30d" },
> "properties" : {
> "id" : { "type" : "string", "store" : false, "index" : 
> "no" },
> "createdAt" : { "type" : "date", "format" : "MMM d  
> hh:mm:ss aa", "store" : false, "index" : "no" },
> "ref" : { "type" : "string", "store" : false, "index" : 
> "no" },
> "sourceType" : { "type" : "string", "store" : false, 
> "index" : "no" },
> "search" : { "type" : "string", "store" : false, "index" : 
> "no" },
> "source.created_at" : { "type" : "date", "format" : "EEE 
> MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
> "source.timestamp_ms" : { "type" : "date", "store" : 
> false, "index" : "no" },
> "source.user.created_at" : { "type" : "date", "format" : 
> "EEE MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
> "source.analysis.nlp" : { "type" : "object", "store" : 
> false, "index" : "no" }
> }
> }
> }
> }
>
> While adding data I get the following exception:
>
> java.lang.IllegalArgumentException: Invalid format: "Mon Oct 20 06:30:52 
> + 2014"
> at 
> org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:754)
>
> What am I doing wrong here?
>
> Cheers
>



Re: Joda problem!!!!!!

2014-10-20 Thread joergpra...@gmail.com
Maybe the old mapping is still active?

This works here:

import org.elasticsearch.common.joda.FormatDateTimeFormatter;
import org.elasticsearch.common.joda.Joda;
import org.elasticsearch.common.joda.time.DateTime;
import org.testng.annotations.Test;

public class JodaTest {

@Test
public void joda() {
        FormatDateTimeFormatter f = Joda.forPattern("MMM d',' yyyy KK:mm:ss aa");
DateTime t = f.parser().parseDateTime("Oct 20, 2014 09:35:42 AM");
System.err.println(f.printer().print(t));
}
}

Jörg

On Mon, Oct 20, 2014 at 9:43 AM, Peter Litsegård  wrote:

> Hi Jörg!
>
> Thanks for your response. However, IMHO I don't think this is the problem.
> I changed the Joda specifikation in (1) to "MMM d','  hh:mm:ss aa" and
> when I used this simple test code
>
>  DateTime dt = new DateTime();
>  DateTimeFormatter fmt = DateTimeFormat.forPattern("MMM
> d','  KK:mm:ss aa");
>  System.out.println(json);
>  String str = fmt.print(dt);
>
> I got the following printout:
>
> Oct 20, 2014 09:35:42 AM
>
> which corresponds nicely to the data I try to index:
> ...4","createdAt":"Oct 20, 2014 9:35:40 AM"... (just an extract from the
> complete JSON document)
>
> I changed the mapping for "createdAt" to "MMM d','  hh:mm:ss aa" BUT I
> still get exactly the same error...
>
> VERY frustrating...
>
> Cheers
>
>
>
> On Monday, October 20, 2014 9:06:44 AM UTC+2, Peter Litsegård wrote:
>>
>> Hi!
>>
>> I've been struggling with two date formats. I'll give you a sample and
>> the corresponding date format I've used:
>>
>> 1. "Oct 20, 2014 8:42:41 AM" : MMM d  hh:mm:ss aa
>> 2. "Mon Oct 20 06:42:41 + 2014" : EEE MMM d HH:mm:ss Z 
>>
>> I've used the following mapping:
>>
>> {
>> "settings" : {
>> "index" : {
>> "number_of_shards" : 1,
>> "number_of_replicas" : 1
>> }
>> },
>> "mappings" : {
>> "envelope_v1" : {
>> "_all" : { "enabled" : true },
>> "_id" : { "index" : "not_analyzed", "store" : false, "path" :
>> "id" },
>> "_timestamp" : { "enabled" : true, "path" : "createdAt" },
>> "_ttl" : { "enabled" : true, "default" : "30d" },
>> "properties" : {
>> "id" : { "type" : "string", "store" : false, "index" :
>> "no" },
>> "createdAt" : { "type" : "date", "format" : "MMM d 
>> hh:mm:ss aa", "store" : false, "index" : "no" },
>> "ref" : { "type" : "string", "store" : false, "index" :
>> "no" },
>> "sourceType" : { "type" : "string", "store" : false,
>> "index" : "no" },
>> "search" : { "type" : "string", "store" : false, "index"
>> : "no" },
>> "source.created_at" : { "type" : "date", "format" : "EEE
>> MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
>> "source.timestamp_ms" : { "type" : "date", "store" :
>> false, "index" : "no" },
>> "source.user.created_at" : { "type" : "date", "format" :
>> "EEE MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
>> "source.analysis.nlp" : { "type" : "object", "store" :
>> false, "index" : "no" }
>> }
>> }
>> }
>> }
>>
>> While adding data I get the following exception:
>>
>> java.lang.IllegalArgumentException: Invalid format: "Mon Oct 20 06:30:52
>> + 2014"
>> at org.elasticsearch.common.joda.time.format.DateTimeFormatter.
>> parseMillis(DateTimeFormatter.java:754)
>>
>> What am I doing wrong here?
>>
>> Cheers
>>


How to filter low term-frequency documents in elasticsearch?

2014-10-20 Thread chenjinyuan87
Hey guys:

We are using elasticsearch to push newly added documents to users, so the 
search results are ranked according to publish time.
In this case, users often receive low-relevance documents, such as 
documents in which the query words appear only once.
How can we filter out such documents in elasticsearch?
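
Assuming dynamic scripting is enabled and term statistics are exposed to search
scripts in your version (they were added around 1.3, as far as I recall; please
verify), one sketch would be a script filter that requires the term to occur
more than once. Index, field and term names here are placeholders:

GET /docs/_search
{
  "query": {
    "filtered": {
      "query": { "match": { "body": "elasticsearch" } },
      "filter": {
        "script": {
          "script": "_index['body']['elasticsearch'].tf() > 1"
        }
      }
    }
  }
}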



Re: Joda problem!!!!!!

2014-10-20 Thread Peter Litsegård
Hi Jörg!

Thanks for your response. However, IMHO I don't think this is the problem. 
I changed the Joda specification in (1) to "MMM d',' yyyy hh:mm:ss aa" and 
when I used this simple test code 

 DateTime dt = new DateTime();
                     DateTimeFormatter fmt = DateTimeFormat.forPattern("MMM d',' yyyy KK:mm:ss aa");
 System.out.println(json);
 String str = fmt.print(dt);

I got the following printout:

Oct 20, 2014 09:35:42 AM

which corresponds nicely to the data I try to index: ...4","createdAt":"Oct 
20, 2014 9:35:40 AM"... (just an extract from the complete JSON document)

I changed the mapping for "createdAt" to "MMM d',' yyyy hh:mm:ss aa" BUT I 
still get exactly the same error...

VERY frustrating...

Cheers


On Monday, October 20, 2014 9:06:44 AM UTC+2, Peter Litsegård wrote:
>
> Hi!
>
> I've been struggling with two date formats. I'll give you a sample and the 
> corresponding date format I've used:
>
> 1. "Oct 20, 2014 8:42:41 AM" : MMM d  hh:mm:ss aa
> 2. "Mon Oct 20 06:42:41 + 2014" : EEE MMM d HH:mm:ss Z 
>
> I've used the following mapping:
>
> {
> "settings" : {
> "index" : {
> "number_of_shards" : 1,
> "number_of_replicas" : 1
> }
> },
> "mappings" : {
> "envelope_v1" : {
> "_all" : { "enabled" : true },
> "_id" : { "index" : "not_analyzed", "store" : false, "path" : 
> "id" },
> "_timestamp" : { "enabled" : true, "path" : "createdAt" },
> "_ttl" : { "enabled" : true, "default" : "30d" },
> "properties" : {
> "id" : { "type" : "string", "store" : false, "index" : 
> "no" },
> "createdAt" : { "type" : "date", "format" : "MMM d  
> hh:mm:ss aa", "store" : false, "index" : "no" },
> "ref" : { "type" : "string", "store" : false, "index" : 
> "no" },
> "sourceType" : { "type" : "string", "store" : false, 
> "index" : "no" },
> "search" : { "type" : "string", "store" : false, "index" : 
> "no" },
> "source.created_at" : { "type" : "date", "format" : "EEE 
> MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
> "source.timestamp_ms" : { "type" : "date", "store" : 
> false, "index" : "no" },
> "source.user.created_at" : { "type" : "date", "format" : 
> "EEE MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
> "source.analysis.nlp" : { "type" : "object", "store" : 
> false, "index" : "no" }
> }
> }
> }
> }
>
> While adding data I get the following exception:
>
> java.lang.IllegalArgumentException: Invalid format: "Mon Oct 20 06:30:52 
> + 2014"
> at 
> org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:754)
>
> What am I doing wrong here?
>
> Cheers
>



Re: Joda problem!!!!!!

2014-10-20 Thread joergpra...@gmail.com
You have a comma in "Oct 20, 2014 8:42:41 AM"

"Oct 20 2014 8:42:41 AM" will work.

Jörg

On Mon, Oct 20, 2014 at 9:06 AM, Peter Litsegård  wrote:

> Hi!
>
> I've been struggling with two date formats. I'll give you a sample and the
> corresponding date format I've used:
>
> 1. "Oct 20, 2014 8:42:41 AM" : MMM d  hh:mm:ss aa
> 2. "Mon Oct 20 06:42:41 + 2014" : EEE MMM d HH:mm:ss Z 
>
> I've used the following mapping:
>
> {
> "settings" : {
> "index" : {
> "number_of_shards" : 1,
> "number_of_replicas" : 1
> }
> },
> "mappings" : {
> "envelope_v1" : {
> "_all" : { "enabled" : true },
> "_id" : { "index" : "not_analyzed", "store" : false, "path" :
> "id" },
> "_timestamp" : { "enabled" : true, "path" : "createdAt" },
> "_ttl" : { "enabled" : true, "default" : "30d" },
> "properties" : {
> "id" : { "type" : "string", "store" : false, "index" :
> "no" },
> "createdAt" : { "type" : "date", "format" : "MMM d 
> hh:mm:ss aa", "store" : false, "index" : "no" },
> "ref" : { "type" : "string", "store" : false, "index" :
> "no" },
> "sourceType" : { "type" : "string", "store" : false,
> "index" : "no" },
> "search" : { "type" : "string", "store" : false, "index" :
> "no" },
> "source.created_at" : { "type" : "date", "format" : "EEE
> MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
> "source.timestamp_ms" : { "type" : "date", "store" :
> false, "index" : "no" },
> "source.user.created_at" : { "type" : "date", "format" :
> "EEE MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
> "source.analysis.nlp" : { "type" : "object", "store" :
> false, "index" : "no" }
> }
> }
> }
> }
>
> While adding data I get the following exception:
>
> java.lang.IllegalArgumentException: Invalid format: "Mon Oct 20 06:30:52
> + 2014"
> at
> org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:754)
>
> What am I doing wrong here?
>
> Cheers
>


Re: Multiple Timezones in Elasticsearch/Kibana

2014-10-20 Thread Magnus Bäck
On Thursday, October 16, 2014 at 18:57 CEST,
 Kellan Strong  wrote:

> I am having a problem with different timezones sending their
> information to elasticsearch/kibana. One of the logs that is sending
> is at UTC time however the elasticsearch box is at local time zone.
> The message is clearly sent at the time of the event, however
> elasticsearch or kibana indexes it in such a way that it only shows up
> once the local clock reaches that time.
> Is there a way to allow elasticsearch/kibana to be dynamic and read
> messages as they come in, rather than later ?

More information is needed. How are you sending the messages to
Elasticsearch? Is Logstash involved?

Kibana relies on the @timestamp field to be UTC. If your logs
are in UTC too it sounds like something is interpreting them as
local time and adjusting the timestamp accordingly before updating
@timestamp.

-- 
Magnus Bäck| Software Engineer, Development Tools
magnus.b...@sonymobile.com | Sony Mobile Communications



Joda problem!!!!!!

2014-10-20 Thread Peter Litsegård
Hi!

I've been struggling with two date formats. I'll give you a sample and the 
corresponding date format I've used:

1. "Oct 20, 2014 8:42:41 AM" : MMM d  hh:mm:ss aa
2. "Mon Oct 20 06:42:41 + 2014" : EEE MMM d HH:mm:ss Z 

I've used the following mapping:

{
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1
}
},
"mappings" : {
"envelope_v1" : {
"_all" : { "enabled" : true },
"_id" : { "index" : "not_analyzed", "store" : false, "path" : 
"id" },
"_timestamp" : { "enabled" : true, "path" : "createdAt" },
"_ttl" : { "enabled" : true, "default" : "30d" },
"properties" : {
"id" : { "type" : "string", "store" : false, "index" : "no" 
},
"createdAt" : { "type" : "date", "format" : "MMM d  
hh:mm:ss aa", "store" : false, "index" : "no" },
"ref" : { "type" : "string", "store" : false, "index" : 
"no" },
"sourceType" : { "type" : "string", "store" : false, 
"index" : "no" },
"search" : { "type" : "string", "store" : false, "index" : 
"no" },
"source.created_at" : { "type" : "date", "format" : "EEE 
MMM d HH:mm:ss Z yyyy", "store" : false, "index" : "no" },
"source.timestamp_ms" : { "type" : "date", "store" : false, 
"index" : "no" },
"source.user.created_at" : { "type" : "date", "format" : 
"EEE MMM d HH:mm:ss Z ", "store" : false, "index" : "no" },
"source.analysis.nlp" : { "type" : "object", "store" : 
false, "index" : "no" }
}
}
}
}

While adding data I get the following exception:

java.lang.IllegalArgumentException: Invalid format: "Mon Oct 20 06:30:52 
+0000 2014"
at 
org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:754)

What am I doing wrong here?

Cheers
