how can we implement ngram_analyzer without affecting the phrase_search

2014-06-24 Thread Anand kumar

Hi,

I want to search the ngram-analyzed data without affecting the results of 
phrase search.

If I use the ngram analyzer, it affects the phrase search results. For 
instance, when I query for the string "phrase query", it should return 
results for the exact match, but instead it returns matches for "phrase" and 
"query". Here are the settings I've been using; what should be done to get 
the desired result?


"settings" : {
   "index": {
"analysis" : {
"analyzer" : {
"my_ngram_analyzer" : {
"tokenizer" : ["my_ngram_tokenizer", "whitespace"],
"filter" : "lowercase"
}
},
"tokenizer" : {
"my_ngram_tokenizer" : {
"type" : "nGram",
"min_gram" : "3",
"max_gram" : "20",
"token_chars": [ "letter", "digit" ]
}
}
}
 }

}
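
A common way to get both behaviors is a multi-field mapping: keep the main
field on a standard analyzer for phrase queries, and add a sub-field
analyzed with the ngram analyzer for partial matching. A minimal sketch,
with hypothetical type/field names (note that an analyzer takes a single
tokenizer, not a list):

"mappings" : {
    "doc" : {
        "properties" : {
            "title" : {
                "type" : "string",
                "analyzer" : "standard",
                "fields" : {
                    "ngram" : {
                        "type" : "string",
                        "index_analyzer" : "my_ngram_analyzer",
                        "search_analyzer" : "standard"
                    }
                }
            }
        }
    }
}

Phrase queries (match_phrase) can then target "title", while partial-match
queries target "title.ngram".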



Re: function score query & _all field ?

2014-06-24 Thread Cédric Hourcade
Hello,

I think only the _boost parameter is deprecated. You can still set a
boost per field in your mapping.
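
For illustration, a minimal sketch of a per-field boost in a mapping (index,
type, and field names here are hypothetical):

curl -XPUT 'localhost:9200/myindex' -d '{
    "mappings": {
        "mytype": {
            "properties": {
                "title": { "type": "string", "boost": 2.0 },
                "body":  { "type": "string" }
            }
        }
    }
}'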

Cédric Hourcade
c...@wal.fr


On Wed, Jun 25, 2014 at 2:13 AM, Sergey Pilypenko
 wrote:
> Hi!
>
> I noticed that starting from 1.0.0rc1 the _boost option is deprecated. From the docs
> I can't figure out how to use the suggested solution (function score query) with the
> _all field, which took into account specific boosts from specific fields.
> Could somebody provide some small example for this case?
>
> Thank you in advance,
> Sergii
>



Re: Filtering on Script Value

2014-06-24 Thread Cédric Hourcade
Hello,

You should be able to filter with a script using the script filter:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html
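
For illustration, a hedged sketch of a script filter inside a filtered query
(the field name and threshold are hypothetical, and dynamic scripting must
be enabled):

"filtered" : {
    "query" : { "match_all" : {} },
    "filter" : {
        "script" : {
            "script" : "doc['num_field'].value > threshold",
            "params" : { "threshold" : 5 }
        }
    }
}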

Cédric Hourcade
c...@wal.fr


On Wed, Jun 25, 2014 at 4:36 AM, Brian Behling  wrote:
> I'm trying to calculate a value for each hit then select or filter on a
> calculated value. Something like below:
>
> "query": {
> "match_all": {}
> },
> "script_fields" : {
> "counter" : {
> "script" : "count++",
> "params" : {
> "count"  : 1
> }
> },
> "source" : {
>   "script" : "_source"
> }
>   }
>
> I'd like to filter on the count parameter.
>
> I've read on a StackOverflow post that you cannot filter on a script value.
>
> So is there another way to calculate some value dynamically and filter on
> that value?
>
> If not, is there a nested SQL SELECT equivalent in ElasticSearch? Maybe I
> could execute the first query to calculate the 'count' then execute another
> query to filter by a value?
>



keyword custom analyzer with highlighting is not working.

2014-06-24 Thread hongsgo
1. I tested highlighting with my combo analyzer.

But the expected doc, id 9, has its entire sentence highlighted.
That's wrong.

2. I filed this issue on the combo analyzer plugin site:
- https://github.com/yakaz/elasticsearch-analysis-combo/issues/19

3. And I tested further:
curl -XGET
'http://10.101.57.97:10200/testindex/ITEM/9/_termvector?pretty=true'
"รองเท้า" : {
"term_freq" : 2,
"tokens" : [ {
"position" : 0,
"start_offset" : 0,
"end_offset" : 261
}, {
"position" : 0,
"start_offset" : 0,
"end_offset" : 261
} ]
},


4. These tokens are the output of the analyzers below:
"custom_foreign_languages_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"compound_word",
"keep_word", 
"thai_stop_custom",
"english_stop_custom",   
"unique_token_filter"
]
},
"custom_foreign_languages_synonym_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"compound_word",
"keep_word",
"thai_stop_custom",
"english_stop_custom",
"synonym",
"unique_token_filter"
]
}

Is it impossible to highlight with
custom_foreign_languages_synonym_analyzer and
custom_foreign_languages_analyzer?

Please, somebody help me.







search cache question

2014-06-24 Thread Shantanu Sen
I have an index of 50 million docs with 2 shards. I am running a match-all 
query as the initial top-level query, with 8 terms aggregations, one of 
which has high cardinality (about 10k distinct values). The rest of the 
aggregations all have cardinality below 10. Then a drill-down query is run 
with a post filter.

The top-level query has a hit count of 50 million and the drill down query 
has a hit count of 20 million.

Both shards are on a single node. I am using a transport client and the Java 
APIs to run the searches.

When I run the Java client from the data node and the system is cold, I see 
a 3-minute latency on the top-level query, while the drill-down has a 
latency of 800 ms.

When the system is warm, I see a latency of 25 secs on the top-level query, 
while the drill-down remains the same at around 800 ms.

When the system is cold and I run the same client from a remote system, the 
latency of the top-level query is 21 secs, while the subsequent queries drop 
to 8 secs.

I am running the queries with the aggregation filter size set to 0 since we 
need the exact count.
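
For reference, a hedged sketch of such a terms aggregation (the field name
is hypothetical); "size": 0 asks for all buckets rather than the top N:

"aggs" : {
    "high_card_agg" : {
        "terms" : { "field" : "my_field", "size" : 0 }
    }
}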

I understand that the high-cardinality aggregation is slowing the queries 
and that the spike in CPU comes from calculating the counts.

I would like to understand why the search latency is markedly less when 
running the search client from a remote system - the "cold" latency value 
is 21 secs using a remote client vs 3 mins on the client running on the 
data node. This is when nothing else is running on the data node.

Also, I would like to understand whether we can tune the duration of the 
cache. I see the same latency when I re-run the query after the system has 
been idle for some time, so the cache must be getting cleared after a set 
time period.

Finally, is there any other type of aggregation (other than the terms 
aggregation) that is recommended for high-cardinality fields, so as to 
bring down the latency?

Thanks for any pointers.
Shantanu



Re: script_fields vs filter script

2014-06-24 Thread Brian Behling
I'm trying to filter on a calculated script field as well.

Have you figured this out Kajal?

On Tuesday, May 27, 2014 10:49:35 AM UTC-6, Kajal Patel wrote:
>
> Hey, 
>
> Can you post your solution if you figured it out?
> I am having a similar issue: I need to filter search results based on a 
> script_field. I don't want to use filter_script, though, because I am using 
> facets and I want my records to be filtered out of the facets too.
>
> Do you know if I can extend some class, or use a plugin or anything else, to 
> filter my records based on the script field?
>
>
> On Sunday, July 7, 2013 1:21:38 PM UTC-4, Oreno wrote:
>>
>> Hi Alex,
>> 1. I checked the cache solution, but it's taking 15 times longer than my 
>> starting time (10s against 150s), so that will be a problem since my filter 
>> has dynamic params. 
>> It does go fast once it's stored, though. Do you know if it's possible to 
>> do some kind of caching of all source documents for future queries?
>>
>> 2. From what I understand, both the filter script and the script_field are 
>> supposed to go over each document that results from the prior query.
>> The only thing I can think of that makes the difference is that the 
>> script filter actually needs to filter out the false documents (for the hit 
>> count) while the script_field only
>> needs to add the field for the first 10 documents returned by default.
>>
>> I'm trying to figure out how I can speed up the response when using source() 
>> in a native Java script. 
>> I'm assuming the bottleneck is somewhere within creating the response. I 
>> read that using source has some overhead because Elasticsearch has to 
>> parse the JSON source,
>> but if that were the case here, I should have seen the same big 
>> overhead for both the script_field and the filter script runs.
>>
>> All I actually need is the hit count, so if I'm right about the response 
>> parsing and it can be excluded, I'll be really glad.
>>
>> Any idea on the above?
>>
>> Appreciating your help.
>>
>> Oren
>>
>>
>> On Sun, Jul 7, 2013 at 7:13 PM, Alexander Reelsen-2 [via ElasticSearch 
>> Users] <[hidden email] 
>> > wrote:
>>
>>> Hey,
>>>
>>> what kind of query are you executing? Using script fields results in the 
>>> script only being executed for each search hit, whereas executing it as a 
>>> script filter might require running it for each document in your index (you 
>>> can try to cache the script filter so it might be faster for subsequent 
>>> requests).
>>>
>>> Hope this helps as a start for optimization, if not, please provide some 
>>> more information.
>>>
>>>
>>> --Alex
>>>
>>>
>>> On Sun, Jul 7, 2013 at 2:21 PM, oreno <[hidden email] 
>>> > wrote:
>>>
 Hi, I noticed that using a script_field that returns true or false values
 goes much faster than
 using the same script with a filter script declaration (so that it filters
 out the docs returning false).

 I was sure that the filter script was taking so long because I'm using the
 source().get(...) method, but it turns out that when using the same script,
 only with script_fields instead, I get the performance I need. The
 only problem here is that I want to filter the docs that now have
 "MessageReverted" = false.

 1. Is there any way I can filter the docs containing "MessageReverted" = false?
 (Some wrapper query?)
 2. Any idea why the filter script takes much longer than the script field
 (8000 ms against 250 ms)?

 Both ways retrieve the source() for the script logic, so it can't be a
 matter of source fetching, as far as I understand.

 fast:
 ...,
   "script_fields": {
 "MessageReverted": {
   "script": "revert",
   "lang": "native",
   "params": {
 "startDate": "2013-05-1",
 "endDate": "2013-05-1",
 "attributeId": "2365443",
 "segmentId": "2365443"
   }
 }
   }


 slow:
 ...,
   "filter": {
 "script": {
   "script": "revert",
   "lang": "native",
   "params": {
 "startDate": "2013-05-1",
 "endDate": "2013-05-1",
 "attributeId": "2365443",
 "segmentId": "2365443"
   }
 }
   }


 Any idea?

 Thanks in advance,

 Oren




Filtering on Script Value

2014-06-24 Thread Brian Behling
I'm trying to calculate a value for each hit then select or filter on a 
calculated value. Something like below: 

"query": {
"match_all": {}
},
"script_fields" : {
"counter" : {
"script" : "count++",
"params" : {
"count"  : 1
}
},
"source" : {
  "script" : "_source"
}
  }

I'd like to filter on the count parameter.

I've read on a StackOverflow post that you cannot filter on a script value.

So is there another way to calculate some value dynamically and filter on 
that value? 

If not, is there a nested SQL SELECT equivalent in ElasticSearch? Maybe I 
could execute the first query to calculate the 'count' then execute another 
query to filter by a value?



Using a pre-indexed shape filter with other properties

2014-06-24 Thread David Nesbitt
My goal is to validate that the geo points of indexed cities are contained 
within the multipolygon boundary of their corresponding country.

I have been using the following as an initial guide:
http://people.mozilla.org/~wkahngreene/elastic/guide/reference/query-dsl/geo-shape-filter.html

So the city index looks like:

$ curl -XPUT http://localhost:9200/points -d '
{
"mappings": {
"city": {
"properties": {
"city": {"type": "string"},
"country": {"type": "string"},
"location": {"type": "geo_shape"}
}
}
}
}
'

$ curl -XPOST http://localhost:9200/points/city/ -d '{"city": "Tokyo", 
"country": "Japan","location": {"type" : "point", "coordinates" : 
[139.6917, 35.6895]}}'

$ curl -XPUT http://localhost:9200/shapes -d '
{
"mappings": {
"country": {
"properties": {
"name": {"type": "string"},
"shape": {"type": "geo_shape"}
}
}
}
}
'

$ curl -XPOST http://localhost:9200/shapes/country/ -d 
'{"name":"Japan","shape":{"type":"multipolygon","coordinates":134.638428,34.149234],[134.766379,33.806335],[134.203416,33.201178],[133.79295,33.521985],[133.280268,33.28957],[133.014858,32.704567],[132.363115,32.989382],[132.371176,33.463642],[132.924373,34.060299],[133.492968,33.944621],[133.904106,34.364931],[134.638428,34.149234]]],[[[140.976388,37.142074],[140.59977,36.343983],[140.774074,35.842877],[140.253279,35.138114],[138.975528,34.6676],[137.217599,34.606286],[135.792983,33.464805],[135.120983,33.849071],[135.079435,34.596545],[133.340316,34.375938],[132.156771,33.904933],[130.986145,33.885761],[132.36,33.149992],[131.33279,31.450355],[130.686318,31.029579],[130.20242,31.418238],[130.447676,32.319475],[129.814692,32.61031],[129.408463,33.296056],[130.353935,33.604151],[130.878451,34.232743],[131.884229,34.749714],[132.617673,35.433393],[134.608301,35.731618],[135.677538,35.527134],[136.723831,37.304984],[137.390612,36.827391],[138.857602,37.827485],[139.426405,38.215962],[140.05479,39.438807],[139.883379,40.563312],[140.305783,41.195005],[141.368973,41.37856],[141.914263,39.991616],[141.884601,39.180865],[140.959489,38.174001],[140.976388,37.142074]]],[[[143.910162,44.1741],[144.613427,43.960883],[145.320825,44.384733],[145.543137,43.262088],[144.059662,42.988358],[143.18385,41.995215],[141.611491,42.678791],[141.067286,41.584594],[139.955106,41.569556],[139.817544,42.563759],[140.312087,43.333273],[141.380549,43.388825],[141.671952,44.772125],[141.967645,45.551483],[143.14287,44.510358],[143.910162,44.1741}}'

I can explicitly query against a particular country as follows:

$ curl -XGET 'http://localhost:9200/points/city/_search?pretty=true' -d '
{
  "query": {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_shape" : {
"location" : {
"indexed_shape": {
"id": "odsm2vWgT7ufBLjvmRMrNA",
"type": "country",
"index": "shapes",
"shape_field_name": "shape"
},
"relation": "within"
}
}
}
}
  }
}'

But is there a way that I can query that the city's country property 
matches the country's name property and the city's location is within the 
country's shape?

Thanks for any insights!



Cross-index parent/child relationship

2014-06-24 Thread Drew Kutcharian
Hi!

Does ES support cross-index parent/child relationships? More specifically, can I 
have all the parents in one index (say users) and the children (say events) in 
multiple time-series style indices (managed by curator)? If so, how is this 
done? If not, what's the alternative?

Thanks,

Drew



function score query & _all field ?

2014-06-24 Thread Sergey Pilypenko
Hi!

I noticed that starting from 1.0.0rc1 the _boost option is deprecated. From the docs 
I can't figure out how to use the suggested solution (function score query) with the 
_all field, which took into account specific boosts from specific fields.
Could somebody provide some small example for this case?

Thank you in advance,
Sergii



Upgrade Path from ES 0.20.2 to 0.90.13 and finally to 1.2.1

2014-06-24 Thread moinul . rony
Hi, 

I have an Elasticsearch instance running in a Logstash/Kibana environment 
for centralized log viewing.

The version is pretty outdated and I am looking to upgrade the 
instance. 

When I upgraded from 0.20.0 to 0.90.13 the node was picked up and we saw 
this log:

tail -n30 -f /opt/elasticsearch-0.90.13/logs/elasticsearch.log
[2014-06-24 17:50:45,222][INFO ][node ] [Bedlam] 
version[0.90.13], pid[3900], build[249c9c5/2014-03-25T15:27:12Z]
[2014-06-24 17:50:45,222][INFO ][node ] [Bedlam] 
initializing ...
[2014-06-24 17:50:45,231][INFO ][plugins  ] [Bedlam] loaded 
[], sites []
[2014-06-24 17:50:48,786][INFO ][node ] [Bedlam] 
initialized
[2014-06-24 17:50:48,786][INFO ][node ] [Bedlam] 
starting ...
[2014-06-24 17:50:49,069][INFO ][transport] [Bedlam] 
bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/1.2.3.4:9300]}
[2014-06-24 17:50:52,288][INFO ][cluster.service  ] [Bedlam] 
new_master [Bedlam][qRWDbBtOQPK-4oWRP-k8XA][inet[/1.2.3.4:9300]], reason: 
zen-disco-join (elected_as_master)
[2014-06-24 17:50:52,339][INFO ][discovery] [Bedlam] 
elasticsearch/qRWDbBtOQPK-4oWRP-k8XA
[2014-06-24 17:50:52,371][INFO ][http ] [Bedlam] 
bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/1.2.3.4:9200]}
[2014-06-24 17:50:52,372][INFO ][node ] [Bedlam] started
[2014-06-24 17:50:53,892][INFO ][gateway  ] [Bedlam] 
recovered [8] indices into cluster_state
[2014-06-24 17:51:58,604][INFO ][cluster.service  ] [Bedlam] added 
{[Shatterstar][UGqUXDjvRxivYHuQeBzLXw][inet[/1.2.3.4:9301]]{client=true, 
data=false},}, reason: zen-disco-receive(join from 
node[[Shatterstar][UGqUXDjvRxivYHuQeBzLXw][inet[/1.2.3.4:9301]]{client=true, 
data=false}])
[2014-06-24 17:51:58,686][WARN ][transport.netty  ] [Bedlam] 
Message not fully read (request) for [5] and action 
[cluster/nodeAliasesUpdated], resetting
[2014-06-24 17:51:58,688][WARN ][transport.netty  ] [Bedlam] 
Message not fully read (request) for [11] and action 
[cluster/nodeIndexCreated], resetting
[2014-06-24 17:51:58,689][WARN ][transport.netty  ] [Bedlam] 
Message not fully read (request) for [7] and action 
[cluster/nodeIndexCreated], resetting
[2014-06-24 17:51:58,691][WARN ][transport.netty  ] [Bedlam] 
Message not fully read (request) for [9] and action 
[cluster/nodeIndexCreated], resetting
[2014-06-24 17:51:58,696][WARN ][transport.netty  ] [Bedlam] 
Message not fully read (request) for [13] and action 
[cluster/nodeIndexCreated], resetting
[2014-06-24 17:51:58,702][WARN ][transport.netty  ] [Bedlam] 
Message not fully read (request) for [8] and action 
[cluster/nodeIndexCreated], resetting
[2014-06-24 17:51:58,701][WARN ][transport.netty  ] [Bedlam] 
Message not fully read (request) for [6] and action 
[cluster/nodeIndexCreated], resetting
[2014-06-24 17:51:58,710][WARN ][transport.netty  ] [Bedlam] 
Message not fully read (request) for [12] and action 
[cluster/nodeIndexCreated], resetting
[2014-06-24 17:51:58,712][WARN ][transport.netty  ] [Bedlam] 
Message not fully read (request) for [10] and action 
[cluster/nodeIndexCreated], resetting
[2014-06-24 17:52:01,633][WARN ][discovery.zen] [Bedlam] 
received a join request for an existing node 
[[Shatterstar][UGqUXDjvRxivYHuQeBzLXw][inet[/1.2.3.4:9301]]{client=true, 
data=false}]
[2014-06-24 17:52:04,666][WARN ][discovery.zen] [Bedlam] 
received a join request for an existing node 
[[Shatterstar][UGqUXDjvRxivYHuQeBzLXw][inet[/1.2.3.4:9301]]{client=true, 
data=false}]
[2014-06-24 17:52:07,687][WARN ][discovery.zen] [Bedlam] 
received a join request for an existing node 
[[Shatterstar][UGqUXDjvRxivYHuQeBzLXw][inet[/1.2.3.4:9301]]{client=true, 
data=false}]
[2014-06-24 17:52:10,700][WARN ][discovery.zen] [Bedlam] 
received a join request for an existing node 
[[Shatterstar][UGqUXDjvRxivYHuQeBzLXw][inet[/1.2.3.4:9301]]{client=true, 
data=false}]
[2014-06-24 17:52:13,738][WARN ][discovery.zen] [Bedlam] 
received a join request for an existing node 
[[Shatterstar][UGqUXDjvRxivYHuQeBzLXw][inet[/1.2.3.4:9301]]{client=true, 
data=false}]
[2014-06-24 17:52:16,752][WARN ][discovery.zen] [Bedlam] 
received a join request for an existing node 
[[Shatterstar][UGqUXDjvRxivYHuQeBzLXw][inet[/1.2.3.4:9301]]{client=true, 
data=false}]
[2014-06-24 17:52:19,762][WARN ][discovery.zen] [Bedlam] 
received a join request for an existing node 
[[Shatterstar][UGqUXDjvRxivYHuQeBzLXw][inet[/1.2.3.4:9301]]{client=true, 
data=false}]
[2014-06-24 17:52:22,788][WARN ][discovery.zen] [Bedlam] 
received a join request for an existing node 
[[Shatterstar][UGqUXDjvRxivYHuQeBzLXw][inet[/1.2.3.4:9301]]{client=true, 
data=false}]
[2014-06-24 17:52:25,812][WARN ][discovery.z

Re: Overloading by indexing?

2014-06-24 Thread Mark Walkom
TPS is usually transactions per second.

Are you monitoring your cluster, and your memory/heap usage? How are you
coming to the conclusion that it's a networking issue?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 25 June 2014 00:33, Julius K  wrote:

> Hi,
>
> I have a strange problem with ES. It's running on a cluster with 64 cores
> etc, so I don't think the power of the hardware is the issue.
>
> I want to index a lot of documents with elasticsearch-hadoop.
> After some problems I now have everything in place and it seems to work
> fine.
>
> So I wrote a simple pig script which loads all the files (~500) and stores
> them into an ES index.
> However, after ~22h the job failed, because of connection problems between
> the nodes.
> But during that time, there wasn't any heavy usage of network bandwidth or
> other resources.
>
> After that I tried to run the pig script for only one document, so I know
> what is indexed and what is missing.
> After about 3 documents had indexed well this way, the jobs started to fail
> again due to network problems, although there wasn't any significant load.
>
> I observed that even after the indexing jobs stopped, there was stuff
> happening with the index. The number of documents kept growing for quite
> some time, and the translog operations went up and down, mostly hovering at
> about half a million.
>
> For me this looks like the index takes more time indexing than the pig
> script takes for writing into the index and after some time somewhere a
> buffer gets too full.
>
> Is this possible? I would expect that in this case elasticsearch-hadoop
> should get throttled.
>
> The only documentation about the translog is what I found here:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-translog.html
> which I find a bit thin. I still don't know what implications the number
> of translog operations has.
>
> On the linked page it says I could increase the numbers when doing bulk
> indexing, but I don't understand how this would help.
> Also, what's TPS?
>
> Best regards
> Julius
>



Re: ES heap size slowly but surely increasing

2014-06-24 Thread Mark Walkom
Yes, but there is still overhead that the cluster needs to keep in memory:
the metadata about the data.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 24 June 2014 23:38, Aldian  wrote:

> Yes I am, but... This data should be stored somewhere on the disk, right?
>
>
> 2014-06-24 12:03 GMT+02:00 Mark Walkom :
>
>> Are you indexing new data?
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 24 June 2014 20:01, Aldian  wrote:
>>
>>>  Hi
>>>
>>> I upgraded to 1.2.1 and set memory options to 2 GB. Two weeks have passed
>>> now, and every day I have been checking the heap memory level with Bigdesk.
>>>
>>> At first I was not sure, but now it is clearly increasing little by
>>> little. One day after I restarted ES it grew to approximately 1.3 GB of
>>> used heap memory, and now we are at 1.6 GB, which makes me guess it is
>>> leaking around 150 MB of heap memory a week.
>>>
>>> Am I the only one experiencing such a problem? And do you know what could
>>> cause it?
>>>
>>> Best,
>>>
>>> Aldian
>>>
>>
>
>
>
> --
> Regards,
>
> Aldian
>



RecoveryFailedException while adding a node during Elasticsearch upgrade from 0.2.0 to 1.2.1

2014-06-24 Thread anurag naidu
We are upgrading our ES server from 0.2.0 to 1.2.1 using the cluster 
restart upgrade steps.

We have 2 nodes as part of our cluster. We disabled allocation on both 
nodes before the upgrade and then upgraded the servers. 

However, after starting both nodes, when we attempt to re-enable allocation, 
it seems like Node2 just doesn't work with shard reallocation and throws the 
following error. The errors just keep happening, so we eventually had to 
turn off replication again, and now we are getting by with a single node in 
the cluster with no replication.

Below is the stack trace of the NullPointerException we get when trying to 
enable allocation. Any pointers to help with solving this would be greatly 
appreciated. I can provide more details if needed.


[WARN ][indices.cluster  ] [Valentina Allegra de La Fontaine] 
[production_restaurants][2] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException: 
[production_restaurants][2]: Recovery failed from 
[Shadowmage][Ei4dGmkmScmY0WOPN8PR_Q][server][inet[/server_ip:9300]] into 
[Valentina Allegra de La Fontaine][LJO_jO59QGuGa-jLmTOZWg][server][inet[/server_ip:9300]]
at 
org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:306)
at 
org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:65)
at 
org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:175)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.RemoteTransportException: 
[Shadowmage][inet[/172.16.21.21:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: 
[production_restaurants][2] Phase[1] Execution failed
at 
org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:996)
at 
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:631)
at 
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:122)
at 
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:62)
at 
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:351)
at 
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:337)
at 
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: 
org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: 
[production_restaurants][2] Failed to transfer [44] files with total size 
of [904.2mb]
at 
org.elasticsearch.indices.recovery.RecoverySource$1.phase1(RecoverySource.java:243)
at 
org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:993)
... 9 more
Caused by: java.lang.NullPointerException

Thanks
-anurag



Re: Snapshot & Restore in a cluster of two nodes

2014-06-24 Thread Daniel Bubenheim
Hi Alex,

thanks for your answer. Here are some more detailed facts:

1. I generate a repository on node1 in cluster1
2. execute snapshot on node1 in cluster1
3. execute restore on node1 in cluster2

The error message is below (server-url was edited); the index name is "bel-en":

[2014-06-24 23:48:56,197][WARN ][cluster.action.shard ] [coordinator] 
[bel-en][0] received shard failed for [bel-en][0], 
node[e6sRC7OzRnq1XswwYjZ1JQ], [P], restoring[bel-en:2014-06-24_23_48_52], 
s[INITIALIZING], indexUUID [a6dQi6kDTI-xvlyM6NRq8Q], reason [Failed to 
start shard, message [IndexShardGatewayRecoveryException[[bel-en][0] failed 
recovery]; nested: IndexShardRestoreFailedException[[bel-en][0] restore 
failed]; nested: IndexShardRestoreFailedException[[bel-en][0] failed to 
restore snapshot [2014-06-24_23_48_52]]; nested: 
IndexShardRestoreFailedException[[bel-en][0] failed to read shard snapshot 
file]; nested: 
FileNotFoundException[http://{server-url}/bel-en/indices/bel-en/0/snapshot-2014-06-24_23_48_52];
 
]]
[2014-06-24 23:48:56,519][WARN ][cluster.metadata ] [coordinator] 
[bel-en] re-syncing mappings with cluster state for types [[product]]

Directory http://{server-url}/bel-en/indices/bel-en/0 is empty.

JVM-Version is the same on all nodes.

Elasticsearch runs in Version 1.1.1

Write access should be there; we registered the fs repository on both nodes 
of the cluster, pointing to a directory on one of them.
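
For reference, registering a shared filesystem repository looks like this
minimal sketch (repository name and path are hypothetical; the location must
be reachable by every node in the cluster):

curl -XPUT 'localhost:9200/_snapshot/my_backup' -d '{
    "type": "fs",
    "settings": {
        "location": "/mnt/backups/my_backup"
    }
}'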

Thanks for your help.

Daniel


On Friday, June 20, 2014 at 10:12:49 UTC+2, Alexander Reelsen wrote:
>
> Hey,
>
> can you be more precise and create a fully fledged example (generating the 
> repository, executing the snapshot on cluster one, executing restore on 
> cluster 2, etc) and include the concrete error message in order to find out 
> what 'the process breaks' means here? Also provide info about elasticsearch 
> and jvm versions. Thanks!
>
> Snapshots are always done per index (the primary shards) and not per node, 
> so there must be something else going on.
> Is it possible that only one node has write access to the repository?
>
>
> --Alex
>
>
> On Thu, Jun 19, 2014 at 3:36 PM, Daniel Bubenheim <
> daniel.b...@googlemail.com > wrote:
>
>> Hello,
>>
>> we have a cluster of two nodes. Every index in this cluster consists of 2 
>> shards and one replica. We want to make use of  snapshots & restore to 
>> transfer data between two clusters. When we make our snapshots on node one 
>> only the primary shard is included, the replica shard is missing. While 
>> restoring on the other cluster the process breaks because of the missing 
>> second shard. 
>> Do we have to make a snapshot on each node to include both primary 
>> shards, so that we can restore the whole index, or am I missing something 
>> here? 
>>
>> Thanks in advance
>> Daniel
>>
>
>



Re: Proper parsing of String values like 1m, 1q HOUR etc.

2014-06-24 Thread Brian
Thomas,

The TimeValue class handles precisely defined time periods (well, pretty 
much, anyway). In other words, 1s is one second. 1w is always 7d (leap 
seconds notwithstanding, but that doesn't really affect the precision).

But what is one year? 365 days? 365.25 days? 366 days in a leap year?

What is one quarter? Exactly 91.25d (which is 365 / 4)? Or 3 months? 

But then, what is a month? 28 days? 31 days? Use 28d or 31d if that's what 
you mean; 1 month has no deterministic meaning all by itself. And 1 quarter 
is 3 months but without any deterministic way to convert to a precise 
number of milliseconds.

The TimeValue class has no support for locale nor day of year nor leap year 
nor days in a month. It's best to use Joda time if you wish to perform 
proper year-oriented calculations. And it will return milliseconds 
precision if you wish, which will plug directly back into a TimeValue.
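
For illustration, a hedged sketch of that approach, feeding a Joda-Time
calculation back into a TimeValue (TimeValue is from the Elasticsearch Java
API, DateTime from Joda-Time):

import org.elasticsearch.common.unit.TimeValue;
import org.joda.time.DateTime;

// Calendar-aware "one year from now": Joda handles leap years,
// unlike a fixed 365d TimeValue.
DateTime now = DateTime.now();
long oneYearMillis = now.plusYears(1).getMillis() - now.getMillis();
TimeValue oneYear = TimeValue.timeValueMillis(oneYearMillis);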

Brian



Priority of pending tasks

2014-06-24 Thread Ashish Nigam
Hi,
Is there any way to change how Elasticsearch manages priority in the pending
tasks queue?
Here's some background:
My cluster was in yellow state for more than a day and now it is green, but
it seems shards are still being started, and that is also taking a lot of time.

Now I issue a delete index request, and here's how the queue looks:

curl -XGET 'http://10.0.11.120:9200/_cluster/pending_tasks'?pretty=true

{
  "tasks" : [ {
    "insert_order" : 60444,
    "proirity" : "URGENT",
    "source" : "delete-index [index_18530]",
    "time_in_queue_millis" : 1740,
    "time_in_queue" : "1.7s"
  }, {
    "insert_order" : 60443,
    "proirity" : "HIGH",
    "source" : "shard-started ([index_18227][1],
      node[t8aivPciQwSqdgJYj1iY2w], relocating [H52VhU9pQRuS2Mp-YtcD7Q], [P],
      s[INITIALIZING]), reason [after recovery (replica) from node
      [[esearch08][H52VhU9pQRuS2Mp-YtcD7Q][inet[/10.0.11.128:9300]]{master=false,
      zone=zone_spop-sjc}]]",
    "time_in_queue_millis" : 39494,
    "time_in_queue" : "39.4s"
  } ]
}



As you can see, my delete request has a priority of URGENT.

But after 30 seconds this request times out

{"error":"RemoteTransportException[[esearch00][inet[/10.0.11.120:9300]][indices/delete]];
nested: ProcessClusterEventTimeoutException[failed to process cluster event
(delete-index [index_18530]) within 30s]; ","status":503}



It seems that the URGENT request timed out because of the other HIGH priority
requests.


Is there any way to execute the delete request successfully even while shards
of another index are being moved around or started?
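
For reference, one hedged thing to try is raising the default 30s timeout on
the delete request itself (the 5m value here is arbitrary):

curl -XDELETE 'http://10.0.11.120:9200/index_18530?timeout=5m'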


Thanks

Ashish



scripted partial delete

2014-06-24 Thread eunever32
Hello,

Can someone confirm whether a script can delete an element of an array?

I assume it's not possible.
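
For what it's worth, a hedged sketch of removing an array element through
the update API with a script (assumes dynamic scripting is enabled; index,
type, field, and tag values are all hypothetical):

curl -XPOST 'localhost:9200/myindex/mytype/1/_update' -d '{
    "script" : "ctx._source.tags.contains(tag) ? ctx._source.tags.remove(ctx._source.tags.indexOf(tag)) : true",
    "params" : { "tag" : "blue" }
}'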



MongoDB River Reindex

2014-06-24 Thread Travis sturzl
This morning we had an issue with Elasticsearch on one of our production 
servers. We had to reindex the entire thing. From several posts I've seen 
from various sources, the only real way to do this was to remove the index 
and recreate the river.

So I did exactly that. I removed the index, then I tried recreating the 
river. I'm now getting this error:

{"error":"UnavailableShardsException[[_river][0] [2] shardIt, [0] active : 
> Timeout waiting for [1m], request: index {[_river][adn_demo][_meta], 
> source[\n{\n  \"type\": \"mongodb\",\n  \"mongodb\": {\n\"servers\": 
> [\n  { \"host\": \"10.128.3.250\", \"port\": 27017 },\n  { 
> \"host\": \"10.128.4.5\", \"port\": 27017 },\n  { \"host\": 
> \"10.128.4.6\", \"port\": 27017 }\n],\n\"options\": { 
> \"secondary_read_preference\": true },\n\"db\": \"adn_demo\",\n   
>  \"collection\": \"jobPosts\"\n  },\n  \"index\": {\n\"name\": 
> \"job_posts_index_prd\",\n\"type\": \"job_post_prd\"\n 
>  }\n}]}]","status":503}



I'm under the impression that I'm not part of a replica set as 
"localhost:9200/_nodes/process?pretty" outputs:

> {
>   "ok" : true,
>   "cluster_name" : "adnElasticsearch",
>   "nodes" : {
> "oDBNxMd9QZ-vxnE7PoWFig" : {
>   "name" : "Professor X",
>   "transport_address" : "inet[web-api1]",
>   "hostname" : "web-api1",
>   "version" : "0.90.3",
>   "http_address" : "inet[web-api1",
>   "process" : {
> "refresh_interval" : 1000,
> "id" : 23372,
> "max_file_descriptors" : 65535
>   }
> }
>   }
> }


However, we duplicate ES to provide redundancy, so the two other servers are 
configured like this:

> {
>   "ok" : true,
>   "cluster_name" : "adnElasticsearch",
>   "nodes" : {
> "aQS4U9axRryqnVN7RqFnWw" : {
>   "name" : "Anomaloco",
>   "transport_address" : "inet[/10.128.4.10:9300]",
>   "hostname" : "web-api3",
>   "version" : "0.90.3",
>   "http_address" : "inet[web-api3/10.128.4.10:9200]",
>   "process" : {
> "refresh_interval" : 1000,
> "id" : 25224,
> "max_file_descriptors" : 65535
>   }
> },
> "NsKtV7UiSx2YPSJHGBs8RQ" : {
>   "name" : "Bullseye",
>   "transport_address" : "inet[web-api2/10.128.4.9:9300]",
>   "hostname" : "web-api2",
>   "version" : "0.90.3",
>   "http_address" : "inet[web-api2/10.128.4.9:9200]",
>   "process" : {
> "refresh_interval" : 1000,
> "id" : 804,
> "max_file_descriptors" : 65535
>   }
> }
>   }
> }


I assume web-api1 got removed from the replica set when I removed the index.

I had been kind of dropped into this system when we lost our system 
administrator. I'm not fluent in Elasticsearch, and I'm having trouble 
finding out how to resolve this issue.

How do I recreate the index/river?
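
For reference, the river meta document embedded in the error above
corresponds to a request like this hedged sketch; note that the
UnavailableShardsException is about the _river index itself having no active
shard, which would need to be resolved before this can succeed:

curl -XPUT 'localhost:9200/_river/adn_demo/_meta' -d '{
    "type": "mongodb",
    "mongodb": {
        "servers": [
            { "host": "10.128.3.250", "port": 27017 },
            { "host": "10.128.4.5", "port": 27017 },
            { "host": "10.128.4.6", "port": 27017 }
        ],
        "options": { "secondary_read_preference": true },
        "db": "adn_demo",
        "collection": "jobPosts"
    },
    "index": {
        "name": "job_posts_index_prd",
        "type": "job_post_prd"
    }
}'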



Re: Changing Kibana-int based on context

2014-06-24 Thread mysterydark
That indeed is the reply I got from most of the sources, but somehow I have 
not been asked to adopt that method. What I am trying to do now is to 
change the config.js file. I have written a function that receives query 
parameters from the URL, but I have to assign the value to kibana_index so 
that it creates a new index by that name and does not use the default 
kibana-int. I have no idea how to access kibana_index in JavaScript. It's 
defined in a function Settings. This is my function below.

 var proj_type = getProjectType("prjtype");

function getProjectType(type)
{
  var query = window.location.search.substring(1);
  var variables = query.split("&");
  for (var i = 0; i < variables.length; i++) {
    var pair = variables[i].split("=");
    if (pair[0] === type) {
      return pair[1];
    }
  }
}
> I am a newbie to Computer science in general and at present I am working 
> on a project which involves Elasticsearch, logstash, and Kibana and we are 
> using this to build up a centralized Logging system. In kibana config.js , 
> there is a parameter kibana_index whose default value is set to 
> "Kibana-int". Is there a way possible to change the value of Kibana-index 
> based on the context? What I could understand from my research is that 
> "kibana-int" is the index which stores all the dashboards. When I say 
> context, what I mean is if I have multiple projects in an organization, the 
> dropdown on kibana dashboard page should show the dashboards only under a 
> particular project when I give that project's name as the context in my 
> url. So people working in a project get to see only the ones in their 
> project. The only way I could find is to change the kibana-index value 
> based on the project say something like "kibana-projA". So it shows all the 
> dashboards under this particular index. But I couldnt find a way as to how 
> to do it. Could you please help me out.
>
> Any help would be appreciated.
>
> Thanks.
>



Re: Question on adding new node to ES cluster

2014-06-24 Thread David Pilato
Elasticsearch will be able to serve all your requests during that time.

You will probably see some IO increase, but IIRC recovery is throttled to 20 MB/s 
by default.

HTH

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> On 24 June 2014 at 18:49, karthik jayanthi wrote:
> 
> Hi,
> 
> I am trying to understand the process of adding a new node to an existing ES 
> cluster. Specifically, I wanted to understand: during the shard allocation that 
> happens as part of adding a new node, will the cluster overall be available 
> for serving new requests, specifically search requests? Or can we expect 
> any performance impact during that time? 
> 
> 
> 
> Thanks,
> Karthik



elasticsearch 1.2.1 plugin

2014-06-24 Thread sri
Hello,

I need to write a plugin for Elasticsearch 1.2.1, but I was not able to find 
proper documentation or a reference for it. I would be very grateful if 
anyone could guide me with this.

I have referred to plugins for previous ES versions, but there are syntax 
changes between the versions.
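
For what it's worth, a minimal 1.x plugin skeleton looks roughly like the
sketch below (class and plugin names are hypothetical; the class is wired up
via an es-plugin.properties file on the classpath containing a line such as
plugin=org.example.MyPlugin):

import org.elasticsearch.plugins.AbstractPlugin;

public class MyPlugin extends AbstractPlugin {

    @Override
    public String name() {
        return "my-plugin"; // appears in the "loaded [...]" startup log line
    }

    @Override
    public String description() {
        return "Example plugin skeleton";
    }
}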

Thanks and Regards
Sri



Downside to using version=true in search requests?

2014-06-24 Thread Gabe Gorelick-Feldman
Is there a reason that search responses do not include a _version by 
default? I know you can include it by specifying version=true in the search 
params, but I was wondering if there's some reason you wouldn't want to do 
that by default, e.g. because of performance implications.
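
For illustration, requesting versions looks like this sketch (index name
hypothetical):

curl -XGET 'localhost:9200/myindex/_search' -d '{
    "version": true,
    "query": { "match_all": {} }
}'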



Can you do date histogram with specified number of buckets?

2014-06-24 Thread Tebring Daly
It looks like the date histogram aggregation currently works with each bucket
based on a user-defined time interval; therefore you do not know how many
buckets will be returned unless you know ahead of time how much time
your data spans.

Is there a way to create a date histogram aggregation in ElasticSearch
where I can define the number of buckets that I want and the bucket
increment be determined during execution based on the oldest and newest
matching document?

I would like to be able to say: give me 20 buckets. Thus if the data spans
10 years, each bucket is half a year; if it spans 10 minutes, each bucket is
30 seconds.
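
For reference, a hedged two-request workaround (index and field names are
hypothetical): fetch the time bounds with min/max aggregations, compute
interval = (max - min) / 20 on the client, then run the date_histogram with
that interval:

# request 1: find the oldest and newest timestamps
curl -XGET 'localhost:9200/myindex/_search?search_type=count' -d '{
    "aggs": {
        "oldest": { "min": { "field": "timestamp" } },
        "newest": { "max": { "field": "timestamp" } }
    }
}'

# request 2: interval computed client-side, e.g. roughly 183d for a 10-year span
curl -XGET 'localhost:9200/myindex/_search?search_type=count' -d '{
    "aggs": {
        "over_time": {
            "date_histogram": { "field": "timestamp", "interval": "183d" }
        }
    }
}'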

Thanks!



Hash tag field analyzer not being applied

2014-06-24 Thread André Morais
Hello! 

I have configured an analyzer in my YML to exclude all words that do 
not start with either # or @, to process hash tags and at-tags. This 
analyzer works fine when used via the analyze API, but when I index data it is 
not being applied.

I thought that when indexing, analyzers would replace the original 
contents with the analysis result. Is that not so?

Thank you for your help!

  André

-

Here is my YML configuration:

index.analysis.analyzer.tags:
type: custom
tokenizer: whitespace
filter: fntags, fnsize
index.analysis.filter.fntags:
   type : pattern_replace 
   pattern: "^[^#@]+.*$"
   replacement: ""
index.analysis.filter.fnsize:
   type : length
   min : 2
   max : 200

  Here is my type mapping for the field:

Hash_Tags: {
    analyzer: tags,
    type: string
}

  Here is the result when using the analyzer API:

curl -XGET 'localhost:9200/catalog/_analyze?field=Hash_Tags&pretty=true' -d 
'NO a la violencia! Comparta esto en la medida de lo posible si espera 
verdaderamente un mejor mundo para Navidad... y después! #lifeworthbetter'
{
  "tokens" : [ {
"token" : "#lifeworthbetter",
"start_offset" : 126,
"end_offset" : 142,
"type" : "word",
"position" : 23
  } ]
}

   And here is the result for a match_all query:

Hash_Tags: " NO a la violencia! Comparta esto en la medida de lo posible si 
espera verdaderamente un mejor mundo para Navidad... y después! 
#lifeworthbetter"

   I was expecting:
Hash_Tags: "#lifeworthbetter"



Re: Jepsen article reaction

2014-06-24 Thread Ivan Brusic
Considering that most of the talking points come directly from the
community, I do not think there will be much of a reaction here. The main
issue referenced in the article has a few of the answers to your questions.

Cheers,

Ivan


On Tue, Jun 24, 2014 at 9:04 AM, John Smith 
wrote:

> I was wondering what reaction the community had to this article:
>
> http://aphyr.com/posts/317-call-me-maybe-elasticsearch
>
> I would be interested in a response from knowledgeable users/developers.
>
> After reading this article:
>
> I wanted to know
>
> 1) If the issues brought up in this article are valid, how likely are you
> to encounter them in production?
> 2) If they are valid, how can you minimize them with the current code
> bases?
> 3) What is being done in the short/medium/long term to address these issues?
> Are there any particular issues we can follow to track progress?
>
> TIA
>



Secondary sort on aggregation buckets

2014-06-24 Thread Ivan Brusic
I started investigating switching from facets to aggregations in order to
have access to some of the new features aggregations offer.

One of them is the ability to sort on a sub-aggregation metric, which is
working well, but buckets that are tied come back in a random order.
Since it is not possible to sort the buckets of a stats aggregation (the
sub-aggregation), there is no explicit secondary order. Is there any way to
set up the sub-aggregations so they are returned in order? Default order
(count) is fine.

The gist of the aggregations is

   "aggs": {
  "myagg": {
 "terms": {
"field": "myfield",
"order": {
   "maxotherfield": "desc"
},
"size": 100
 },
 "aggs": {
"maxotherfield": {
   "max": {
  "field": "someotherfield"
   }
}
 }
  }
   }

Haven't looked at the code much, but there is definitely no direct support
for secondary sorts.

Cheers,

Ivan

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQDr39Fys73U%3Daz1ry4k2JRWMJb7QmFJvQBAXrE8DYGixg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Jepsen article reaction

2014-06-24 Thread Swen Thümmler
Am Dienstag, 24. Juni 2014 18:04:38 UTC+2 schrieb John Smith:
>
> I was wondering what reaction the community had to this article:
>
> http://aphyr.com/posts/317-call-me-maybe-elasticsearch
>
> I would be interested in a response from knowledgeable users/developers.
>
> After reading this article:
>
> I wanted to know 
>
> 1)If the issues brought up in this article are valid and how likely you 
> are to encounter them in production?
> 2) If they are valid, how can you minimize them with the current code 
> bases?
> 3) What is being done in the short/medium/long term to address these issue 
> ? Are there any particular issues we can follow to track progress.
>
>
Great article. I can only comment on 1) and 2):
(1) I've repeatedly encountered split brains despite having 3 servers and 
discovery.zen.minimum_master_nodes=2.
(2) After being fed up with this, I decided to give elasticsearch-zookeeper 
a try. Since I wanted to use it on 1.2.1 and with zookeeper 3.4.6 I have 
forked imotov's repo and created releases for elasticsearch 1.1.3 and 1.2.1 
- you can find it on github. Since then, no more split brains have occurred.
There is another discovery plugin (eskka) 
which according to the comments in the article might be more robust.

--Swen

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5198d222-942a-44b6-b489-a8998ddb10ef%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk inserting is slow

2014-06-24 Thread Frederic Esnault
Wow, thanks a LOT, Cédric and Jörg!

Got down to 15.8 seconds for 264000 documents.

Bulk processing took 15.863 seconds
Import CSV file took 15.874 secondes

If you have any more tips to tune it, I'll take them too :)

For example I didn't use the MultiGetRequestBuilder, just a new 
IndexRequest for each doc.
Would it help to use MultiGet? Can't really figure out how to use it.

Le mardi 24 juin 2014 17:48:43 UTC+2, Cédric Hourcade a écrit :
>
> Hello, 
>
> You can use the BulkProcessor class to do the work for you: 
>
> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkProcessor.java
>  
>
> Just configure/instantiate the class and .add() your index requests. 
> See: 
> https://github.com/elasticsearch/elasticsearch/blob/master/src/test/java/org/elasticsearch/action/bulk/BulkProcessorTests.java
>  
>
> Cédric Hourcade 
> c...@wal.fr  
>
>
> On Tue, Jun 24, 2014 at 5:34 PM, Frederic Esnault 
> > wrote: 
> > Hi again, 
> > 
> > any idea about how to parallelize the bulk insert process ? 
> > I tried creating 4 BulkInserters extending RecursiveAction and executed 
> them 
> > all, but the result is awful, 3 of them finished very slowly, and one 
> did 
> > not finish (don't know why), and got only 70K docs in ES instead of 265 
> > 000... 
> > 
> > The result of downsizing the batches sizes to 10 000 is not really big, 
> > total process took approx. 1 second less (Actually this is much lower 
> than 
> > in the previous post, because i moved the importing UI to  my server, 
> close 
> > to one of ES nodes). Was more than 29 seconds, now 28. 
> > 28 seconds. 
> > 
> > 
> > Import CSV file took 28.069 secondes 
> > 
> > Here is the insertion code. The Iterator is a CSV reading iterator who 
> > parses lines and returns Record instances (object with generic object 
> > values, indexed as string). MAX_RECORDS is my batch size,  set to 10 
> 000. 
> > 
> > public void insert(Iterator<Record> recordsIterator) { 
> > while (recordsIterator.hasNext()) { 
> > batchInsert(recordsIterator, MAX_RECORDS); 
> > } 
> > } 
> > 
> > private void batchInsert(Iterator<Record> recordsIterator, int 
> limit) { 
> > BulkRequestBuilder bulkRequest = client.prepareBulk(); 
> > int processed = 0; 
> > try { 
> > logger.log(Level.INFO, "Adding records to bulk insert 
> batch"); 
> > while (recordsIterator.hasNext() && processed < limit) { 
> > processed++; 
> > Record record = recordsIterator.next(); 
> > IndexRequestBuilder builder = 
> > client.prepareIndex(datasetName, RECORD); 
> > XContentBuilder data = jsonBuilder(); 
> > data.startObject(); 
> > for (ColumnMetadata column : 
> > dataset.getMetadata().getColumns()) { 
> > Object value = 
> > record.getCell(column.getName()).getValue(); 
> > if (value == null || (value instanceof String && 
> > value.equals("NULL"))) { 
> > value = null; 
> > } 
> > data.field(column.getNormalizedName(), value); 
> > } 
> > data.endObject(); 
> > builder.setSource(data); 
> > bulkRequest.add(builder); 
> > } 
> > logger.log(Level.INFO, "Added "+ 
> bulkRequest.numberOfActions() 
> > +" records to bulk insert batch. Inserting batch..."); 
> > long current = System.currentTimeMillis(); 
> > BulkResponse bulkResponse = 
> > 
> bulkRequest.setConsistencyLevel(WriteConsistencyLevel.ONE).execute().actionGet();
>  
>
> > if (bulkResponse.hasFailures()) { 
> > logger.log(Level.SEVERE, "Could not index : " + 
> > bulkResponse.buildFailureMessage()); 
> > } 
> > System.out 
> > .println(String.format("Bulk insert took %s 
> secondes", 
> > NumberUtils 
> > .formatSeconds(((double) 
> > (System.currentTimeMillis() - current)) / 1000.0))); 
> > } catch (Exception e) { 
> > e.printStackTrace(); 
> > } 
> > } 
> > 
> > Le mardi 24 juin 2014 13:44:03 UTC+2, Frederic Esnault a écrit : 
> >> 
> >> Thanks for all this. 
> >> 
> >> I changed my conf, removed all the thread pool config, reduced refresh 
> >> time to 5s according to Michael advice, and limited my batch to 10 000. 
> >> I'll see how it works then i'll paralellize the bulk insert. 
> >> I'll tell you how it ends up. 
> >> 
> >> Thanks again ! 
> >> 
> >> Le lundi 23 juin 2014 12:56:14 UTC+2, Jörg Prante a écrit : 
> >>> 
> >>> Your bulk insert size is too large. It makes no sense to insert 
> 100.000 
> >>> with one request. Use 1000-10000 instead. 
> >>> 
> >>> Also you should submit bulk requests in parallel and not sequential 
> like 
> >>> you do. Sequential bulk is slow if client CPU/networ

Re: ingest performance degrades sharply along with the documents having more fields

2014-06-24 Thread Cindy Hsin
Looks like the memory usage increased a lot with 10k fields with these two 
parameters disabled.

Based on the experiments we have done, it looks like ES has abnormal memory 
usage and performance degradation when the number of fields is large (i.e. 
10k), whereas Solr's memory usage and performance remain stable with a 
large number of fields.

If we are only looking at the 10k fields scenario, is there a way for ES to 
make the ingest performance better (perhaps via a bug fix)? Looking at the 
performance numbers, I think this abnormal memory usage and performance 
drop is most likely a bug in the ES layer. If this is not technically 
feasible, then we'll report back that we have checked with ES experts and 
confirmed that there is no way for ES to provide a fix to address this 
issue. The solution Mike suggested sounds like a workaround (i.e. combine 
multiple fields into one field to reduce the large number of fields); I can 
run it by our team but am not sure if it will fly.
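
A rough sketch of that field-collapsing workaround (the field names are 
made up): instead of one field per attribute, the attributes are encoded 
as key=value tokens in a single multi-valued field, so the mapping stays 
small no matter how many distinct attributes exist:

# before: one field per attribute, the mapping grows without bound
{ "attr_color_ss" : "red", "attr_size_i" : 10 }

# after: a single field holding key=value tokens
{ "attrs" : [ "color=red", "size=10" ] }

# exact-match lookups then become term queries on the combined field
{ "query" : { "term" : { "attrs" : "color=red" } } }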

I have also asked Maco to do one more benchmark (where search and ingest 
run concurrently) for both ES and Solr, to check whether there is any 
performance degradation for Solr when search and ingest happen 
concurrently. I think this is one point that Mike mentioned, right? Even 
with Solr, you think we will hit some performance issue with large fields 
when ingest and query run concurrently.

Thanks!
Cindy

On Thursday, June 12, 2014 10:57:23 PM UTC-7, Maco Ma wrote:
>
> I try to measure the performance of ingesting the documents having lots of 
> fields.
>
>
> The latest elasticsearch 1.2.1:
> Total docs count: 10k (a small set definitely)
> ES_HEAP_SIZE: 48G
> settings:
>
> {"doc":{"settings":{"index":{"uuid":"LiWHzE5uQrinYW1wW4E3nA","number_of_replicas":"0","translog":{"disable_flush":"true"},"number_of_shards":"5","refresh_interval":"-1","version":{"created":"1020199"}
>
> mappings:
>
> {"doc":{"mappings":{"type":{"dynamic_templates":[{"t1":{"mapping":{"store":false,"norms":{"enabled":false},"type":"string"},"match":"*_ss"}},{"t2":{"mapping":{"store":false,"type":"date"},"match":"*_dt"}},{"t3":{"mapping":{"store":false,"type":"integer"},"match":"*_i"}}],"_source":{"enabled":false},"properties":{}
>
> All fields in the documents mach the templates in the mappings.
>
> Since I disabled the flush & refresh, I submitted the flush command (along 
> with optimize command after it) in the client program every 10 seconds. (I 
> tried the another interval 10mins and got the similar results)
>
> Scenario 0 - 10k docs have 1000 different fields:
> Ingestion took 12 secs.  Only 1.08G heap mem is used(only states the used 
> heap memory).
>
>
> Scenario 1 - 10k docs have 10k different fields(10 times fields compared 
> with scenario0):
> This time ingestion took 29 secs.   Only 5.74G heap mem is used.
>
> Not sure why the performance degrades sharply.
>
> If I try to ingest the docs having 100k different fields, it will take 17 
> mins 44 secs.  We only have 10k docs totally and not sure why ES perform so 
> badly. 
>
> Anyone can give suggestion to improve the performance?
>
>
>
>
>
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/06d319c4-ee7a-40e3-b11a-6e0adff2c686%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Searching by nested fields

2014-06-24 Thread Danylo Vivchar


I am having trouble searching by a nested field.

For example my document is:

{
"chat": [
{
"messages": [
{
"id": "61",
"text": "some text here"
},
{
"id": "62",
"text": "some62 text62 here62"
},
{
"id": "63",
"text": "some63 text63 here63"
},
{
"id": "64",
"text": "some64 text64 here64"
}
]
}
]
}

I'm searching through messages.text. If I search

"text": "some"

{
"query" : {
"nested": {
"path": "messages",
"query": {
"bool": {
"must": [{
"match": {
"text": "some"
}
}]
}
}
}
}
}

I want output something like

{
"chat": [
{
"messages": [
{
"id": "61",
"text": "some text here"
}
]
}
]
}

and not the whole document. I suppose I should use the nested property, so 
messages in chat are mapped as "nested". Can anyone please help me?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0ade073a-38b1-4384-b56b-623ca92de033%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Reduce threads used by elasticsearch

2014-06-24 Thread jnortey
Hmm, setting transport.netty.worker_count didn't work for me... For the 
record, I'm running elasticsearch through the Java API, so I don't think 
I'm actually using netty. Is there another flag for this property?

On Monday, June 23, 2014 5:29:28 PM UTC-5, Jörg Prante wrote:
>
> You can reduce netty workers by transport.netty.worker_count  setting 
> which is by default set to  2 * CPU cores
>
> Jörg
>
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/65ba1a42-f2fa-4791-9805-e4032de51087%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Question on adding new node to ES cluster

2014-06-24 Thread karthik jayanthi
Hi,

I am trying to understand the process of adding a new node to an existing 
ES cluster. Specifically, I wanted to understand: during the shard 
allocation that happens as part of adding a new node, will the cluster 
overall remain available for serving new requests - specifically search 
requests? Or can we expect any performance impact during that time?



Thanks,
Karthik

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3051fa60-4b93-4b22-9c8e-83cc6aa3afd7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Jepsen article reaction

2014-06-24 Thread Itamar Syn-Hershko
1. They are valid, and you are likely enough to encounter some of them 
every now and then. Depending on your configuration this may be very rare, 
but it's very probable.

2. The good people at Elasticsearch are working on fixing many of those
issues. Some have already been fixed, some will be fixed, but it may take
time until the implementation is air tight.

Some of the recommendations on how to minimize the risks are outlined in
aphyr's post. The general idea is to have dedicated master nodes that are
not data nodes (to minimize the risk of partitions due to GC or similar),
and to have configs setup as recommended by ES in terms of quorum sizes,
ping timeouts etc.

Also make sure to always use a recent version of ES.
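
As a rough illustration of those recommendations (the values are examples, 
not tuned advice), the relevant elasticsearch.yml settings would look 
something like this:

# on the 3 dedicated master-eligible nodes
node.master: true
node.data: false

# on the data nodes
node.master: false
node.data: true

# quorum of master-eligible nodes: (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2

# be more forgiving of GC pauses during master pings
discovery.zen.ping.timeout: 10s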

3. Most of the work happens here:
https://github.com/elasticsearch/elasticsearch/tree/feature/improve_zen and
there may be other related tickets as well

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Tue, Jun 24, 2014 at 7:04 PM, John Smith 
wrote:

> I was wondering what reaction the community had to this article:
>
> http://aphyr.com/posts/317-call-me-maybe-elasticsearch
>
> I would be interested in a response from knowledgeable users/developers.
>
> After reading this article:
>
> I wanted to know
>
> 1)If the issues brought up in this article are valid and how likely you
> are to encounter them in production?
> 2) If they are valid, how can you minimize them with the current code
> bases?
> 3) What is being done in the short/medium/long term to address these issue
> ? Are there any particular issues we can follow to track progress.
>
> TIA
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/781a651e-16db-4bfd-a4ab-f11fe4b97bd4%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHTr4ZtHc3Rmrc8mHG%3DJvzKn4w-ON1gSBW3vc3S4B1FZfRPA3A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Reduce threads used by elasticsearch

2014-06-24 Thread jnortey
Perfect, that's just what I needed. Is there a comprehensive list of these 
properties? I didn't see that property mentioned here: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html

On Monday, June 23, 2014 5:29:28 PM UTC-5, Jörg Prante wrote:
>
> You can reduce netty workers by transport.netty.worker_count  setting 
> which is by default set to  2 * CPU cores
>
> Jörg
>
>
> On Mon, Jun 23, 2014 at 10:34 PM, jnortey  > wrote:
>
>> We have a development and production offering that uses elasticsearch. In 
>> development, it is not necessary to create many threads and so we're trying 
>> to use as few threads as possible. I've been able to make some great 
>> reductions using the threadpool settings, but there seems to be one that I 
>> can't reduce:
>>
>> elasticsearch[Kubik][*http_server_worker*][T#16]{New I/O worker #22}
>>
>> What is the "*http_server_worker"* threadpool? And is there a way to 
>> reduce how many of them are created? Right now there are 16 of them being 
>> created and I don't think that many will be needed for our purposes.
>>
>> Thanks
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/fe2a2d17-0c85-441b-858e-57242bfa0524%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2f25f362-0cdc-46d9-914e-2bf457820528%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Jepsen article reaction

2014-06-24 Thread John Smith
I was wondering what reaction the community had to this article:

http://aphyr.com/posts/317-call-me-maybe-elasticsearch

I would be interested in a response from knowledgeable users/developers.

After reading this article:

I wanted to know 

1)If the issues brought up in this article are valid and how likely you are 
to encounter them in production?
2) If they are valid, how can you minimize them with the current code bases?
3) What is being done in the short/medium/long term to address these issue 
? Are there any particular issues we can follow to track progress.

TIA

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/781a651e-16db-4bfd-a4ab-f11fe4b97bd4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk inserting is slow

2014-06-24 Thread Frederic Esnault
Thanks to both of you, I'll look at this immediately!

Le mardi 24 juin 2014 17:51:04 UTC+2, Jörg Prante a écrit :
>
> You should use the org.elasticsearch.action.bulk.BulkProcessor helper 
> class for concurrent bulk indexing.
>
> Jörg
>
>
> On Tue, Jun 24, 2014 at 5:34 PM, Frederic Esnault  > wrote:
>
>> Hi again,
>>
>> any idea about how to parallelize the bulk insert process ?
>> I tried creating 4 BulkInserters extending RecursiveAction and executed 
>> them all, but the result is awful, 3 of them finished very slowly, and one 
>> did not finish (don't know why), and got only 70K docs in ES instead of 265 
>> 000...
>>
>> The result of downsizing the batches sizes to 10 000 is not really big, 
>> total process took approx. 1 second less (Actually this is much lower than 
>> in the previous post, because i moved the importing UI to  my server, close 
>> to one of ES nodes). Was more than 29 seconds, now 28.
>> 28 seconds.
>>
>>
>> *Import CSV file took 28.069 secondes*
>>
>> Here is the insertion code. The Iterator is a CSV reading iterator who 
>> parses lines and returns Record instances (object with generic object 
>> values, indexed as string). MAX_RECORDS is my batch size,  set to 10 000.
>>
>> public void insert(Iterator<Record> recordsIterator) {
>> while (recordsIterator.hasNext()) {
>> batchInsert(recordsIterator, MAX_RECORDS);
>> }
>> }
>>
>> private void batchInsert(Iterator<Record> recordsIterator, int limit) 
>> {
>> BulkRequestBuilder bulkRequest = client.prepareBulk();
>> int processed = 0;
>> try {
>> logger.log(Level.INFO, "Adding records to bulk insert batch");
>> while (recordsIterator.hasNext() && processed < limit) {
>> processed++;
>> Record record = recordsIterator.next();
>> IndexRequestBuilder builder = 
>> client.prepareIndex(datasetName, RECORD);
>> XContentBuilder data = jsonBuilder();
>> data.startObject();
>> for (ColumnMetadata column : 
>> dataset.getMetadata().getColumns()) {
>> Object value = 
>> record.getCell(column.getName()).getValue();
>> if (value == null || (value instanceof String && 
>> value.equals("NULL"))) {
>> value = null;
>> }
>> data.field(column.getNormalizedName(), value);
>> }
>> data.endObject();
>> builder.setSource(data);
>> bulkRequest.add(builder);
>> }
>> logger.log(Level.INFO, "Added "+ 
>> bulkRequest.numberOfActions() +" records to bulk insert batch. Inserting 
>> batch...");
>> long current = System.currentTimeMillis();
>> BulkResponse bulkResponse = 
>> bulkRequest.setConsistencyLevel(WriteConsistencyLevel.ONE).execute().actionGet();
>> if (bulkResponse.hasFailures()) {
>> logger.log(Level.SEVERE, "Could not index : " + 
>> bulkResponse.buildFailureMessage());
>> }
>> System.out
>> .println(String.format("Bulk insert took %s 
>> secondes", NumberUtils
>> .formatSeconds(((double) 
>> (System.currentTimeMillis() - current)) / 1000.0)));
>> } catch (Exception e) {
>> e.printStackTrace();
>> }
>> }
>>
>> Le mardi 24 juin 2014 13:44:03 UTC+2, Frederic Esnault a écrit :
>>
>>> Thanks for all this.
>>>
>>> I changed my conf, removed all the thread pool config, reduced refresh 
>>> time to 5s according to Michael advice, and limited my batch to 10 000.
>>> I'll see how it works then i'll paralellize the bulk insert.
>>> I'll tell you how it ends up.
>>>
>>> Thanks again !
>>>
>>> Le lundi 23 juin 2014 12:56:14 UTC+2, Jörg Prante a écrit :

 Your bulk insert size is too large. It makes no sense to insert 100.000 
 with one request. Use 1000-10000 instead. 

 Also you should submit bulk requests in parallel and not sequential 
 like you do. Sequential bulk is slow if client CPU/network is not 
 saturated.

 Check if you have disabled the index refresh from 1 (1s) to -1 while 
 bulk indexing is active. 30s makes not much sense if you can execute the 
 bulk in this time.

 Do not limit indexing memory to 50%.

 It makes no sense to increase queue_size for bulk thread pool to 1000. 
 This means you want a single ES node should accept 1000 x 100 000 = 100 000 
 000 = 100m docs at once. This will simply exceeds all reasonable limits 
 and 
 bring the node down with an OOM (if you really have 100m docs).

 More advice is possible if you can show your client code how you push 
 docs to ES.

 Jörg



 On Mon, Jun 23, 2014 at 12:30 PM, Frederic Esnault <
 esnault@gmail.com> wrote:

> Hi everyone,
>
> I'm inserting aroun

Re: Bulk inserting is slow

2014-06-24 Thread joergpra...@gmail.com
You should use the org.elasticsearch.action.bulk.BulkProcessor helper class
for concurrent bulk indexing.

Jörg


On Tue, Jun 24, 2014 at 5:34 PM, Frederic Esnault <
esnault.frede...@gmail.com> wrote:

> Hi again,
>
> any idea about how to parallelize the bulk insert process ?
> I tried creating 4 BulkInserters extending RecursiveAction and executed
> them all, but the result is awful, 3 of them finished very slowly, and one
> did not finish (don't know why), and got only 70K docs in ES instead of 265
> 000...
>
> The result of downsizing the batches sizes to 10 000 is not really big,
> total process took approx. 1 second less (Actually this is much lower than
> in the previous post, because i moved the importing UI to  my server, close
> to one of ES nodes). Was more than 29 seconds, now 28.
> 28 seconds.
>
>
> *Import CSV file took 28.069 secondes*
>
> Here is the insertion code. The Iterator is a CSV reading iterator who
> parses lines and returns Record instances (object with generic object
> values, indexed as string). MAX_RECORDS is my batch size,  set to 10 000.
>
> public void insert(Iterator<Record> recordsIterator) {
> while (recordsIterator.hasNext()) {
> batchInsert(recordsIterator, MAX_RECORDS);
> }
> }
>
> private void batchInsert(Iterator<Record> recordsIterator, int limit) {
> BulkRequestBuilder bulkRequest = client.prepareBulk();
> int processed = 0;
> try {
> logger.log(Level.INFO, "Adding records to bulk insert batch");
> while (recordsIterator.hasNext() && processed < limit) {
> processed++;
> Record record = recordsIterator.next();
> IndexRequestBuilder builder =
> client.prepareIndex(datasetName, RECORD);
> XContentBuilder data = jsonBuilder();
> data.startObject();
> for (ColumnMetadata column :
> dataset.getMetadata().getColumns()) {
> Object value =
> record.getCell(column.getName()).getValue();
> if (value == null || (value instanceof String &&
> value.equals("NULL"))) {
> value = null;
> }
> data.field(column.getNormalizedName(), value);
> }
> data.endObject();
> builder.setSource(data);
> bulkRequest.add(builder);
> }
> logger.log(Level.INFO, "Added "+ bulkRequest.numberOfActions()
> +" records to bulk insert batch. Inserting batch...");
> long current = System.currentTimeMillis();
> BulkResponse bulkResponse =
> bulkRequest.setConsistencyLevel(WriteConsistencyLevel.ONE).execute().actionGet();
> if (bulkResponse.hasFailures()) {
> logger.log(Level.SEVERE, "Could not index : " +
> bulkResponse.buildFailureMessage());
> }
> System.out
> .println(String.format("Bulk insert took %s secondes",
> NumberUtils
> .formatSeconds(((double)
> (System.currentTimeMillis() - current)) / 1000.0)));
> } catch (Exception e) {
> e.printStackTrace();
> }
> }
>
> Le mardi 24 juin 2014 13:44:03 UTC+2, Frederic Esnault a écrit :
>
>> Thanks for all this.
>>
>> I changed my conf, removed all the thread pool config, reduced refresh
>> time to 5s according to Michael advice, and limited my batch to 10 000.
>> I'll see how it works then i'll paralellize the bulk insert.
>> I'll tell you how it ends up.
>>
>> Thanks again !
>>
>> Le lundi 23 juin 2014 12:56:14 UTC+2, Jörg Prante a écrit :
>>>
>>> Your bulk insert size is too large. It makes no sense to insert 100.000
>>> with one request. Use 1000-10000 instead.
>>>
>>> Also you should submit bulk requests in parallel and not sequential like
>>> you do. Sequential bulk is slow if client CPU/network is not saturated.
>>>
>>> Check if you have disabled the index refresh from 1 (1s) to -1 while
>>> bulk indexing is active. 30s makes not much sense if you can execute the
>>> bulk in this time.
>>>
>>> Do not limit indexing memory to 50%.
>>>
>>> It makes no sense to increase queue_size for bulk thread pool to 1000.
>>> This means you want a single ES node should accept 1000 x 100 000 = 100 000
>>> 000 = 100m docs at once. This will simply exceeds all reasonable limits and
>>> bring the node down with an OOM (if you really have 100m docs).
>>>
>>> More advice is possible if you can show your client code how you push
>>> docs to ES.
>>>
>>> Jörg
>>>
>>>
>>>
>>> On Mon, Jun 23, 2014 at 12:30 PM, Frederic Esnault <
>>> esnault@gmail.com> wrote:
>>>
 Hi everyone,

 I'm inserting around 265 000 documents into an elastic search cluster
 composed of 3 nodes (real servers).
 On two servers i give elastic search 20g of heap, on third one which
 has 64g ram, i set 30g of heap for elastic search.

 I set elastic search configuratio

How would ElasticSearch do it?

2014-06-24 Thread Malo BENOIST
Hello,

I am new to Elasticsearch and it seems like a very powerful tool.

There is a project I am working on for which I'd like to use Elasticsearch 
if it can do what we need.

The concept is simple: we have a lot of objects which have properties, and 
we would like to use Elasticsearch to create a database to store them and 
browse through the mass of data.
We already have a way to generate JSON files for the initial indexation.
The tricky part is that those objects' properties tend to change (not too 
often), and when that happens we would like to generate a new JSON file 
containing only the modified fields (and the id), index it, and be able to 
consult the object with its latest properties as well as rewind to get rid 
of recent changes.

So I believe ES can do it; the question is: how? And what would the 
request look like?

Thanks for your time

Malo BENOIST
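
The "new JSON file containing only the modified fields" part maps 
naturally onto the Update API, which merges a partial document into the 
existing one. A minimal sketch, with index/type/id names made up (note 
that rewinding to older versions is not built in - you would have to keep 
the previous values yourself, for example by also writing each change to a 
history index):

curl -XPOST 'localhost:9200/objects/object/1/_update' -d '{
  "doc" : {
    "color" : "blue",
    "price" : 42
  }
}'

Fields not mentioned in "doc" keep their current values, and ES bumps the 
document's _version on every update.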

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/43a98657-f3e6-4d9c-8910-a60989edc5db%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk inserting is slow

2014-06-24 Thread Cédric Hourcade
Hello,

You can use the BulkProcessor class to do the work for you:
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkProcessor.java

Just configure/instantiate the class and .add() your index requests.
See: 
https://github.com/elasticsearch/elasticsearch/blob/master/src/test/java/org/elasticsearch/action/bulk/BulkProcessorTests.java

Cédric Hourcade
c...@wal.fr
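
A minimal sketch of that setup (the thresholds are illustrative, adjust to 
taste):

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

BulkProcessor processor = BulkProcessor.builder(client,
        new BulkProcessor.Listener() {
    public void beforeBulk(long id, BulkRequest request) {}
    public void afterBulk(long id, BulkRequest request, BulkResponse response) {
        // log partial failures instead of losing them silently
        if (response.hasFailures()) {
            System.err.println(response.buildFailureMessage());
        }
    }
    public void afterBulk(long id, BulkRequest request, Throwable failure) {
        failure.printStackTrace();
    }
})
.setBulkActions(10000)                              // flush every 10k actions...
.setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // ...or every 5 MB
.setConcurrentRequests(2)                           // 2 bulks in flight at a time
.build();

// in the CSV loop, instead of hand-built BulkRequestBuilder batches:
processor.add(new IndexRequest(datasetName, RECORD).source(data)); // data: the XContentBuilder

// when the iterator is exhausted, flush whatever is left:
processor.close();

setConcurrentRequests is what gives you the parallel bulks, without 
managing the threads yourself.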


On Tue, Jun 24, 2014 at 5:34 PM, Frederic Esnault
 wrote:
> Hi again,
>
> any idea about how to parallelize the bulk insert process ?
> I tried creating 4 BulkInserters extending RecursiveAction and executed them
> all, but the result is awful, 3 of them finished very slowly, and one did
> not finish (don't know why), and got only 70K docs in ES instead of 265
> 000...
>
> The result of downsizing the batches sizes to 10 000 is not really big,
> total process took approx. 1 second less (Actually this is much lower than
> in the previous post, because i moved the importing UI to  my server, close
> to one of ES nodes). Was more than 29 seconds, now 28.
> 28 seconds.
>
>
> Import CSV file took 28.069 secondes
>
> Here is the insertion code. The Iterator is a CSV reading iterator who
> parses lines and returns Record instances (object with generic object
> values, indexed as string). MAX_RECORDS is my batch size,  set to 10 000.
>
> public void insert(Iterator<Record> recordsIterator) {
> while (recordsIterator.hasNext()) {
> batchInsert(recordsIterator, MAX_RECORDS);
> }
> }
>
> private void batchInsert(Iterator<Record> recordsIterator, int limit) {
> BulkRequestBuilder bulkRequest = client.prepareBulk();
> int processed = 0;
> try {
> logger.log(Level.INFO, "Adding records to bulk insert batch");
> while (recordsIterator.hasNext() && processed < limit) {
> processed++;
> Record record = recordsIterator.next();
> IndexRequestBuilder builder =
> client.prepareIndex(datasetName, RECORD);
> XContentBuilder data = jsonBuilder();
> data.startObject();
> for (ColumnMetadata column :
> dataset.getMetadata().getColumns()) {
> Object value =
> record.getCell(column.getName()).getValue();
> if (value == null || (value instanceof String &&
> value.equals("NULL"))) {
> value = null;
> }
> data.field(column.getNormalizedName(), value);
> }
> data.endObject();
> builder.setSource(data);
> bulkRequest.add(builder);
> }
> logger.log(Level.INFO, "Added "+ bulkRequest.numberOfActions()
> +" records to bulk insert batch. Inserting batch...");
> long current = System.currentTimeMillis();
> BulkResponse bulkResponse =
> bulkRequest.setConsistencyLevel(WriteConsistencyLevel.ONE).execute().actionGet();
> if (bulkResponse.hasFailures()) {
> logger.log(Level.SEVERE, "Could not index : " +
> bulkResponse.buildFailureMessage());
> }
> System.out
> .println(String.format("Bulk insert took %s secondes",
> NumberUtils
> .formatSeconds(((double)
> (System.currentTimeMillis() - current)) / 1000.0)));
> } catch (Exception e) {
> e.printStackTrace();
> }
> }
>
> Le mardi 24 juin 2014 13:44:03 UTC+2, Frederic Esnault a écrit :
>>
>> Thanks for all this.
>>
>> I changed my conf, removed all the thread pool config, reduced refresh
>> time to 5s according to Michael advice, and limited my batch to 10 000.
>> I'll see how it works then i'll paralellize the bulk insert.
>> I'll tell you how it ends up.
>>
>> Thanks again !
>>
>> Le lundi 23 juin 2014 12:56:14 UTC+2, Jörg Prante a écrit :
>>>
>>> Your bulk insert size is too large. It makes no sense to insert 100.000
>>> with one request. Use 1000-10000 instead.
>>>
>>> Also you should submit bulk requests in parallel and not sequential like
>>> you do. Sequential bulk is slow if client CPU/network is not saturated.
>>>
>>> Check if you have disabled the index refresh from 1 (1s) to -1 while bulk
>>> indexing is active. 30s makes not much sense if you can execute the bulk in
>>> this time.
>>>
>>> Do not limit indexing memory to 50%.
>>>
>>> It makes no sense to increase queue_size for bulk thread pool to 1000.
>>> This means you want a single ES node should accept 1000 x 100 000 = 100 000
>>> 000 = 100m docs at once. This will simply exceeds all reasonable limits and
>>> bring the node down with an OOM (if you really have 100m docs).
>>>
>>> More advice is possible if you can show your client code how you push
>>> docs to ES.
>>>
>>> Jörg
>>>
>>>
>>>
>>> On Mon, Jun 23, 2014 at 12:30 PM, Frederic Esnault
>>>  wrote:

 Hi everyone,

 I'm inserting around 265 000

Re: performance of multi_match

2014-06-24 Thread Stephane Bastian
Hello,

It seems to me that cross_fields does more than the Solr dismax query. 
To compare like with like in both ES and Solr, you could run the dis_max 
query with ES and start from there
==> 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html

Hope it helps
Stéphane 
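
For reference, a bare-bones dis_max query for such a comparison could look 
like this (the field names are placeholders):

{
  "query" : {
    "dis_max" : {
      "tie_breaker" : 0.7,
      "queries" : [
        { "match" : { "name" : "berlin" } },
        { "match" : { "street" : "berlin" } }
      ]
    }
  }
}

Each document is scored by its best sub-query (plus tie_breaker times the 
others), which is closer to what edismax does than cross_fields' blended 
term statistics.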


On Tuesday, June 24, 2014 5:09:21 PM UTC+2, Christoph Lingg wrote:
>
> Hi!
>
> we're using elasticsearch for an open source geocoder called photon. We're 
> using solr previously but we switched to elasticsearch some time ago and 
> I'am using now multi_match's cross_field query (which is great by the way 
> as it sorts out most problems we had before).
>
> I investigated the performance between both implementation and it turned 
> out that the elasticsearch is about 5 times slower than the solr 
> counterpart. The dataset (100,000,000 documents) is identical and the size 
> of both indices too. On the solr side, I am using an edismax query whilst 
> it is a cross_field on elasticsearch. Average query time is 120ms vs. 1000s.
>
> I adjusted the number of open file descriptors to 64k, during the 
> benchmark there is (almost) no IO whilst the cpu is very high (> 75%, 12 
> cores). As cross_field is a very recent feature I tried out best_field as 
> well, but benchmark results weren't better.
>
> Do you have any ideas on how I can dig more into performance issues like 
> this in elasticsearch? Do you have experience with both queries you can 
> share with me?
>
> Thanks for your help!
> Christoph
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ec1e15ad-5e1e-4371-a587-1b34d9b54241%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: performance of multi_match

2014-06-24 Thread Cédric Hourcade
Hello,

It seems your Elasticsearch query is doing a lot more: there is custom
scoring, some filtering with OR on missing fields, sub-queries, more
fields, etc.

Were you doing exactly the same filtering/scoring with Solr?

Can you incrementally test and compare your queries' performance,
starting with just the multi_match vs edismax, and also compare the number
of results. Ensure the cross_fields parameter is acting as you want,
as you have a lot of fields with maybe different analyzers.


Cédric Hourcade
c...@wal.fr


On Tue, Jun 24, 2014 at 5:09 PM, Christoph Lingg  wrote:
> Hi!
>
> we're using elasticsearch for an open source geocoder called photon. We're
> using solr previously but we switched to elasticsearch some time ago and
> I'am using now multi_match's cross_field query (which is great by the way as
> it sorts out most problems we had before).
>
> I investigated the performance between both implementation and it turned out
> that the elasticsearch is about 5 times slower than the solr counterpart.
> The dataset (100,000,000 documents) is identical and the size of both
> indices too. On the solr side, I am using an edismax query whilst it is a
> cross_field on elasticsearch. Average query time is 120ms vs. 1000s.
>
> I adjusted the number of open file descriptors to 64k, during the benchmark
> there is (almost) no IO whilst the cpu is very high (> 75%, 12 cores). As
> cross_field is a very recent feature I tried out best_field as well, but
> benchmark results weren't better.
>
> Do you have any ideas on how I can dig more into performance issues like
> this in elasticsearch? Do you have experience with both queries you can
> share with me?
>
> Thanks for your help!
> Christoph
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/5bff0274-ea12-4f28-a304-3f0ad691880c%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJQxjPMParLX7mwJfPUz6L_VvGbdB9jeQ_5uP1Qy%2B06yM58wTw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk inserting is slow

2014-06-24 Thread Frederic Esnault
Hi again,

any idea about how to parallelize the bulk insert process ?
I tried creating 4 BulkInserters extending RecursiveAction and executed 
them all, but the result is awful, 3 of them finished very slowly, and one 
did not finish (don't know why), and got only 70K docs in ES instead of 265 
000...

The result of downsizing the batch size to 10 000 is not really big: the 
total process took approx. 1 second less (actually this is much lower than 
in the previous post, because I moved the importing UI to my server, close 
to one of the ES nodes). Was more than 29 seconds, now 28.

*Import CSV file took 28.069 secondes*

Here is the insertion code. The Iterator is a CSV-reading iterator which 
parses lines and returns Record instances (objects with generic object 
values, indexed as strings). MAX_RECORDS is my batch size, set to 10 000.

public void insert(Iterator<Record> recordsIterator) {
while (recordsIterator.hasNext()) {
batchInsert(recordsIterator, MAX_RECORDS);
}
}

private void batchInsert(Iterator<Record> recordsIterator, int limit) {
BulkRequestBuilder bulkRequest = client.prepareBulk();
int processed = 0;
try {
logger.log(Level.INFO, "Adding records to bulk insert batch");
while (recordsIterator.hasNext() && processed < limit) {
processed++;
Record record = recordsIterator.next();
IndexRequestBuilder builder = 
client.prepareIndex(datasetName, RECORD);
XContentBuilder data = jsonBuilder();
data.startObject();
for (ColumnMetadata column : 
dataset.getMetadata().getColumns()) {
Object value = 
record.getCell(column.getName()).getValue();
if (value == null || (value instanceof String && 
value.equals("NULL"))) {
value = null;
}
data.field(column.getNormalizedName(), value);
}
data.endObject();
builder.setSource(data);
bulkRequest.add(builder);
}
logger.log(Level.INFO, "Added "+ bulkRequest.numberOfActions() 
+" records to bulk insert batch. Inserting batch...");
long current = System.currentTimeMillis();
BulkResponse bulkResponse = 
bulkRequest.setConsistencyLevel(WriteConsistencyLevel.ONE).execute().actionGet();
if (bulkResponse.hasFailures()) {
logger.log(Level.SEVERE, "Could not index : " + 
bulkResponse.buildFailureMessage());
}
System.out
.println(String.format("Bulk insert took %s secondes", 
NumberUtils
.formatSeconds(((double) 
(System.currentTimeMillis() - current)) / 1000.0)));
} catch (Exception e) {
e.printStackTrace();
}
}

Le mardi 24 juin 2014 13:44:03 UTC+2, Frederic Esnault a écrit :
>
> Thanks for all this.
>
> I changed my conf, removed all the thread pool config, reduced refresh 
> time to 5s according to Michael advice, and limited my batch to 10 000.
> I'll see how it works then i'll paralellize the bulk insert.
> I'll tell you how it ends up.
>
> Thanks again !
>
> Le lundi 23 juin 2014 12:56:14 UTC+2, Jörg Prante a écrit :
>>
>> Your bulk insert size is too large. It makes no sense to insert 100.000 
>> with one request. Use 1000-10000 instead.
>>
>> Also you should submit bulk requests in parallel and not sequential like 
>> you do. Sequential bulk is slow if client CPU/network is not saturated.
>>
>> Check if you have disabled the index refresh from 1 (1s) to -1 while bulk 
>> indexing is active. 30s makes not much sense if you can execute the bulk in 
>> this time.
>>
>> Do not limit indexing memory to 50%.
>>
>> It makes no sense to increase queue_size for bulk thread pool to 1000. 
>> This means you want a single ES node should accept 1000 x 100 000 = 100 000 
>> 000 = 100m docs at once. This will simply exceeds all reasonable limits and 
>> bring the node down with an OOM (if you really have 100m docs).
>>
>> More advice is possible if you can show your client code how you push 
>> docs to ES.
>>
>> Jörg
>>
>>
>>
>> On Mon, Jun 23, 2014 at 12:30 PM, Frederic Esnault > > wrote:
>>
>>> Hi everyone,
>>>
>>> I'm inserting around 265 000 documents into an elastic search cluster 
>>> composed of 3 nodes (real servers).
>>> On two servers i give elastic search 20g of heap, on third one which has 
>>> 64g ram, i set 30g of heap for elastic search.
>>>
>>> I set elastic search configuration to :
>>>
>>> - 3 shards (1 per server)
>>> - 0 replicas
>>> - discovery.zen.ping.multicast.enabled: false (and giving on each node 
>>> the unicast hostnames of the two other nodes);
>>> - and this :
>>>
>>> indices.memory.index_buffer_size: 50%
>>> index.refresh_interval: 30s
>>> threadpool:
>>>   index:
>>> type: fixed
>>> size: 30
>>> queue_size: 1000
>>>   bulk:

Histogram aggregation keys

2014-06-24 Thread Rémi Nonnon
Hi,

I'm working with histogram aggregation but there is something strange with 
keys.
For instance (cf : 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-histogram-aggregation.html):

*If I use this request :*

{
"aggs" : {
"prices" : {
"histogram" : {
"field" : "price",
"interval" : 50
}
}
}
}


*I obtain something like this :*

{
"aggregations": {
"prices" : {
"buckets": [
{

*"key_as_string" : "0",*
"key": 0,
"doc_count": 2
},
{

*"key_as_string" : "50",*

"key": 50,
"doc_count": 4
},
{

*"key_as_string" : "150",*

"key": 150,
"doc_count": 3
}
]
}
}
}


*Instead of :*

{
"aggregations": {
"prices" : {
"buckets": [
{
"key": 0,
"doc_count": 2
},
{
"key": 50,
"doc_count": 4
},
{
"key": 150,
"doc_count": 3
}
]
}
}
}


You could say it's not important, but it generates JSON ~1/3 bigger...
*Is there a way to disable this???*

Moreover, in the Elasticsearch Java API, it would be nice to have a method 
to request the response as a hash keyed by the bucket keys (cf: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-histogram-aggregation.html#_response_format)

*Thanks!!!*

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/47624c50-32d8-4308-b6e5-b07707ad353d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


performance of multi_match

2014-06-24 Thread Christoph Lingg
Hi!

we're using elasticsearch for an open source geocoder called photon. We 
were using solr previously, but we switched to elasticsearch some time ago 
and I'm now using multi_match's cross_fields query (which is great by the 
way, as it sorts out most of the problems we had before).

I investigated the performance of both implementations and it turned out 
that elasticsearch is about 5 times slower than the solr counterpart. The 
dataset (100,000,000 documents) is identical and the size of both indices 
too. On the solr side, I am using an edismax query, whilst it is a 
cross_fields query on elasticsearch. Average query time is 120ms vs. 1000s.

I adjusted the number of open file descriptors to 64k; during the 
benchmark there is (almost) no IO whilst the cpu is very high (> 75%, 12 
cores). As cross_fields is a very recent feature I tried out best_fields 
as well, but the benchmark results weren't better.

Do you have any ideas on how I can dig more into performance issues like 
this in elasticsearch? Do you have experience with both queries you can 
share with me?

Thanks for your help!
Christoph

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5bff0274-ea12-4f28-a304-3f0ad691880c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Cannot access parent document field in nested document aggration.

2014-06-24 Thread Viacheslav Chimishuk
Hi, there.

I have a problem with a nested documents aggregation. The problem is as 
follows. Imagine that we have an index full of simple documents like these

{name: "aaa",
 count: 3,
 companyEmotions: [{
 emotion: 1,
 company: "foo"
 }, {
 emotion: 2,
 company: "bar"
 }]
}

{name: "bbb",
 count: 10,
 companyEmotions: [{
 emotion: 2,
 company: "foo"
 }, {
 emotion: 2,
 company: "bar"
 }]
}



and we want to get a sum aggregation of the "count" field grouped by 
emotion for some company. In terms of our example with two documents, we 
want to answer the question: what is the sum of "count" for documents 
where the foo company emotion is 1, or is 2?
The result buckets should be something like this:

  "buckets" : [ {
"key" : 1,  // This is our emotion
"doc_count" : 1,
"someAggrName" : {
  "value" : 3.0 // sum of all counts.
}
  }, {
"key" : 2,
"doc_count" : 1,
"emotionAggr4" : {
  "value" : 10.0
}
  } ]

And here is the mapping which I'm using.

{
"emotion" : {
"properties" : {
"name" : {"type" : "string", "index" : "not_analyzed" },
"count" : {"type" : "integer" },
"companyEmotions" : {
"type" : "nested",
"_parent": {
"type": "emotion"
},
"properties": {
"company" : {"type" : "string", "index" : 
"not_analyzed" },
"emotion" : { "type" : "integer" }
}
}
}
}
}

I have experimented a lot with the following aggregation request, and 
can't find a way to refer to the "count" field. I have tried 
_parent.count, _doc["_parent.count"].value, etc. Most of the time, when no 
error occurs, the sum is 0. How do I correctly refer to the parent "count" 
field in the script?
Or maybe the problem is in the mapping or something?

curl -XPOST "http://localhost:9200/emotions/emotion/_search?pretty=true"; -d 
'{
"aggregations": {
"emotionAggr": {
"nested": {
"path": "companyEmotions"
},
"aggregations": {
"emotionAggr2": {
"filter": {
"term": {"companyEmotions.company": "foo"}
},
"aggregations": {
"emotionAggr3": {
"terms": {
"field": "companyEmotions.emotion"
},
"aggregations": {
"emotionAggr4": {
"sum": {"script": 
"doc[\"count\"].value"} // How to reffer _parent.count here?
}
}
}
}
}
}
}
}
}'

Thanks for helping.

P. S.
If you want to experiment here is the list of commands which you can 
execute to create an index full of documents.
https://gist.github.com/vchimishuk/2c324e2be71c23b7f4c7


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2da1f10a-235b-4866-8162-e1bf9846e6c9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Overloading by indexing?

2014-06-24 Thread Julius K
Hi,

I have a strange problem with ES. It's running on a cluster with 64 cores 
etc, so I don't think the power of the hardware is the issue.

I want to index a lot of documents with elasticsearch-hadoop.
After some problems I now have everything in place and it seems to work 
fine.

So I wrote a simple pig script which loads all the files (~500) and stores 
them into an ES index.
However, after ~22h the job failed, because of connection problems between 
the nodes.
But during that time, there wasn't any heavy usage of network bandwidth or 
other resources.

After that I tried to run the pig script for only one document, so I know 
what is indexed and what is missing.
After about 3 documents had indexed fine this way, the jobs started to 
fail again due to network problems, although there wasn't any significant 
load.

I observed that even after the indexing jobs stopped, there was stuff 
happening with the index. The number of documents kept growing for quite 
some time, and the translog operation count went up and down, hovering 
mostly at about half a million. 

For me this looks like the index takes more time indexing than the pig 
script takes to write into it, and after some time a buffer somewhere gets 
too full.

Is this possible? I would expect that in this case elasticsearch-hadoop 
would get throttled.

The only documentation about the translog is what I found here: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-translog.html
which I find a bit thin. I still don't know what implications the number 
of translog operations has.

On the linked page it says I could increase the numbers when doing bulk 
indexing, but I don't understand how this would help.
Also what's TPS?

Best regards
Julius

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d17e1231-da99-4bc2-b019-806046ffd34e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How do a connect as a second client-only Node to remote cluster?

2014-06-24 Thread Koen Smets
Thx David for the clarification, it makes sense. I (wrongly) assumed 
that the second client would connect to 9301, similar to my logstash 
instance, which also connects to port 9301 when running on the same server 
as the elasticsearch instance.

Since I don't need Node functionality anyway, I'll stick to using 
TransportClients.

grtz, Koen
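
For reference, a minimal sketch of the unicast setup David describes below 
(host and port are placeholders):

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "elasticsearch")
        // don't rely on multicast pings to find the cluster
        .put("discovery.zen.ping.multicast.enabled", false)
        // explicit list of nodes to ping, with the right port
        .put("discovery.zen.ping.unicast.hosts", "192.168.2.21:9300")
        .build();
Node node = NodeBuilder.nodeBuilder().settings(settings).client(true).node();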

On Tuesday, June 24, 2014 2:35:25 PM UTC+2, David Pilato wrote:
>
> I see. Indeed. As soon as we are able to ping a cluster on a port, I think 
> we don't try to increase port number and try all ports within the same IP 
> address.
> If you need to set a specific address/port, you should disable multicast 
> on your client Node and provide a unicast list of nodes (with the right 
> port to use).
>
> Does it make sense?
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr 
> 
>
>
> Le 24 juin 2014 à 13:32:17, Koen Smets (koen@gmail.com ) 
> a écrit:
>
> It's not the clustername. The unit test works if I force the node to use 
> another port...
>
> Settings settings = ImmutableSettings.settingsBuilder().put("
> cluster.name", "elasticsearch")
> .put("network.publish_host", 
> "192.168.2.15").put("network.bind_host", "192.168.2.15")
> .put("transport.tcp.port", 
> 9301).put("transport.publish_port", 9301).build();
>
> But by default transport.tcp.port is range 9300-9400, but it always picks 
> both for the first and the second client always 9300.
>
> Shouldn't the port clash be discovered automatically?
>
> grtz, Koen
>
> On Tuesday, June 24, 2014 12:50:41 PM UTC+2, David Pilato wrote: 
>>
>>  But it sounds like your client is using another cluster name [1].
>>  Do you have any elasticsearch.yml in your project classpath?
>>  
>>
>>  -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com* 
>>  @dadoonet  | @elasticsearchfr 
>> 
>>  
>>
>> Le 24 juin 2014 à 12:48:48, Koen Smets (koen@gmail.com) a écrit:
>>
>>  Hi David,
>>
>> Setting the (default) clustername, elasticsearch, doesn't help.
>>
>> grtz, Koen
>>
>> On Tuesday, June 24, 2014 12:24:51 PM UTC+2, David Pilato wrote: 
>>>
>>>  You need to set the cluster name I think.
>>>  
>>>  My 2 cents
>>>  
>>>
>>>  -- 
>>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com* 
>>>  @dadoonet  | @elasticsearchfr 
>>> 
>>>  
>>>
>>> Le 24 juin 2014 à 10:29:11, Koen Smets (koen@gmail.com) a écrit:
>>>
>>>  Hi,
>>>
>>> I've troubles to connect a second client-only Node via the Java API to a 
>>> remote cluster.
>>>
>>> The first client (configured as client-only hence no master and no data) 
>>> runs fine on localhost:9300. However, when the second client-only node 
>>> tries to connect to the cluster an exception gets thrown:
>>>
>>> org.elasticsearch.cluster.block.ClusterBlockException: blocked by: 
>>> [SERVICE_UNAVAILABLE/1/state not recovered / 
>>> initialized];[SERVICE_UNAVAILABLE/2/no master];
>>>
>>> I use the following code to connect both clients:
>>>
>>> Node node = NodeBuilder.nodeBuilder().client(true).node();
>>> Client client = node.client();
>>>
>>> GetResponse response = client.prepareGet("twitter", "tweet", 
>>> "1").execute().actionGet();
>>> LOGGER.info(response.getSourceAsString());
>>>
>>> // on shutdown
>>> node.close();
>>>
>>> While the second client tries to join the cluster, the logs of the first 
>>> client show the following warnings:
>>>
>>> o.e.d.z.p.multicast [WARN] [Odin] received ping response 
>>> ping_response{target 
>>> [[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], master 
>>> [[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], 
>>> cluster_name[elasticsearch]} with no matching id [1]
>>>
>>> I assumed that the first client would be running on localhost:9300 and 
>>> the second one would bind to localhost:9301, but it doesn't do that 
>>> automatically. Do I need to specify an additional setting to allow this 
>>> scenario?
>>>
>>> (I also tested by adding a TransportClient instead of a Node and this 
>>> works fine)
>>>
>>> JUnit test code and log output can be found at: 
>>> https://gist.github.com/ksmets/bed93778562dd2260e09
>>>
>>> Thx, Koen
>>> --
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/868dc883-dd39-44ac-ab56-1ac609f1f63f%40googlegroups.com
>>>  
>>> 

Query on Id field of nested documents fails.

2014-06-24 Thread dazraf
Hi,

Very grateful for any help with the following (rather urgent) issue.
Gist: https://gist.github.com/dazraf/55ebb900b3c17583bf58


The script clears the indices and sets up dynamic mapping so that all child 
documents are treated as nested. 
Then there are two queries on fields of 
*message.statistics.timings.measure.thing*. 
The first, on the *uniqueThing* field, succeeds. 
The second, on the *id* field, fails with zero documents. 

I'm not sure why the second query fails to locate any documents. 

Any help much appreciated!

thanks
Fuzz.
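
For context, the failing query is shaped roughly like this (a sketch using the 
nested path above; the id value is illustrative):

{
  "query": {
    "nested": {
      "path": "message.statistics.timings.measure.thing",
      "query": {
        "term": { "message.statistics.timings.measure.thing.id": "some-id-value" }
      }
    }
  }
}

One suspicion: if the id values contain characters the standard analyzer splits 
on, a term query on the full value will match nothing, so mapping the field as 
not_analyzed (or querying with match) may be the fix.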

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0397e742-d483-45aa-a4db-da300416d9da%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Aggregation Framework, possible to get distribution of requests per user

2014-06-24 Thread Thomas
My mistake, sorry.

Here is an example:

I have the request document:

"request" : {
    "dynamic" : "strict",
    "properties" : {
        "time" : {
            "format" : "dateOptionalTime",
            "type" : "date"
        },
        "user_id" : {
            "index" : "not_analyzed",
            "type" : "string"
        },
        "country" : {
            "index" : "not_analyzed",
            "type" : "string"
        }
    }
}

I want to find the number of (unique) user_ids that have X number of 
documents, e.g. for country US, and ideally I need the full list e.g.:


1000 users have 43 documents
..
100 users have 234 documents
150 users have 500 documents
etc..

In other words, I want the distribution of documents (requests) per unique 
user. Of course I understand that it is a pretty heavy operation in terms of 
memory, but we could limit it to the top 100 rows for instance, or work 
around it some other way.
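
For what it's worth, the closest workaround I can think of is a plain terms 
aggregation on user_id filtered by country, bucketing the per-user doc counts 
client-side afterwards (a sketch; the size value is illustrative):

{
  "query": { "term": { "country": "US" } },
  "aggs": {
    "per_user": {
      "terms": { "field": "user_id", "size": 100, "order": { "_count": "desc" } }
    }
  }
}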

Thanks again for your time
Thomas

On Tuesday, 24 June 2014 13:32:13 UTC+3, Thomas wrote:
>
> Hi,
>
> I wanted to ask whether it is possible to get with the aggregation 
> framework the distribution of one specific type of documents sent per user, 
> I'm interested for occurrences of documents per user, e.g. :
>
> 1000 users sent 1 document 
> 500 users sent 2 documents
> X number of unique users sent Y documents (each)
> etc.
>
> on each document i index the user_id
>
> Is there a way to support such a query, or at least partially (e.g. get the 
> first 10 rows of this list rather than the exhaustive list)? Can you give me 
> some hint? 
>
> Thanks
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e07561ed-7f1b-4e98-8a8d-16e410324cc2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: G1 Garbage Collector with Elasticsearch >= 1.1

2014-06-24 Thread Bruce Ritchie
We use G1GC for tomcat and mule in production but not for ES. We have found 
that G1GC is more 'stable' in terms of pause times at the cost of more 
overhead and thus less throughput. No GC algorithm will help you though if 
you have a memory leak or your vm is under extreme memory pressure.

For really large heaps I would suggest taking a look at Azul's VM. It's not 
cheap, but it pretty much guarantees no pause times at any heap size. I don't 
know at what overhead cost, though.


Bruce

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/19f488c4-5493-4bfa-83ea-aad7ce05fe3a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES heap size slowly but surely increasing

2014-06-24 Thread Aldian
Yes I am, but... This data should be stored somewhere on the disk, right?


2014-06-24 12:03 GMT+02:00 Mark Walkom :

> Are you indexing new data?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 24 June 2014 20:01, Aldian  wrote:
>
>> Hi
>>
>> I upgraded to 1.2.1 and set memory options to 2GB. Two weeks have passed
>> now and every day I have been checking heap memory level with Bigdesk.
>>
>> At first I was not sure, but now it is clearly increasing little by
>> little. One day after I restarted ES it grew to approximately 1.3 GB of
>> used Heap Memory, and now we are at 1.6GB, which makes me guess it is
>> leaking around 150 MB of heap memory a week.
>>
>> Am I the only one experiencing such a problem? And do you know what could
>> cause it?
>>
>> Best,
>>
>> Aldian
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/9481c6a9-0690-4ac5-be26-d913e1700c0e%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/eMA_PCZQLss/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAEM624afa-DiX%2B4C48wCUZhhgXA16YkzEwx-_u%2B-v2Xxq7SvfA%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Cordialement,

Aldian

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAECUaLw7F-p9Rnz1Bvfr2Pwu1Jzyan0QFi8zp%2Bz-3W_Gi4m_bA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Aggregation Framework, possible to get distribution of requests per user

2014-06-24 Thread David Pilato
I was only thinking out loud. I mean that I don't know what your model looks like.
Maybe you could illustrate your use case with some actual data and we can move 
forward from here?

What kind of documents are you actually indexing and searching for? What fields 
do you have?


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 24 juin 2014 à 14:42:14, Thomas (thomas.bo...@gmail.com) a écrit:

Hi David 

Thank you for your reply, so based on your suggestion I should maintain a 
document (e.g. user) with some aggregated values and I should update it as we 
move along with our indexing of our data, correct?

This though would only give me totals. I cannot apply something like a range. I 
found as well a similar discussion here 
https://groups.google.com/forum/#!msg/elasticsearch/UsrCG2Abj-A/IDO9DX_PoQwJ. 
Maybe something similar to the terms and histogram aggregations could support 
this logic, e.g. given:

{
    "aggs" : {
        "requests_distribution" : {
            "distribution" : {
                "field" : "user_id",
                "interval" : 50
            }
        }
    }
}

and the result could be:

{
    "aggregations": {
        "requests_distribution" : {
            "buckets": [
                {
                    "key": 0,
                    "doc_count": 2
                },
                {
                    "key": 50,
                    "doc_count": 400
                },
                {
                    "key": 150,
                    "doc_count": 30
                }
            ]
        }
    }
}

Where the key represents a bucket of unique users, e.g. users in the 0-50 
bucket have 2 documents per user, etc.

Just an idea

Thanks
Thomas

On Tuesday, 24 June 2014 13:32:13 UTC+3, Thomas wrote:
Hi,

I wanted to ask whether it is possible to get with the aggregation framework 
the distribution of one specific type of documents sent per user, I'm 
interested for occurrences of documents per user, e.g. :

1000 users sent 1 document 
500 users sent 2 documents
X number of unique users sent Y documents (each)
etc.

on each document i index the user_id

Is there a way to support such a query, or at least partially (e.g. get the first 
10 rows of this list rather than the exhaustive list)? Can you give me some 
hint? 

Thanks
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ae8b56f1-a783-4ade-b948-079f6457ae27%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/etPan.53a97c1d.2443a858.950f%40MacBook-Air-de-David.local.
For more options, visit https://groups.google.com/d/optout.


Re: ES v1.1 continuous young gc pauses old gc, stops the world when old gc happens and splits cluster

2014-06-24 Thread Bruce Ritchie
You may want to try upgrading ES - the release notes for 1.2.0 indicate a 
change wrt throttling indexing when merges fall behind, and release notes 
post-1.1.0 mention a potential memory leak fix among many other 
improvements and fixes.

Best I can think of :|


Bruce

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/58147218-c37f-467a-bdd6-3d7457b8dabd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk API possible bug

2014-06-24 Thread Pablo Musa
Thanks for the answer Brian!!

Regards,
Pablo


2014-06-23 16:24 GMT-03:00 Brian :

> Hi, Pablo.
>
> I remember reading that Elasticsearch will happily store an invalid JSON
> string as your _source.
>
> From my usage of the Java API, I noticed that the Jackson library is used,
> but that only the stream parser is present. What this tells me is that ES
> is likely parsing your JSON token-by-token and has processed and indexed
> most of it. In other words, an error isn't an all-or-nothing situation.
> Since your syntax error happens at the very end of the document,
> Elasticsearch has indexed all of the document before it encounters the
> error.
>
> My guess is that if the error was not at the very end of the document,
> then Elasticsearch would fail to process and index any information past the
> error, but would successfully process and index information (if any) before
> the error.
>
> Brian
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/n-6920nqaVg/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/042fcbfd-9575-4543-b6b1-2328af05b1fe%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAF6PhFJEYyiah9kQWgCB1tK8bm8Me_xpa7hY21ef7T3gikXRcg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Boosting some fields on Fuzzy like this query

2014-06-24 Thread Goulven Le Breton
Hi,

I'm trying to boost some specific fields in a fuzzy_like_this query.

With the given document indexed :

POST fuzzy_test/product
{
  "name" : "maillot",
  "description" : "Arsenal away shirt"
}

I can retrieve the result with the following request:

GET fuzzy_test/product/_search
{
  "query": {
    "fuzzy_like_this": {
      "fields": ["name", "description"],
      "like_text" : "maillo 1ere league"
    }
  }
}

But if I try to boost a field using the same syntax used by other queries, 
like multi_match, it fails to find the document:

GET fuzzy_test/product/_search
{
  "query": {
    "fuzzy_like_this": {
      "fields": ["name^2", "description"],
      "like_text" : "maillo 1ere league"
    }
  }
}
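
One workaround I'm considering is to drop the combined fields list and boost a 
per-field clause instead (a sketch, assuming fuzzy_like_this_field accepts a 
per-query boost):

GET fuzzy_test/product/_search
{
  "query": {
    "bool": {
      "should": [
        { "fuzzy_like_this_field": { "name": { "like_text": "maillo 1ere league", "boost": 2 } } },
        { "fuzzy_like_this_field": { "description": { "like_text": "maillo 1ere league" } } }
      ]
    }
  }
}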

Is there any way to achieve this natively?
Thanks.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/72198968-adbe-47cd-9731-99b7add5cf5e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


G1 Garbage Collector with Elasticsearch >= 1.1

2014-06-24 Thread Michael Hart
I'm running into a lot of issues with large heaps of >= 8GB and full GC's, 
as are a lot of others on this forum. Everything from Oracle/Sun indicates 
that the G1 garbage collector is supposed to deal with large heaps better, 
or at least give more consistency in terms of GC pauses, than the CMS 
garbage collector. Earlier posts in this forum indicate that there were 
bugs with the G1 collector and Trove that have since been fixed.

Is there updated information and/or recommendations from Elasticsearch 
about using the G1 collector with Java 7u55 and Elasticsearch 1.1 or 1.2?

thanks
mike

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/64d8d7fb-411f-44b0-9c51-fd6374965837%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES v1.1 continuous young gc pauses old gc, stops the world when old gc happens and splits cluster

2014-06-24 Thread Michael Hart
Removing the "-XX:+UseCMSInitiatingOccupancyOnly" flag extended the time it 
took before the JVM started full GC's from about 2 hours to 7 hours in my 
cluster, but now it's back to constant full GC's. I'm out of ideas. 
Suggestions?
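
In case it helps anyone compare runs, the GC logging I enabled is just the 
standard HotSpot flags, roughly these (log path illustrative):

-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/var/log/elasticsearch/gc.log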

mike


On Monday, June 23, 2014 10:25:20 AM UTC-4, Michael Hart wrote:
>
> My nodes are in Rackspace, so they are VM's, but they are configured 
> without swap.
>
> I'm not entirely sure what the searches are up to, I'm going to 
> investigate that further.
>
> I did correlate a rapid increase in Heap used, number of segments (up from 
> the norm of ~400 to 15,000) and consequently Old GC counts when the cluster 
> attempts to merge a 5GB segment. It seems that in spite of my really fast 
> disk the merge of a 5GB segment takes up to 15 minutes. I've made two 
> changes this morning, namely set these:
>
> index.merge.scheduler.max_thread_count: 3
> index.merge.policy.max_merged_segment: 1gb
>
> The first is in the hope that while a large segment merge is underway, the 
> two other threads can still keep the small segment merges going. The second 
> is to keep the larger segment merges under control. I was ending up with 
> two 5GB segments, and a long tail of smaller ones. A quick model shows that 
> by dropping this to 1GB I'll have 12 x 1GB segments and a similar long tail 
> of smaller segments (about 50?).
>
> I've also enabled GC logging on one node, I'll leave it running for the 
> day and tomorrow remove the "-XX:+UseCMSInitiatingOccupancyOnly" flag (used 
> by default for elasticsearch) and see if there's any difference. I'll 
> report back here incase this is of any use for anyone.
>
> thanks
> mike
>
> On Friday, June 20, 2014 6:31:54 PM UTC-4, Clinton Gormley wrote:
>
> * Do you have swap disabled?  (any swap plays havoc with GC)
> * You said you're doing scan/scroll - how many documents are you 
> retrieving at a time? Consider reducing the number
> * Are you running on a VM - that can cause you to swap even though your VM 
> guest thinks that swap is disabled, or steal CPU (slowing down GC)
>
> Essentially, if all of the above are false, you shouldn't be getting slow 
> GCs unless you're under very heavy memory pressure (and I see that your old 
> gen is not too bad, so that doesn't look likely).
>
>
> On 20 June 2014 16:03, Michael Hart  wrote:
>
> Thanks I do see the GC warnings in the logs, such as
>
> [2014-06-19 20:17:06,603][WARN ][monitor.jvm  ] [redacted] [gc
> ][old][179386][22718] duration [11.4s], collections [1]/[12.2s], total [
> 11.4s]/[25.2m], memory [7.1gb]->[6.9gb]/[7.2gb], all_pools {[young] [
> 158.7mb]->[7.4mb]/[266.2mb]}{[survivor] [32.4mb]->[0b]/[33.2mb]}{[old] [
> 6.9gb]->[6.9gb]/[6.9gb]}
>
> CPU Idle is around 50% when the merge starts, and drops to zero by the 
> time that first GC old warning is logged. During recovery my SSD's sustain 
> 2400 IOPS and during yesterday's outage I only see about 800 IOPS before ES 
> died. While I can throw more hardware at it, I'd prefer to do some tuning 
> first if possible. 
>
> The reason I was thinking of adding more shards is that largest segment is 
> 4.9GB (just under the default maximum set 
> by index.merge.policy.max_merged_segment). I suppose the other option is to 
> reduce the index.merge.policy.max_merged_segment setting to something 
> smaller, but I have no idea what the implications are.
>
> thoughts?
> mike
>
> On Friday, June 20, 2014 9:47:22 AM UTC-4, Ankush Jhalani wrote:
>
> Mike - The above sounds like it happened due to machines sending too many 
> indexing requests and merging being unable to keep pace. The usual suspects 
> would be not enough CPU or disk bandwidth. 
> This doesn't sound related to memory constraints posted in the original 
> issue of this thread. Do you see memory GC traces in logs? 
>
> On Friday, June 20, 2014 9:40:48 AM UTC-4, Michael Hart wrote:
>
> We're seeing the same thing. ES 1.1.0, JDK 7u55 on Ubuntu 12.04, 5 data 
> nodes, 3 separate masters, all are 15GB hosts with 7.5GB Heaps, storage is 
> SSD. Data set is ~1.6TB according to Marvel.
>
> Our daily indices are roughly 33GB in size, with 5 shards and 2 replicas. 
> I'm still investigating what happened yesterday, but I do see in Marvel a 
> large spike in the "Indices Current Merges" graph just before the node 
> dies, and a corresponding increase in JVM Heap. When Heap hits 99% 
> everything grinds to a halt. Restarting the node "fixes" the issue, but 
> this is third or fourth time it's happened.
>
> I'm still researching how to deal with this, but a couple of things I am 
> looking at are:
>
>- increase the number of shards so that the segment merges stay 
>smaller (is that even a legitimate sentence?) I'm still reading through 
>the Index Module Merge page for more details. 
>- look at store level throttling 

scripted partial delete

2014-06-24 Thread eunever32
Hi

I want to delete just one element from a list
Is that possible inside a script?

From the docs it only appears possible to delete the whole document if it 
contains an element.

The solution then, I guess, is to read the entire document, delete the 
relevant piece in memory, and then write back the remainder?

But I was hoping for a "bulk" based approach
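
Something like the following scripted update is what I'm after (a sketch, 
assuming dynamic scripting is enabled and MVEL, the ES 1.x default; index and 
field names are illustrative):

curl -XPOST 'localhost:9200/myindex/mytype/1/_update' -d '{
  "script": "ctx._source.tags.remove(tag)",
  "params": { "tag": "blue" }
}'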

Thanks,

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7f92b622-beef-416e-904b-3fbe53a99839%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Aggregation Framework, possible to get distribution of requests per user

2014-06-24 Thread Thomas
Hi David 

Thank you for your reply, so based on your suggestion I should maintain a 
document (e.g. user) with some aggregated values and I should update it as 
we move along with our indexing of our data, correct?

This though would only give me totals. I cannot apply something like a 
range. I found as well a similar discussion here 
https://groups.google.com/forum/#!msg/elasticsearch/UsrCG2Abj-A/IDO9DX_PoQwJ. 
Maybe something similar to the terms and histogram aggregations could 
support this logic, e.g. given:

{
    "aggs" : {
        "requests_distribution" : {
            "distribution" : {
                "field" : "user_id",
                "interval" : 50
            }
        }
    }
}

and the result could be:

{
    "aggregations": {
        "requests_distribution" : {
            "buckets": [
                {
                    "key": 0,
                    "doc_count": 2
                },
                {
                    "key": 50,
                    "doc_count": 400
                },
                {
                    "key": 150,
                    "doc_count": 30
                }
            ]
        }
    }
}

Where the key represents a bucket of unique users, e.g. users in the 0-50 
bucket have 2 documents per user, etc.

Just an idea

Thanks
Thomas

On Tuesday, 24 June 2014 13:32:13 UTC+3, Thomas wrote:
>
> Hi,
>
> I wanted to ask whether it is possible to get with the aggregation 
> framework the distribution of one specific type of documents sent per user, 
> I'm interested for occurrences of documents per user, e.g. :
>
> 1000 users sent 1 document 
> 500 users sent 2 documents
> X number of unique users sent Y documents (each)
> etc.
>
> on each document i index the user_id
>
> Is there a way to support such a query, or at least partially (e.g. get the 
> first 10 rows of this list rather than the exhaustive list)? Can you give me 
> some hint? 
>
> Thanks
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ae8b56f1-a783-4ade-b948-079f6457ae27%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How do a connect as a second client-only Node to remote cluster?

2014-06-24 Thread David Pilato
I see. Indeed. As soon as we are able to ping a cluster on a port, I think we 
don't try to increment the port number and probe the other ports on the same IP address.
If you need to set a specific address/port, you should disable multicast on 
your client Node and provide a unicast list of nodes (with the right port to 
use).

Does it make sense?
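
Something like this in the client Node settings (a sketch; the host is the one 
from your logs):

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.2.21:9300"]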

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 24 juin 2014 à 13:32:17, Koen Smets (koen.sm...@gmail.com) a écrit:

It's not the clustername. The unit test works if I force the node to use 
another port...

            Settings settings = 
ImmutableSettings.settingsBuilder().put("cluster.name", "elasticsearch")
                    .put("network.publish_host", 
"192.168.2.15").put("network.bind_host", "192.168.2.15")
                    .put("transport.tcp.port", 
9301).put("transport.publish_port", 9301).build();

But by default transport.tcp.port is the range 9300-9400, and both the first 
and the second client always pick 9300.

Shouldn't the port clash be discovered automatically?

grtz, Koen

On Tuesday, June 24, 2014 12:50:41 PM UTC+2, David Pilato wrote:
But it sounds like your client is using another cluster name [1].
Do you have any elasticsearch.yml in your project classpath?


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 24 juin 2014 à 12:48:48, Koen Smets (koen@gmail.com) a écrit:

Hi David,

Setting the (default) clustername, elasticsearch, doesn't help.

grtz, Koen

On Tuesday, June 24, 2014 12:24:51 PM UTC+2, David Pilato wrote:
You need to set the cluster name I think.

My 2 cents


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 24 juin 2014 à 10:29:11, Koen Smets (koen@gmail.com) a écrit:

Hi,

I'm having trouble connecting a second client-only Node via the Java API to a 
remote cluster.

The first client (configured as client-only hence no master and no data) runs 
fine on localhost:9300. However, when the second client-only node tries to 
connect to the cluster an exception gets thrown:

org.elasticsearch.cluster.block.ClusterBlockException: blocked by: 
[SERVICE_UNAVAILABLE/1/state not recovered / 
initialized];[SERVICE_UNAVAILABLE/2/no master];

I use the following code to connect both clients:

    Node node = NodeBuilder.nodeBuilder().client(true).node();
            Client client = node.client();

            GetResponse response = client.prepareGet("twitter", "tweet", 
"1").execute().actionGet();
            LOGGER.info(response.getSourceAsString());

            // on shutdown
            node.close();

While the second client tries to join the cluster, the logs of the first client 
show the following warnings:

o.e.d.z.p.multicast [WARN] [Odin] received ping response ping_response{target 
[[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], master 
[[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], 
cluster_name[elasticsearch]} with no matching id [1]

I assumed that the first client would be running on localhost:9300 and the 
second one would bind to localhost:9301, but it doesn't do that automatically. Do 
I need to specify an additional setting to allow this scenario?

(I also tested by adding a TransportClient instead of a Node and this works 
fine)

JUnit test code and log output can be found at: 
https://gist.github.com/ksmets/bed93778562dd2260e09

Thx, Koen
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearc...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/868dc883-dd39-44ac-ab56-1ac609f1f63f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearc...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/43ed69ab-83e8-4128-8450-bcb94fc93852%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ac506781-daf9-4255-ac4c-09ac1b833d4a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.

Re: No terms generated for trigram analyzer

2014-06-24 Thread Cédric Hourcade
Hello,

You are performing a search by URI; by default it searches the _all
field. In your case this field doesn't use your trigrams analyzer
at all.

You could either pass an explicit query : {"query": {...} }, or
specify which field you want to match: curl -XGET
'http://localhost:9200/urls/_search?q=jen&analyzer=trigrams&pretty=true&df=title'
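
For example, an explicit query pinned to the title field could look like this
(a sketch):

curl -XGET 'http://localhost:9200/urls/_search?pretty=true' -d '{
  "query": { "match": { "title": { "query": "jen", "analyzer": "trigrams" } } }
}'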

I think it works for "jen*" because it's converted into a wildcard query.

For the termvectors, you have to enable them in your mapping:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string


Cédric Hourcade
c...@wal.fr


On Tue, Jun 24, 2014 at 12:47 PM, Andreas Falk  wrote:
> Hey,
>
> I'm trying to get a trigram analyzer working but I'm fairly sure I'm doing
> something wrong because, as I understand it, it doesn't generate any terms at
> all for my document. I've done a complete log with curl commands of what I'm
> doing here: https://gist.github.com/luuse/cb707b85c73f8e82cd8d
>
> 1. So I start with creating the index and at the same time I add the
> analyzer and a mapping for all fields in my document. The response when I
> create it is in create.json and the body I send is in mapping.json.
> 2. I index the document in url.json and get the response in index.json
> 3. I get the termvector in termvector.json
> 4. I query it with "jen" and the analyzer trigrams figuring it should match
> against "jenkins" but no results
> 5. I query it with "jen*" and still the analyzer trigrams and get the
> jenkins result
>
> So I have two questions...
>
> a. When I fetch the termvector it looks like it's empty. Is this correct?
> b. Have I missed some detail, or what am I doing wrong? Why isn't it working?
>
> I can provide more details if you want. I'm running v1.2.1 in a docker
> container.
>
> Cheers
> Andreas
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/e4fef398-0941-4471-8efa-a97878fcb210%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJQxjPM_j2EzFnLrLwXkdFV8GJ8WonedystnwiOw%3DDmCQasACQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Bulk inserting is slow

2014-06-24 Thread Frederic Esnault
Thanks for all this.

I changed my conf, removed all the thread pool config, reduced the refresh time 
to 5s according to Michael's advice, and limited my batch to 10 000.
I'll see how it works, then I'll parallelize the bulk insert.
I'll tell you how it ends up.
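
For the parallel part I'm looking at the BulkProcessor helper from the Java 
API, roughly like this (a sketch; batch size and concurrency are illustrative):

BulkProcessor bulkProcessor = BulkProcessor.builder(client,
        new BulkProcessor.Listener() {
            public void beforeBulk(long executionId, BulkRequest request) {}
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {}
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {}
        })
        .setBulkActions(10000)    // flush a bulk every 10 000 actions
        .setConcurrentRequests(4) // allow 4 bulk requests in flight
        .build();
// then bulkProcessor.add(indexRequest) per document, and bulkProcessor.close() at the end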

Thanks again !

Le lundi 23 juin 2014 12:56:14 UTC+2, Jörg Prante a écrit :
>
> Your bulk insert size is too large. It makes no sense to insert 100,000 
> with one request. Use 1,000-10,000 instead.
>
> Also you should submit bulk requests in parallel and not sequential like 
> you do. Sequential bulk is slow if client CPU/network is not saturated.
>
> Check if you have disabled the index refresh (from 1s to -1) while bulk 
> indexing is active. 30s doesn't make much sense if you can execute the bulk 
> within that time.
>
> Do not limit indexing memory to 50%.
>
> It makes no sense to increase the queue_size for the bulk thread pool to 1000. 
> This means you want a single ES node to accept 1000 x 100 000 = 100 000 
> 000 = 100m docs at once. This will simply exceed all reasonable limits and 
> bring the node down with an OOM (if you really have 100m docs).
>
> More advice is possible if you can show your client code how you push docs 
> to ES.
>
> Jörg
>
>
>
> On Mon, Jun 23, 2014 at 12:30 PM, Frederic Esnault  > wrote:
>
>> Hi everyone,
>>
>> I'm inserting around 265 000 documents into an Elasticsearch cluster 
>> composed of 3 nodes (real servers).
>> On two servers I give Elasticsearch 20g of heap; on the third one, which has 
>> 64g of RAM, I set 30g of heap for Elasticsearch.
>>
>> I set elastic search configuration to :
>>
>> - 3 shards (1 per server)
>> - 0 replicas
>> - discovery.zen.ping.multicast.enabled: false (and giving on each node 
>> the unicast hostnames of the two other nodes);
>> - and this :
>>
>> indices.memory.index_buffer_size: 50%
>> index.refresh_interval: 30s
>> threadpool:
>>   index:
>> type: fixed
>> size: 30
>> queue_size: 1000
>>   bulk:
>> queue_size: 1000
>>   bulk:
>> type: fixed
>> size: 30
>> queue_size: 1000
>>   search:
>> type: fixed
>> size: 100
>> queue_size: 200
>>   get:
>> type: fixed
>> size: 100
>> queue_size: 200
>>
>> Indexing is done by groups of 100 000 docs, and here is my application 
>> log :
>> INFO: Adding records to bulk insert batch
>> INFO: Added 100 000 records to bulk insert batch. Inserting batch...
>> -- Bulk insert took 38.724 seconds
>> INFO: Adding records to bulk insert batch
>> INFO: Added 100 000 records to bulk insert batch. Inserting batch...
>> -- Bulk insert took 31.134 seconds
>> INFO: Adding records to bulk insert batch
>> INFO: Added 64201 records to bulk insert batch. Inserting batch...
>> -- Bulk insert took 17.366 seconds
>>
>> --- Import CSV file took 108.905 seconds ---
>>
>> I'm wondering if this time is correct or not, or if there is something I 
>> can do to improve performance?
>>  
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/3a38e79e-9afb-4146-a7e1-7984ec082e22%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/26c24554-534a-4b41-997a-31da200efdd9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Issue concerning mapping and source

2014-06-24 Thread Frederic Esnault
Hi Ivan, thanks for all this.
I was getting results, but without my fields. Knowing what you told me about 
the default behaviour of queries only returning _source, this all makes sense 
now.
Actually I was thinking of disabling _source for an indexing performance boost. 
But if I lose query performance afterwards, that seems like a bad idea.
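
For reference, explicitly asking for stored fields looks like this (a sketch):

{
  "fields": ["field1", "field2"],
  "query": { "match_all": {} }
}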

Thanks a lot !

Le lundi 23 juin 2014 16:05:06 UTC+2, Ivan Brusic a écrit :
>
> What exactly is the issue? Are you getting back results, just with no 
> data? By default, a query will only return the _source field. If you want 
> to return other stored fields, then you would need to explicit name them:
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html#search-request-fields
>
> Also, disabling source does not necessarily increase performance. Lucene 
> would need to execute a seek for each individual field instead of just one 
> for the source. If you are requesting numerous fields, then using stored 
> fields actually decreases performance. It all depends on the size of your 
> document/fields and the number of fields used.
>
> Cheers,
>
> Ivan
>
>
> On Sun, Jun 22, 2014 at 3:00 AM, Frederic Esnault  > wrote:
>
>> Hi,
>>
>> I'm trying to index documents in elastic search. I'm using elastic search 
>> 1.2.1, from the Java API.
>> My cluster is remote, 3 nodes on 3 servers (one node on each server), 
>> optimised for indexing (one shard per node, no replication).
>> For this, i read a CSV file, from which i generate mapping file.
>> For performance reasons, I try to disable _source, which works; the 
>> mapping I can read after index creation is correct.
>> The thing is, after inserting data, I have nothing in my docs except the 
>> id generated by ES. If I allow the _source field, I only have my data in the 
>> _source field.
>>
>> Here is how i generate the mapping :
>>
>> XContentBuilder mapping = jsonBuilder()
>>     .startObject()
>>     .startObject("record")
>>     //.startObject("_source").field("enabled", false).endObject()
>>     .startObject("properties");
>> for (ColumnMetadata column : dataset.getMetadata().getColumns()) {
>>     mapping.startObject(column.getName()).field("type", 
>>         ESColumnTypeHelper.getESType(column.getType())).field("store", 
>>         "yes").field("index", "analyzed").endObject();
>> }
>> mapping.endObject()
>>     .endObject()
>>     .endObject();
>>
>> Then I create the index:
>>
>> Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", 
>>     storeClusterName).build();
>>
>> CreateIndexRequestBuilder createIndexRequestBuilder = 
>>     client.admin().indices().prepareCreate(datasetName).addMapping("record", mapping);
>> CreateIndexRequest request = createIndexRequestBuilder.request();
>> try {
>>     CreateIndexResponse createResponse = 
>>         client.admin().indices().create(request).actionGet();
>>     if (!createResponse.isAcknowledged()) {
>>         logger.log(Level.SEVERE, "Index creation not acknowledged.");
>>     } else {
>>         logger.log(Level.INFO, "Index creation acknowledged.");
>>     }
>> } catch (IndexAlreadyExistsException iae) {
>>     logger.log(Level.SEVERE, "Index already exists...");
>> }
>>
>> And now how I index using the Bulk API:
>>
>> BulkRequestBuilder bulkRequest = client.prepareBulk();
>> try {
>>     logger.log(Level.INFO, "Creating records");
>>     for (Record record : records) {
>>         IndexRequestBuilder builder = client.prepareIndex(datasetName, "record");
>>         XContentBuilder data = jsonBuilder();
>>         data.startObject();
>>         for (ColumnMetadata column : dataset.getMetadata().getColumns()) {
>>             Object value = record.getCell(column.getName()).getValue();
>>             if (value == null || (value instanceof String && value.equals("NULL"))) {
>>                 value = null;
>>             }
>>             data.field(column.getNormalizedName(), value);
>>         }
>>         data.endObject();
>>         builder.setSource(data);
>>         bulkRequest.add(builder);
>>         logger.log(Level.INFO, "Creating records");
>>     }
>>     logger.log(Level.INFO, "Created " + bulkRequest.numberOfActions() + " records");
>>
>>     BulkResponse bulkResponse = bulkRequest.execute().actionGet();
>>     if (bulkResponse.hasFailures()) {
>>         logger.log(Level.SEVERE, "Could not index : " + bulkResponse.buildFailureMessag

Re: How do a connect as a second client-only Node to remote cluster?

2014-06-24 Thread Koen Smets
It's not the clustername. The unit test works if I force the node to use 
another port...

Settings settings = 
ImmutableSettings.settingsBuilder().put("cluster.name", "elasticsearch")
.put("network.publish_host", 
"192.168.2.15").put("network.bind_host", "192.168.2.15")
.put("transport.tcp.port", 
9301).put("transport.publish_port", 9301).build();

But by default transport.tcp.port is the range 9300-9400, and both the 
first and the second client always pick 9300.

Shouldn't the port clash be discovered automatically?

grtz, Koen

On Tuesday, June 24, 2014 12:50:41 PM UTC+2, David Pilato wrote:
>
> But it sounds like your client is using another cluster name [1].
> Do you have any elasticsearch.yml in your project classpath?
>
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr 
> 
>
>
> Le 24 juin 2014 à 12:48:48, Koen Smets (koen@gmail.com ) 
> a écrit:
>
> Hi David,
>
> Setting the (default) clustername, elasticsearch, doesn't help.
>
> grtz, Koen
>
> On Tuesday, June 24, 2014 12:24:51 PM UTC+2, David Pilato wrote: 
>>
>>  You need to set the cluster name I think.
>>  
>>  My 2 cents
>>  
>>
>>  -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com* 
>>  @dadoonet  | @elasticsearchfr 
>> 
>>  
>>
>> Le 24 juin 2014 à 10:29:11, Koen Smets (koen@gmail.com) a écrit:
>>
>>  Hi,
>>
>> I'm having trouble connecting a second client-only Node via the Java API 
>> to a remote cluster.
>>
>> The first client (configured as client-only hence no master and no data) 
>> runs fine on localhost:9300. However, when the second client-only node 
>> tries to connect to the cluster an exception gets thrown:
>>
>> org.elasticsearch.cluster.block.ClusterBlockException: blocked by: 
>> [SERVICE_UNAVAILABLE/1/state not recovered / 
>> initialized];[SERVICE_UNAVAILABLE/2/no master];
>>
>> I use the following code to connect both clients:
>>
>> Node node = NodeBuilder.nodeBuilder().client(true).node();
>> Client client = node.client();
>>
>> GetResponse response = client.prepareGet("twitter", "tweet", 
>> "1").execute().actionGet();
>> LOGGER.info(response.getSourceAsString());
>>
>> // on shutdown
>> node.close();
>>
>> While the second client tries to join the cluster, the logs of the first 
>> client show the following warnings:
>>
>> o.e.d.z.p.multicast [WARN] [Odin] received ping response 
>> ping_response{target 
>> [[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], master 
>> [[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], 
>> cluster_name[elasticsearch]} with no matching id [1]
>>
>> I assumed that the first client would be running on localhost:9300 and 
>> the second one would bind to localhost:9301, but it doesn't do that 
>> automatically. Do I need to specify an additional setting to allow this 
>> scenario?
>>
>> (I also tested by adding a TransportClient instead of a Node and this 
>> works fine)
>>
>> JUnit test code and log output can be found at: 
>> https://gist.github.com/ksmets/bed93778562dd2260e09
>>
>> Thx, Koen
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/868dc883-dd39-44ac-ab56-1ac609f1f63f%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>  
>>  --
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/43ed69ab-83e8-4128-8450-bcb94fc93852%40googlegroups.com
>  
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ac506781-daf9-4255-ac4c-09ac1b833d4a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Penalty or boost from a boolean property

2014-06-24 Thread Hugo Lassiège
Thanks for your help. It works :)
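
For anyone landing here later, a function_score query along these lines does 
the trick (a sketch; field names are from my original example below, boost 
factors are illustrative):

{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            { "term": { "champ1": "valeur1" } },
            { "term": { "champ2": "valeur2" } }
          ]
        }
      },
      "functions": [
        { "filter": { "term": { "thumbsup": true } }, "boost_factor": 2 },
        { "filter": { "term": { "thumbsdown": true } }, "boost_factor": 0.5 }
      ],
      "score_mode": "multiply"
    }
  }
}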


2014-06-20 19:54 GMT+02:00 David Pilato :

> Function_score is the way to go IMHO.
>
> Best
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> Le 20 juin 2014 à 19:50, hugo lassiege  a écrit :
>
> Hi,
>
> I'm looking for help :) This is maybe trivial but I can't find the right
> solution.
>
> I have some documents, and those documents have two boolean properties,
> basically thumbs up and thumbs down, to show whether the administrator
> approves of those documents.
> I try to boost a document if it is "thumbsup" or demote the document if it
> is "thumbsdown". It's not a filter; the document can still be retrieved, it's
> just more or less relevant.
>
> I tried with two should clauses in the global request :
>
>
> {
> "bool" : {
> "should" : [
> {
> "term" : { "champ1" : "valeur1" }
> },
> {
> "term" : { "champ2" : "valeur2" }
> },
> {
> "term" : { "thumbsup" : true }
> },
> {
> "term" : { "thumbsdown" : false}
> }
> ]
> }
> }
>
>
> But I get some irrelevant documents because they match only the last
> conditions.
> What would be the best method for this use case?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
>
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/ba3964f0-fbc8-4e0c-be3f-c38af8221410%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/26Y62Eisrm4/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/088863F1-E2EA-45A6-9368-D9AA69E717FE%40pilato.fr
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKXL5Nh7wPrYMpkE7NmopbOULT34w_etBgD076YTUBnpoYxbhQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: ElasticSearch Cluster behind loadbalancer

2014-06-24 Thread Tarun Jangra
Hi Himanshu,

Currently I've used /_cluster/health for this, since the ELB checks each node 
by IP: once I start getting a response, the node is up. But that does not 
account for the shard reallocation process. 

I like your idea that the plugin can be programmed in such a way that it 
only starts sending HTTP 200 once the node has finished shard allocation etc.

Thank you.
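
A related trick: the health endpoint can also wait on relocation directly, 
e.g. (a sketch):

curl -XGET 'localhost:9200/_cluster/health?wait_for_relocating_shards=0&timeout=10s&pretty'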



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/ElasticSerach-Cluster-behind-loadbalancer-tp4058369p4058384.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1403607773085-4058384.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.


Re: Update existing values within a list or array in Elastic Search

2014-06-24 Thread David Pilato
Just send the full new version of your document and you're done.
Elasticsearch will index the new content.

If I did not answer your question, maybe you should provide an example as a 
Gist so we can comment with more details?



-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 24 juin 2014 à 12:51:54, Madhumita Sadhukhan (madhumita.sadhuk...@gmail.com) 
a écrit:


I am not removing any element from an array; I need to update particular 
existing elements within an array.
Currently I am using the Update API in Elasticsearch to append new elements 
within the array, but it fails when I try to update values of existing elements.

For eg:

I am able to add a new record as follows to jobs list using Update API scripts

{
 "status": "InProgress",
 "runId": 2,
 "start_date": 2101112,
 "orderId": "undefined"
  },


What I need is to update the orderId value from 'undefined' to, say, 'abcd' using 
the Update API.
Is this supported?
Does Elasticsearch support updating elements inside arrays or lists?

On Tuesday, June 24, 2014 12:57:15 PM UTC+5:30, David Pilato wrote:
If you send the full document without the element you need to remove in the 
array, this should work fine.
How do you actually update your document?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 24 juin 2014 à 09:18:20, Madhumita Sadhukhan (madhumita...@gmail.com) a 
écrit:

I have a requirement where I need to update (not append) existing values within 
a list or array in Elastic Search.
Is this feature supported in elastic search?

For eg:

I have a field called jobs as part of my document

"jobs": [
  {
 "status": "InProgress",
 "runId": 1,
 "start_date": 2101112,
 "orderId": "undefined"
  },
  {
 "status": "InProgress",
 "runId": 2,
 "start_date": 2101112,
 "orderId": "undefined"
  },
   ],
and I am required to update the orderId for each job run to different values.
Currently I am only able to append a job but I cannot update the attributes of 
each job later.
Is this use case supported and possible in Elasticsearch?
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearc...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ba7f8f47-3578-4b47-9410-9dd2622911a2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2e69ec6b-4c53-4802-ab0a-49aa4f395c97%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/etPan.53a95aab.3352255a.950f%40MacBook-Air-de-David.local.
For more options, visit https://groups.google.com/d/optout.


Re: ElasticSearch Cluster behind loadbalancer

2014-06-24 Thread Himanshu Agrawal
Hi Tarun,

You can use ELB ping functionality along with a custom elasticsearch plugin
to achieve this functionality. You can configure your ping path to be
something like /_plugin/ and write your plugin in such a
way that it returns HTTP 200 only when you want your node to serve the
traffic and HTTP 503/500 when you don't want the node to serve traffic.

See
http://docs.aws.amazon.com/gettingstarted/latest/computebasics-linux/getting-started-create-lb.html
for more details on ELB pings and
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/ts-elb-healthcheck.html
for troubleshooting.

Thanks,
Himanshu.


On Tue, Jun 24, 2014 at 4:14 PM, Tarun Jangra  wrote:

> In that case, I don't think I should bother checking whether the node is up and
> reallocation is done.
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/ElasticSerach-Cluster-behind-loadbalancer-tp4058369p4058375.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1403606669657-4058375.post%40n3.nabble.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CACTSS%3D57q3W35%2BeUZZrpLar5D1hvs7F5%2BrTa%2BXWH6B-j8MiLxw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Update existing values within a list or array in Elastic Search

2014-06-24 Thread Madhumita Sadhukhan

I am not removing any element from an array; I need to update particular 
existing elements within an array.
Currently I am using the Update API in Elasticsearch to append new 
elements within the array, but it fails when I try to update values of 
existing elements.

For eg:

I am able to add a new record as follows to jobs list using Update API 
scripts

{
 "status": "InProgress",
 "runId": 2,
 "start_date": 2101112,
 "orderId": "undefined"
  },


What I need is to update the orderId value from 'undefined' to, say, 'abcd' 
using the Update API.
Is this supported?
Does Elasticsearch support updating elements inside arrays or lists?
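
Concretely, this is the shape of update I'm after (a sketch, assuming dynamic 
scripting is enabled and MVEL, the ES 1.x default scripting language; index 
and document names are illustrative):

curl -XPOST 'localhost:9200/myindex/mytype/1/_update' -d '{
  "script": "foreach (job : ctx._source.jobs) { if (job.runId == runId) { job.orderId = newOrderId } }",
  "params": { "runId": 2, "newOrderId": "abcd" }
}'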

On Tuesday, June 24, 2014 12:57:15 PM UTC+5:30, David Pilato wrote:
>
> If you send the full document without the element you need to remove in 
> the array, this should work fine.
> How do you actually update your document?
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr 
> 
>
>
> Le 24 juin 2014 à 09:18:20, Madhumita Sadhukhan (madhumita...@gmail.com 
> ) a écrit:
>
> I have a requirement where I need to update (not append) existing values 
> within a list or array in Elastic Search.
> Is this feature supported in elastic search?
>
> For eg:
>
> I have a field called jobs as part of my document
>
> "jobs": [
>   {
>  "status": "InProgress",
>  "runId": 1,
>  "start_date": 2101112,
>  "orderId": "undefined"
>   },
>   {
>  "status": "InProgress",
>  "runId": 2,
>  "start_date": 2101112,
>  "orderId": "undefined"
>   },
>],
> and I am required to update the orderId for each job run to different 
> values.
> Currently I am only able to append a job but I cannot update the 
> attributes of each job later.
> Is this use case supported and possible in Elasticsearch?
> --
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/ba7f8f47-3578-4b47-9410-9dd2622911a2%40googlegroups.com
>  
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2e69ec6b-4c53-4802-ab0a-49aa4f395c97%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How do a connect as a second client-only Node to remote cluster?

2014-06-24 Thread David Pilato
But it sounds like your client is using another cluster name [1].
Do you have any elasticsearch.yml in your project classpath?


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 24 juin 2014 à 12:48:48, Koen Smets (koen.sm...@gmail.com) a écrit:

Hi David,

Setting the (default) cluster name, elasticsearch, doesn't help.

grtz, Koen

On Tuesday, June 24, 2014 12:24:51 PM UTC+2, David Pilato wrote:
You need to set the cluster name I think.

My 2 cents


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 24 June 2014 at 10:29:11, Koen Smets (koen@gmail.com) wrote:

Hi,

I'm having trouble connecting a second client-only Node via the Java API to a 
remote cluster.

The first client (configured as client-only, hence no master and no data) runs 
fine on localhost:9300. However, when the second client-only node tries to 
connect to the cluster, an exception gets thrown:

org.elasticsearch.cluster.block.ClusterBlockException: blocked by: 
[SERVICE_UNAVAILABLE/1/state not recovered / 
initialized];[SERVICE_UNAVAILABLE/2/no master];

I use the following code to connect both clients:

    Node node = NodeBuilder.nodeBuilder().client(true).node();
            Client client = node.client();

            GetResponse response = client.prepareGet("twitter", "tweet", 
"1").execute().actionGet();
            LOGGER.info(response.getSourceAsString());

            // on shutdown
            node.close();

While the second client tries to join the cluster, the logs of the first client 
show the following warnings:

o.e.d.z.p.multicast [WARN] [Odin] received ping response ping_response{target 
[[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], master 
[[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], 
cluster_name[elasticsearch]} with no matching id [1]

I assumed that the first client would be running on localhost:9300 and the 
second one would bind to localhost:9301, but it doesn't do that automatically. 
Do I need to specify an additional setting to allow this scenario?

(I also tested by adding a TransportClient instead of a Node and this works 
fine)

JUnit test code and log output can be found at: 
https://gist.github.com/ksmets/bed93778562dd2260e09

Thx, Koen


Re: How do I connect a second client-only Node to a remote cluster?

2014-06-24 Thread Koen Smets
Hi David,

Setting the (default) cluster name, elasticsearch, doesn't help.

grtz, Koen

On Tuesday, June 24, 2014 12:24:51 PM UTC+2, David Pilato wrote:
>
> You need to set the cluster name I think.
>
> My 2 cents
>
>
> -- 
> David Pilato | Technical Advocate | Elasticsearch.com
> @dadoonet | @elasticsearchfr
>
>
> On 24 June 2014 at 10:29:11, Koen Smets (koen@gmail.com) wrote:
>
> Hi,
>
> I'm having trouble connecting a second client-only Node via the Java API to 
> a remote cluster.
>
> The first client (configured as client-only hence no master and no data) 
> runs fine on localhost:9300. However, when the second client-only node 
> tries to connect to the cluster an exception gets thrown:
>
> org.elasticsearch.cluster.block.ClusterBlockException: blocked by: 
> [SERVICE_UNAVAILABLE/1/state not recovered / 
> initialized];[SERVICE_UNAVAILABLE/2/no master];
>
> I use the following code to connect both clients:
>
> Node node = NodeBuilder.nodeBuilder().client(true).node();
> Client client = node.client();
>
> GetResponse response = client.prepareGet("twitter", "tweet", 
> "1").execute().actionGet();
> LOGGER.info(response.getSourceAsString());
>
> // on shutdown
> node.close();
>
> While the second client tries to join the cluster, the logs of the first 
> client show the following warnings:
>
> o.e.d.z.p.multicast [WARN] [Odin] received ping response 
> ping_response{target 
> [[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], master 
> [[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], 
> cluster_name[elasticsearch]} with no matching id [1]
>
> I assumed that the first client would be running on localhost:9300 and the 
> second one would bind to localhost:9301, but it doesn't do that 
> automatically. Do I need to specify an additional setting to allow this 
> scenario?
>
> (I also tested by adding a TransportClient instead of a Node and this 
> works fine)
>
> JUnit test code and log output can be found at: 
> https://gist.github.com/ksmets/bed93778562dd2260e09
>
> Thx, Koen


No terms generated for trigram analyzer

2014-06-24 Thread Andreas Falk
Hey,

I'm trying to get a trigram analyzer working, but I'm fairly sure I'm doing 
something wrong because, as I understand it, it doesn't generate any terms at 
all for my document. I've posted a complete log of the curl commands I'm 
running here: https://gist.github.com/luuse/cb707b85c73f8e82cd8d

1. I start by creating the index, adding the analyzer and a mapping for all 
fields in my document at the same time. The response when I create it is in 
create.json and the body I send is in mapping.json.
2. I index the document in url.json and get the response in index.json.
3. I get the term vector in termvector.json.
4. I query it with "jen" and the trigrams analyzer, figuring it should match 
against "jenkins", but get no results.
5. I query it with "jen*", still with the trigrams analyzer, and get the 
jenkins result.

So I have two questions...

a. When I fetch the term vector it looks like it's empty. Is this correct?
b. Have I missed some detail, or what am I doing wrong? Why isn't it working?

I can provide more details if you want. I'm running v1.2.1 in a docker 
container.
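In case it helps, a quick way to sanity-check this kind of setup is to define 
the trigram analyzer and run _analyze against it directly before indexing 
anything. A minimal sketch, with an assumed index name and analyzer layout 
(not taken from the gist):

curl -XPOST 'localhost:9200/trigram_test' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "trigrams_filter": { "type": "ngram", "min_gram": 3, "max_gram": 3 }
      },
      "analyzer": {
        "trigrams": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "trigrams_filter" ]
        }
      }
    }
  }
}'

curl 'localhost:9200/trigram_test/_analyze?analyzer=trigrams&text=jenkins&pretty'

If the analyzer is wired up correctly, the second call should return the terms 
jen, enk, nki, kin, ins.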

Cheers
Andreas



Re: Aggregation Framework, possible to get distribution of requests per user

2014-06-24 Thread David Pilato
Imagine that you have indexed users.
Each user document has a numberOfDocs field.

You can build a range aggregation on top of that, which gives back the count 
for buckets like:

numberOfDocs < 2
1 < numberOfDocs < 3
…

See 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-range-aggregation.html#search-aggregations-bucket-range-aggregation
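A minimal sketch of such a query, assuming a users index where each user 
document carries a numberOfDocs field (both names are placeholders); note that 
each range bucket includes its from value and excludes its to value:

curl -XGET 'localhost:9200/users/_search?search_type=count' -d '{
  "aggs": {
    "docs_per_user": {
      "range": {
        "field": "numberOfDocs",
        "ranges": [
          { "to": 2 },
          { "from": 2, "to": 3 },
          { "from": 3 }
        ]
      }
    }
  }
}'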


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 24 June 2014 at 12:32:16, Thomas (thomas.bo...@gmail.com) wrote:

Hi,

I wanted to ask whether it is possible, with the aggregation framework, to 
get the distribution of one specific type of document sent per user. I'm 
interested in occurrences of documents per user, e.g.:

1000 users sent 1 document
500 users sent 2 documents
X number of unique users sent Y documents (each)
etc.

On each document I index the user_id.

Is there a way to support such a query, or at least partially support it, e.g. 
get the first 10 rows of this kind of list rather than the exhaustive list? 
Can you give me some hint?

Thanks


Re: ElasticSearch Cluster behind loadbalancer

2014-06-24 Thread Tarun Jangra
In that case, I don't think I need to worry, as long as the node is up and 
reallocation is done.





Re: ElasticSearch Cluster behind loadbalancer

2014-06-24 Thread Mark Walkom
Yes, they can both occur in parallel; performance may drop a little, but the 
cluster will still respond to queries and indexing.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 24 June 2014 20:39, Tarun Jangra  wrote:

> Thanks Mark,
>
> What happens if the node starts getting queries during this reallocation
> period? Is there anything I am supposed to lose? Writes can also occur
> during reallocations.


Re: ElasticSearch Cluster behind loadbalancer

2014-06-24 Thread Tarun Jangra
Thanks Mark,

What happens if the node starts getting queries during this reallocation 
period? Is there anything I am supposed to lose? Writes can also occur during 
reallocations.







Disabling the _all field but keeping netflow events searchable

2014-06-24 Thread horst knete
Hey guys,

I really want to disable the _all field in the ES indices to save some 
disk space on our system.

Normally that's not a problem: adjust the template in ES and set the 
"message" field as the new default query field, since it is normally 
available in any event.
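A minimal sketch of that normal approach as an index template (the template 
name and index pattern here are placeholders):

curl -XPUT 'localhost:9200/_template/disable_all' -d '{
  "template": "logstash-*",
  "settings": {
    "index.query.default_field": "message"
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }
    }
  }
}'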

The problem is that we also have many netflow events from the netflow codec, 
which have the following form:



As you might notice, there isn't any "message" field, so the Kibana Lucene 
query would run into an error.

My question is: how do I make this work (disabling the _all field while 
keeping the netflow events searchable)?

Thanks for response.



Re: ElasticSearch Cluster behind loadbalancer

2014-06-24 Thread Mark Walkom
This is something you would need to coordinate outside of ES; there is 
nothing native that can do it.
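One hedged sketch of such external coordination: point the ELB health check at 
the cluster health API with a wait, so a node is only marked healthy once the 
cluster reaches the desired state (verify how your version responds when the 
wait times out before relying on this; the values here are assumptions):

curl -i 'localhost:9200/_cluster/health?wait_for_status=yellow&timeout=2s'

This returns quickly once the cluster is yellow or green and otherwise waits 
until the timeout; note it reflects cluster-wide state, not per-node shard 
recovery, so it is a rough gate at best.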

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 24 June 2014 20:29, Tarun Jangra  wrote:

> Hi guys,
>
> Scenario 1:
>
> 1. I have 3 ES nodes.
> 2. I have not installed nginx/tomcat on the Elasticsearch nodes.
> 3. These are behind an Amazon ELB.
> 4. The cluster is detecting nodes automatically.
>
> My need:
> 1. The ELB should only start sending queries to a node when it is actually
> ready to take traffic. Shard reallocation usually takes time, depending on
> the size of the shard.


Aggregation Framework, possible to get distribution of requests per user

2014-06-24 Thread Thomas
Hi,

I wanted to ask whether it is possible, with the aggregation framework, to 
get the distribution of one specific type of document sent per user. I'm 
interested in occurrences of documents per user, e.g.:

1000 users sent 1 document
500 users sent 2 documents
X number of unique users sent Y documents (each)
etc.

On each document I index the user_id.

Is there a way to support such a query, or at least partially support it, e.g. 
get the first 10 rows of this kind of list rather than the exhaustive list? 
Can you give me some hint?

Thanks



ElasticSearch Cluster behind loadbalancer

2014-06-24 Thread Tarun Jangra
Hi guys,

Scenario 1:

1. I have 3 ES nodes.
2. I have not installed nginx/tomcat on the Elasticsearch nodes.
3. These are behind an Amazon ELB.
4. The cluster is detecting nodes automatically.

My need:
1. The ELB should only start sending queries to a node when it is actually 
ready to take traffic. Shard reallocation usually takes time, depending on the 
size of the shard.







Re: How do I connect a second client-only Node to a remote cluster?

2014-06-24 Thread David Pilato
You need to set the cluster name I think.

My 2 cents
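If that is the issue, a minimal sketch of pinning the cluster name on the 
client node via the Java API ("elasticsearch" is the default name; adjust it 
to match the remote cluster):

import org.elasticsearch.client.Client;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

// Join the existing cluster as a client-only node, with the cluster name
// set explicitly instead of relying on configuration files.
Node node = NodeBuilder.nodeBuilder()
        .clusterName("elasticsearch")
        .client(true)
        .node();
Client client = node.client();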


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 24 June 2014 at 10:29:11, Koen Smets (koen.sm...@gmail.com) wrote:

Hi,

I'm having trouble connecting a second client-only Node via the Java API to a 
remote cluster.

The first client (configured as client-only, hence no master and no data) runs 
fine on localhost:9300. However, when the second client-only node tries to 
connect to the cluster, an exception gets thrown:

org.elasticsearch.cluster.block.ClusterBlockException: blocked by: 
[SERVICE_UNAVAILABLE/1/state not recovered / 
initialized];[SERVICE_UNAVAILABLE/2/no master];

I use the following code to connect both clients:

    Node node = NodeBuilder.nodeBuilder().client(true).node();
            Client client = node.client();

            GetResponse response = client.prepareGet("twitter", "tweet", 
"1").execute().actionGet();
            LOGGER.info(response.getSourceAsString());

            // on shutdown
            node.close();

While the second client tries to join the cluster, the logs of the first client 
show the following warnings:

o.e.d.z.p.multicast [WARN] [Odin] received ping response ping_response{target 
[[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], master 
[[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], 
cluster_name[elasticsearch]} with no matching id [1]

I assumed that the first client would be running on localhost:9300 and the 
second one would bind to localhost:9301, but it doesn't do that automatically. 
Do I need to specify an additional setting to allow this scenario?

(I also tested by adding a TransportClient instead of a Node and this works 
fine)

JUnit test code and log output can be found at: 
https://gist.github.com/ksmets/bed93778562dd2260e09

Thx, Koen


Re: ES heap size slowly but surely increasing

2014-06-24 Thread Mark Walkom
Are you indexing new data?
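If it helps to narrow things down, a minimal sketch of checking what actually 
occupies the heap (JVM pools plus index-level memory such as fielddata and 
segments) via the node stats API:

curl 'localhost:9200/_nodes/stats/jvm,indices?pretty'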

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 24 June 2014 20:01, Aldian  wrote:

> Hi
>
> I upgraded to 1.2.1 and set memory options to 2GB. Two weeks have passed
> now and every day I have been checking heap memory level with Bigdesk.
>
> At first I was not sure, but now it is clearly increasing little by
> little. One day after I restarted ES it grew to approximately 1.3 GB of
> used heap memory, and now we are at 1.6 GB, which makes me guess it is
> leaking around 150 MB of heap memory a week.
>
> Am I the only one experiencing such a problem? And do you know what could
> cause it?
>
> Best,
>
> Aldian
>


ES heap size slowly but surely increasing

2014-06-24 Thread Aldian
Hi

I upgraded to 1.2.1 and set the memory options to 2 GB. Two weeks have passed 
now, and every day I have been checking the heap memory level with Bigdesk.

At first I was not sure, but now it is clearly increasing little by little. 
One day after I restarted ES it grew to approximately 1.3 GB of used heap 
memory, and now we are at 1.6 GB, which makes me guess it is leaking around 
150 MB of heap memory a week.

Am I the only one experiencing such a problem? And do you know what could 
cause it?

Best,

Aldian



Re: Custom analyzers with elasticsearch-php API

2014-06-24 Thread Olivier Revollat
I'm posting my own answer :)

So, following the example at 
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-analyzers.html
you can add custom analyzers simply at index creation :)

$indexParams['body'] = $this->config['indexsetting'];
$result = $this->client->indices()->create($indexParams);

given that, for example, $this->config['indexsetting'] is the following YAML 
(which has to be converted to a PHP array):

indexsetting:
    analysis:
        analyzer:
            nom_analyzer:
                type: custom
                tokenizer: standard
                filter: [ trim, lowercase, asciifolding ]
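For reference, a sketch of the same settings converted to a PHP array (the 
index name is hypothetical; here the analysis block is nested under a 
'settings' key, which the create-index API accepts):

$indexParams = array(
    'index' => 'my_index',
    'body'  => array(
        'settings' => array(
            'analysis' => array(
                'analyzer' => array(
                    'nom_analyzer' => array(
                        'type'      => 'custom',
                        'tokenizer' => 'standard',
                        'filter'    => array('trim', 'lowercase', 'asciifolding'),
                    ),
                ),
            ),
        ),
    ),
);
$result = $this->client->indices()->create($indexParams);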





On Monday, 23 June 2014 at 14:44:49 UTC+2, Olivier Revollat wrote:
>
> Hello, I'm currently creating an elasticsearch extension for Bolt (a [very
> cool] Symfony2-based CMS).
> For the PHP client I'm using elasticsearch-php.
>
> Great! I already wrote the code that creates the index and mapping and adds
> some data :)
>
> Now I would like to customize the analyzer. In the mapping I could tell (as
> described in
> http://www.elasticsearch.org/guide/en/elasticsearch/client/php-api/current/_index_operations.html):
>
>
> $myTypeMapping = array(
>     '_source' => array(
>         'enabled' => true
>     ),
>     'properties' => array(
>         'first_name' => array(
>             'type' => 'string',
>             'analyzer' => 'whatever_analyzer'
>         ),
>         'age' => array(
>             'type' => 'integer'
>         )
>     )
> );
>
>
> But I don't really understand how/where to declare the so-called
> whatever_analyzer instead of the default "standard" analyzer ... can you
> point me to the example in the documentation?
>
> For example, if I want this kind of analyzer, how can I declare it with
> elasticsearch-php?
>
> analyzer:
>     default_index:
>         type: "custom"
>         char_filter: html_strip
>         tokenizer: "standard" # "my_edge_ngram_tokenizer"
>         filter: [ trim, lowercase, stop_fr, fr_stemmer, my_edge_ngram_filter, asciifolding ]
>     default_search:
>         type: custom
>         tokenizer: standard
>         filter: [ trim, lowercase, stop_fr, fr_stemmer, asciifolding ]
>     mots_clefs:
>         type: "custom"
>         tokenizer: "keyword"
>         filter: [ standard, trim, lowercase, asciifolding ]
> filter:
>     my_edge_ngram_filter:
>         type: "edgeNGram"
>         min_gram: "3"
>         max_gram: "20"
>     stop_fr:
>         type: "stop"
>         stopwords: [ _french_ ]
>     fr_stemmer:
>         type: "stemmer"
>         name: "french"
>
>
>
> Thanks :)
>
>



How do I connect a second client-only Node to a remote cluster?

2014-06-24 Thread Koen Smets
Hi,

I'm having trouble connecting a second client-only Node via the Java API to a 
remote cluster.

The first client (configured as client-only, hence no master and no data) 
runs fine on localhost:9300. However, when the second client-only node 
tries to connect to the cluster, an exception gets thrown:

org.elasticsearch.cluster.block.ClusterBlockException: blocked by: 
[SERVICE_UNAVAILABLE/1/state not recovered / 
initialized];[SERVICE_UNAVAILABLE/2/no master]; 

I use the following code to connect both clients:

Node node = NodeBuilder.nodeBuilder().client(true).node();
Client client = node.client();

GetResponse response = client.prepareGet("twitter", "tweet", 
"1").execute().actionGet();
LOGGER.info(response.getSourceAsString());

// on shutdown
node.close();

While the second client tries to join the cluster, the logs of the first 
client show the following warnings:

o.e.d.z.p.multicast [WARN] [Odin] received ping response 
ping_response{target 
[[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], master 
[[Firebolt][DqdgEEe8RwSyVDUuqmQw4w][p2][inet[/192.168.2.21:9300]]], 
cluster_name[elasticsearch]} with no matching id [1]

I assumed that the first client would be running on localhost:9300 and the 
second one would bind to localhost:9301, but it doesn't do that automatically. 
Do I need to specify an additional setting to allow this scenario?

(I also tested by adding a TransportClient instead of a Node and this works 
fine)

JUnit test code and log output can be found at: 
https://gist.github.com/ksmets/bed93778562dd2260e09

Thx, Koen



search across multiple children by single query

2014-06-24 Thread JONBON DASH
Hi,

Can anyone please share a sample search query in JSON where one parent has 
multiple child types and a single query can search across both?

I can fetch each child's parent information using "has_child", but I am not 
able to find how to use "has_child" for two different child types in a single 
query.
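A minimal sketch of one way to do this, combining two has_child clauses inside 
a bool query (the index, type, and field names are hypothetical):

curl -XGET 'localhost:9200/myindex/parent/_search' -d '{
  "query": {
    "bool": {
      "should": [
        { "has_child": { "type": "child_a", "query": { "match": { "title": "foo" } } } },
        { "has_child": { "type": "child_b", "query": { "match": { "body": "foo" } } } }
      ],
      "minimum_should_match": 1
    }
  }
}'

A parent document then matches if at least one child of either type matches.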


Thanks & Regards,
Jonbon Dash






Re: ingest performance degrades sharply as documents have more fields

2014-06-24 Thread Maco Ma
Hi Jörg,

I reran the benchmark with the _all field and the codec bloom filter disabled: 
the index data size got reduced dramatically, but ingestion speed is still 
similar to before:
(Baseline = stock ES; Disabled = ES with _all and the codec bloom filter 
disabled. Scenarios vary the number of different metadata fields.)

Scenario 0: 1000 fields
  Baseline: 12 secs -> 833 docs/sec; CPU: 30.24%; heap: 1.08G; iowait: 0.02%;
    index size: 36Mb; time (secs) for each 1k docs: 3 1 1 1 1 1 0 1 2 1
  Disabled: 13 secs -> 769 docs/sec; CPU: 23.68%; iowait: 0.01%; heap: 1.31G;
    index size: 248K; ingestion speed change: 2 1 1 1 1 1 1 1 2 1

Scenario 1: 10k fields
  Baseline: 29 secs -> 345 docs/sec; CPU: 40.83%; heap: 5.74G; iowait: 0.02%;
    index size: 36Mb; time (secs) for each 1k docs: 14 2 2 2 1 2 2 1 2 1
  Disabled: 31 secs -> 322.6 docs/sec; CPU: 39.29%; iowait: 0.01%; heap: 47.95G;
    index size: 396K; ingestion speed change: 12 1 2 1 1 1 2 1 4 2

Scenario 2: 100k fields
  Baseline: 17 mins 44 secs -> 9.4 docs/sec; CPU: 54.73%; heap: 47.99G; iowait: 0.02%;
    index size: 75Mb; time (secs) for each 1k docs: 97 183 196 147 109 89 87 49 66 40
  Disabled: 14 mins 24 secs -> 11.6 docs/sec; CPU: 52.30%; iowait: 0.02%; heap: 47.96G;
    index size: 1.5M; ingestion speed change: 93 153 151 112 84 65 61 53 51 41

We ingested one single doc per request, instead of bulk ingestion; that is 
our real-world requirement.

Script to disable _all and the codec bloom filter:
curl -XPOST localhost:9200/doc -d '{
  "mappings" : {
  "type" : {
  "_source" : { "enabled" : false },
  "_all" : { "enabled" : false },
  "dynamic_templates" : [
{"t1":{
  "match" : "*_ss",
  "mapping":{
"type": "string",
"store":false,
"norms" : {"enabled" : false}
}
}},
{"t2":{
  "match" : "*_dt",
  "mapping":{
"type": "date",
"store": false
}
}},
{"t3":{
  "match" : "*_i",
  "mapping":{
"type": "integer",
"store": false
}
}}
]
  }
}
  }'


curl -XPUT localhost:9200/doc/_settings -d '{
  "index.codec.bloom.load" :false
}'

Best Regards
Maco

On Monday, June 23, 2014 12:17:27 AM UTC+8, Jörg Prante wrote:
>
> Two things to add, to make Elasticsearch/Solr comparison more fair.
>
> In the ES mapping, you did not disable the _all field.
>
> If you have _all field enabled, all tokens will be indexed twice, one for 
> the field, one for _all.
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html
>
> Also you may want to disable ES codec bloom filter
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-codec.html#bloom-postings
>
> because loading the bloom filter consumes significant memory.
>
> Not sure why you call curl from perl, since this adds overhead. There are 
> nice Solr/ES perl clients to push docs using bulk indexing.
>
> Jörg
>
>
> On Wednesday, June 18, 2014 4:50:13 AM UTC+2, Maco Ma wrote:
>>
>> Hi Mike,
>>
>> new_ES_config.sh(define the templates and disable the refresh/flush):
>> curl -XPOST localhost:9200/doc -d '{
>>   "mappings" : {
>>   "type" : {
>>   "_source" : { "enabled" : false },
>>   "dynamic_templates" : [
>> {"t1":{
>>   "match" : "*_ss",
>>   "mapping":{
>> "type": "string",
>> "store":false,
>> "norms" : {"enabled" : false}
>> }
>> }},
>> {"t2":{
>>   "match" : "*_dt",
>>   "mapping":{
>> "type": "date",
>> "store": false
>> }
>> }},
>> {"t3":{
>>   "match" : "*_i",
>>   "mapping":{
>> "type": "integer",
>> "store": false
>> }
>> }}
>> ]
>>   }
>> }
>>   }'
>>
>> curl -XPUT localhost:9200/doc/_settings -d '{
>>   "index.refresh_interval" : "-1"
>> }'
>>
>> curl -XPUT localhost:9200/doc/_settings -d '{
>>   "index.translog.disable_flush" : true
>> }'
>>
>> new_ES_ingest_threads.pl( spawn 10 threads to use curl command to ingest 
>> the doc and one thread to flush/optimize periodically):
>>
>> my $num_args = $#ARGV + 1;
>> if ($num_args < 1 || $num_args > 2) {
>>   print "\n usuage:$0 [src_dir] [thread_count]\n";
>>   exit;
>> }
>>
>> my $INST_HOME="/scratch/aime/elasticsearch-1.2.1";
>>
>> my $pid = qx(jps | sed -e '/Elasticsearch/p' -n | sed 's/ .*//');
>> chomp($pid);
>> if( "$pid" eq "

Re: Proper parsing of String values like 1m, 1q HOUR etc.

2014-06-24 Thread Thomas
Hi Brian,

Thanks for your reply. I understand your point, but if you check the source 
code of TimeValue, it does not support the quarter or the year, so I was 
wondering what class, if any, supports the transformation of the string 1q or 
1y into milliseconds.

Thanks

On Tuesday, 17 June 2014 18:31:37 UTC+3, Thomas wrote:
>
> Hi,
>
> I was wondering whether there is a proper utility class to parse the given
> values and get the duration in milliseconds, for values such as 1m (which
> means 1 minute), 1q (which means 1 quarter), etc.
>
> I have found that elasticsearch utilizes the class TimeValue, but it only
> parses up to the week, and values such as WEEK and HOUR are not accepted. So
> is there in the elasticsearch source any utility class that does the job?
> (for histograms, ranges, wherever it is needed)
>
> Thank you
> Thomas
>
>



Re: Elascticsearch scripting

2014-06-24 Thread Cédric Hourcade
For that specific need you can use the script fields:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-script-fields.html

With something like:

{
  "query": {
"match_all": {}
  },
  "script_fields": {
"_myNewField": {
  "script": "_source._myField"
}
  }
}

Cédric Hourcade
c...@wal.fr


On Tue, Jun 24, 2014 at 9:14 AM, deep saxena  wrote:
> I am looking for an elegant solution where a script can serve the purpose.
>
> Some of my conditions are like this:
>
> Changing the name of a field: some script should run as part of the query
> itself and should return the modified field name.
>
> For example:
>
> Data indexed in elasticsearch: _myField = _myValue
>
> I want to change _myField to _newField, so the query should be
> constructed in such a way that the response returned by Elasticsearch
> has _newField = _myValue in the response object.
>
> I am trying to find a script that can run dynamically and change this
> value, but have not been able to. Can anybody help with this scenario? :-)


How to connect Tribe Node with kibana

2014-06-24 Thread Rukshan Kothwala
Hi,

I want to combine different ES clusters and get one Kibana dashboard for all 
of them. As per the guide, 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/modules-tribe.html
my tribe node is working well. But when Kibana is pointed at this tribe node, 
no data is displayed. According to the logs, the tribe node has connected to 
all clusters.
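For reference, the tribe node configuration in elasticsearch.yml follows this 
shape from the guide above (the cluster names are placeholders):

tribe:
    t1:
        cluster.name: cluster_one
    t2:
        cluster.name: cluster_two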

This is the output when I check tribe_node:9200/_aliases?pretty
Output is: { }

Can anyone help me set up a centralized Kibana interface?



Re: index and search pdf file with elasticsearch php client

2014-06-24 Thread Rajesh Jai
If search is not working, change this line:
$params2['body']['query']['text']['file'] = 'my words';
to:
$params2['body']['query']['match']['file'] = 'my words';

$params2 =array();

$params2['body']['query']['match']['file'] = 'my words';
$params2['body']['highlight']['fields']['file'] = array("term_vector" => 
"with_positions_offsets");
$results = $client->search($params2);
print_r($results);

On Tuesday, 8 April 2014 15:50:45 UTC+5:30, Tanguy Bernard wrote:
>
> I found the answer:
>
> $params2 =array();
>
> $params2['body']['query']['text']['file'] = 'my words';
> $params2['body']['highlight']['fields']['file'] = array("term_vector" => 
> "with_positions_offsets");
> $results = $client->search($params2);
> print_r($results);
>
>
On Tuesday, 8 April 2014 at 10:22:21 UTC+2, Tanguy Bernard wrote:
>>
>> Hello,
>> Recently, I found some very helpful information here:
>> https://gist.github.com/lukas-vlcek/1075067
>>
>> I would like to reproduce the same indexing and searching with the PHP
>> ElasticSearch client.
>> My indexing seems to work!
>>
>> <?php
>> require_once 'vendor/autoload.php';
>> $client = new Elasticsearch\Client();
>>
>> $doc_src = "fn6742.pdf";
>> $binary = fread(fopen($doc_src, "r"), filesize($doc_src));
>> $doc_str = base64_encode($binary);
>>
>>
>> $article = array(); 
>> $article['index'] = 'index2';
>> $article['type']  = 'attachment';
>> $article['body']  = array('file' => $doc_str);
>>
>> $result = $client->index($article);
>>
>> ?>
>>
>>
>> But my "search" does not work. I would like to find the sentence where my
>> word is.
>> I tried this :
>>
>> $params2['body']['query']['match']['file'] = 'my word';
>> $results = $client->search($params2);
>> print_r($results);
>>
>> And I would like something like this: "file" : [ " It's <em>my word</em>
>> / You can't use <em>my word</em> / because " ]
>>
>>
>> I hope you can help me?
>>
>> Thanks in advance
>>
>>



Re: Update existing values within a list or array in Elastic Search

2014-06-24 Thread David Pilato
If you send the full document without the element you need to remove from the 
array, this should work fine.
How do you actually update your document?
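For the full-document approach, a minimal sketch: fetch the document, modify 
the array client-side, and PUT the whole document back (index, type, and id 
are hypothetical):

curl -XGET 'localhost:9200/myindex/mytype/1'
# edit the jobs array in the returned _source, then reindex the full document:
curl -XPUT 'localhost:9200/myindex/mytype/1' -d '{
  "jobs": [
    { "status": "InProgress", "runId": 1, "start_date": 2101112, "orderId": "abcd" },
    { "status": "InProgress", "runId": 2, "start_date": 2101112, "orderId": "efgh" }
  ]
}'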

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 24 June 2014 at 09:18:20, Madhumita Sadhukhan (madhumita.sadhuk...@gmail.com) 
wrote:

I have a requirement where I need to update (not append) existing values within 
a list or array in Elasticsearch.
Is this feature supported in elasticsearch?

For eg:

I have a field called jobs as part of my document

"jobs": [
  {
 "status": "InProgress",
 "runId": 1,
 "start_date": 2101112,
 "orderId": "undefined"
  },
  {
 "status": "InProgress",
 "runId": 2,
 "start_date": 2101112,
 "orderId": "undefined"
  },
   ],
and I am required to update the orderId for each job run to different values.
Currently I am only able to append a job, but I cannot update the attributes of 
each job later.
Is this use case supported and possible in Elasticsearch?


Re: Mapper Plugin Issues

2014-06-24 Thread David Pilato
One of the concerns with the mapper attachment plugin is that you have to 
provide the full document (100kb) even if, in the end, you extract only a 
single character.
Also, by default, _source is stored. That means your BASE64-encoded field will 
be stored as-is in elasticsearch.

You can disable _source or you can also remove some part of the source using 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-source-field.html#include-exclude

Also, the _all field, which is enabled by default, indexes the content a 
second time. You may want to disable it.
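A minimal sketch of those two suggestions as a type mapping, using the index 
and type names from the post below (note that _all generally has to be 
disabled when the type is first created, so treat this as an illustration 
rather than the exact NEST output):

curl -XPUT 'localhost:9200/indexname/IndexDocument/_mapping' -d '{
  "IndexDocument": {
    "_source": { "excludes": [ "esAttachment" ] },
    "_all": { "enabled": false }
  }
}'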

1/ You don't have to set _content_type. It will be automatically set by the 
plugin. If you force it, you need to make sure it corresponds to the actual 
content.
2/ Do you mean the file extension? No, we don't care about the filename or extension…

I hope this helps

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr




On 23 June 2014 at 22:46:39, Deepikaa Subramaniam (deeps.subraman...@gmail.com) 
wrote:

I am using the ElasticSearch mapper plugin to index content from pdf, xls, and 
ppt file types. My mapping is defined via the NEST attributes shown below.

Indexing of the documents seems to be working fine and I am getting the 
expected results. However, when I look at the actual index size, it increases 
linearly with the file size. In other words, if I index a 100KB pdf, the 
actual index size increases by ~100KB. Ideally, the mapper should have 
extracted only the text data and indexed that. However, it doesn't seem to do 
so. I have the following two questions:

1. Is it required to specify "content_type" for indexing the contents of 
"non-text" files?
2. What is the right way of doing content indexing? Doesn't the mapper take 
care of file types? Based on the documentation, it looks like it does. 
However, it doesn't seem to be the case during implementation.

Using ElasticSearch NEST for C#:


[ElasticType(
Name = "IndexDocument",
SearchAnalyzer = "standard",
IndexAnalyzer = "standard",
DateDetection = true,
NumericDetection = true
)]
public class Document
{
public string id { get; set; }
[ElasticProperty(Type = Nest.FieldType.attachment, Store = false, 
TermVector = Nest.TermVectorOption.with_positions_offsets)]
public ESAttachment esAttachment { get; set; }
}

public class ESAttachment
{
public string _content_type { get; set; }
public string _name { get; set; }
public string content { get; set; }
}

Here is the code for indexing:


esClient.MapFromAttributes<Document>();

var item = new Document();
item.esAttachment = new ESAttachment();
item.esAttachment._content_type = "application/pdf";
item.esAttachment.content = 
Convert.ToBase64String(System.IO.File.ReadAllBytes(file));
item.esAttachment._name = "test-pdf";

List<Document> bulkDoc = new List<Document>();
bulkDoc.Add(item);

var des = new BulkDescriptor();
foreach (var doc in bulkDoc)
{
des.Index<Document>(j => j.Object(doc).Index("indexname"));
}

var status = esClient.BulkAsync(des);


Update existing values within a list or array in Elastic Search

2014-06-24 Thread Madhumita Sadhukhan
I have a requirement where I need to update (not append) existing values 
within a list or array in Elasticsearch.
Is this feature supported in elasticsearch?

For eg:

I have a field called jobs as part of my document

"jobs": [
  {
 "status": "InProgress",
 "runId": 1,
 "start_date": 2101112,
 "orderId": "undefined"
  },
  {
 "status": "InProgress",
 "runId": 2,
 "start_date": 2101112,
 "orderId": "undefined"
  },
   ],
and I am required to update the orderId for each job run to different 
values.
Currently I am only able to append a job, but I cannot update the attributes 
of each job later.
Is this use case supported and possible in Elasticsearch?



Elascticsearch scripting

2014-06-24 Thread deep saxena
I am looking for an elegant solution where a script can serve the purpose.

Some of my conditions are like this:

Changing the name of a field: some script should run as part of the query 
itself and should return the modified field name.

For example:

Data indexed in elasticsearch: _myField = _myValue

I want to change _myField to _newField, so the query should be constructed in 
such a way that the response returned by Elasticsearch has 
_newField = _myValue in the response object.

I am trying to find a script that can run dynamically and change this value, 
but have not been able to. Can anybody help with this scenario? :-)
 



Context Suggester and document update

2014-06-24 Thread Giorgos Vasileiou
I've been using the new Context Suggester and I've observed that if I update 
the context values (and the path fields), the category query returns the same 
result set as before the update. This happens even if I pass the refresh 
parameter as 'true'.

Is there a way to make the Context Suggester work with document updates, or is 
the only way to explicitly delete and re-create the document with the relevant 
changes?
