Re: [ANN] Elasticsearch Smart Chinese Analysis plugin 2.4.1 released

2014-10-10 Thread Bruce Ritchie
Great news, I'm glad that backward compatibility is important :)

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/501dc4ed-b630-42ab-9eb6-acbba2eef945%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: GC issue

2014-10-10 Thread Bruce Ritchie
Young generation GC normally happens a lot and is rarely a concern, since 
each cycle takes so little time. In my experience it only becomes a concern 
when it happens many times per second, which often indicates that the young 
generation is too small.

It's the old generation GC cycles that you have to concern yourself with, as 
those are the ones that may (and eventually *will* with CMS) take many 
seconds to complete.
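
One way to see how often each generation is collected, and how long the collections take, is to read the per-collector counters exposed by the standard java.lang.management API. The following is only a minimal sketch: the class name GcStats and the allocation loop are made up for illustration, and the collector names it prints depend on which GC the JVM is running with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcStats {
    /** Returns collector name -> {collection count, total collection time in ms}. */
    public static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            stats.put(gc.getName(), new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return stats;
    }

    public static void main(String[] args) {
        // Illustrative only: allocate short-lived garbage so the young generation has work to do.
        long sink = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[4096];
            sink += junk.length;
        }
        for (Map.Entry<String, long[]> e : snapshot().entrySet()) {
            long count = e.getValue()[0];
            long millis = e.getValue()[1];
            // Both counters are -1 when the JVM does not define them for a collector.
            double avg = count > 0 ? (double) millis / count : 0.0;
            System.out.printf("%s: %d collections, %d ms total, %.2f ms avg%n",
                    e.getKey(), count, millis, avg);
        }
        System.out.println("allocated bytes: " + sink);
    }
}
```

A young collector with a high count but a tiny average is the normal case described above; it is a long average on the old collector that deserves attention.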


All the best,

Bruce

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9bdfe5f4-b31e-4694-9be3-e975d454b8e8%40googlegroups.com.


Re: conditions in script fields

2014-06-25 Thread Bruce Ritchie
Mohit,

I do conditional processing in script fields in my code, so yes, it's very 
possible. By default Elasticsearch uses MVEL for scripting, which has full 
flow control support. See http://mvel.codehaus.org/MVEL+2.0+Control+Flow 
for details.

Bruce

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/82759964-12e7-44c2-8d64-b0e0bf9824ef%40googlegroups.com.


Re: G1 Garbage Collector with Elasticsearch >= 1.1

2014-06-24 Thread Bruce Ritchie
We use G1GC for Tomcat and Mule in production but not for ES. We have found 
that G1GC is more 'stable' in terms of pause times, at the cost of more 
overhead and thus less throughput. No GC algorithm will help you, though, if 
you have a memory leak or your VM is under extreme memory pressure.

For really large heaps I would suggest taking a look at Azul's VM. It's not 
cheap, but it pretty much guarantees no pauses at any heap size. I don't 
know at what overhead cost, though.


Bruce

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/19f488c4-5493-4bfa-83ea-aad7ce05fe3a%40googlegroups.com.


Re: ES v1.1 continuous young gc pauses old gc, stops the world when old gc happens and splits cluster

2014-06-24 Thread Bruce Ritchie
You may want to try upgrading ES. The release notes for 1.2.0 mention a 
change to throttle indexing when merges fall behind, and the notes for 
earlier releases after 1.1.0 mention a potential memory leak fix among many 
other improvements and fixes.

Best I can think of :|


Bruce

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/58147218-c37f-467a-bdd6-3d7457b8dabd%40googlegroups.com.


boolean multi-field silently ignored in 1.2.1

2014-06-20 Thread Bruce Ritchie
I'm seeing multi-fields of type boolean silently being reduced to a normal 
boolean field in 1.2.1, which wasn't the behavior in 0.90.9. 
See https://gist.github.com/Omega359/0c2a93690b4db30693a1 for an example of 
this.

Is this expected? To me it seems like it should work (the boolean field 
mapper seems to be calling out to multiFieldsBuilder), but I'm not versed 
enough in the internals of ES to know where, if at all, it's broken.


Bruce

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ccc5b263-24a2-45c5-97d1-46a93799eb58%40googlegroups.com.


Re: ES v1.1 continuous young gc pauses old gc, stops the world when old gc happens and splits cluster

2014-06-19 Thread Bruce Ritchie
Java 8 with G1GC perhaps? It'll have more overhead but perhaps it'll be 
more consistent wrt pauses.



On Wednesday, June 18, 2014 2:02:24 PM UTC-4, Eric Brandes wrote:
>
> I'd just like to chime in with a "me too".  Is the answer just more 
> nodes?  In my case this is happening every week or so.
>
> On Monday, April 21, 2014 9:04:33 PM UTC-5, Brian Flad wrote:
>
> My dataset currently is 100GB across a few "daily" indices (~5-6GB and 15 
> shards each). Data nodes are 12 CPU, 12GB RAM (6GB heap).
>
>
> On Mon, Apr 21, 2014 at 6:33 PM, Mark Walkom wrote:
>
> How big are your data sets? How big are your nodes?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 22 April 2014 00:32, Brian Flad wrote:
>
> We're seeing the same behavior with 1.1.1, JDK 7u55, 3 master nodes (2 min 
> master), and 5 data nodes. Interestingly, we see the repeated young GCs 
> only on a node or two at a time. Cluster operations (such as recovering 
> unassigned shards) grind to a halt. After restarting a GCing node, 
> everything returns to normal operation in the cluster.
>
> Brian F
>
>
> On Wed, Apr 16, 2014 at 8:00 PM, Mark Walkom wrote:
>
> In both your instances, if you can, have 3 master eligible nodes as it 
> will reduce the likelihood of a split cluster as you will always have a 
> majority quorum. Also look at discovery.zen.minimum_master_nodes to go with 
> that.
> However you may just be reaching the limit of your nodes, which means the 
> best option is to add another node (which also neatly solves your split 
> brain!).
>
> Ankush, it would help if you can update Java; most people recommend u25 but 
> we run u51 with no problems.
>
>
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 17 April 2014 07:31, Dominiek ter Heide wrote:
>
> We are seeing the same issue here. 
>
> Our environment:
>
> - 2 nodes
> - 30GB Heap allocated to ES
> - ~140GB of data
> - 639 indices, 10 shards per index
> - ~48M documents
>
> After starting ES everything is good, but after a couple of hours we see 
> the Heap build up towards 96% on one node and 80% on the other. We then see 
> the GC take very long on the 96% node:
>
> TOuKgmlzaVaFVA][elasticsearch1.trend1.bottlenose.com][inet[/192.99.45.125:9300]]])
>
> [2014-04-16 12:04:27,845][INFO ][discovery] [elasticsearch2.trend1] trend1/I3EHG_XjSayz2OsHyZpeZA
>
> [2014-04-16 12:04:27,850][INFO ][http ] [elasticsearch2.trend1] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.99.45.126:9200]}
>
> [2014-04-16 12:04:27,851][INFO ][node ] [elasticsearch2.trend1] started
>
> [2014-04-16 12:04:32,669][INFO ][indices.store] [elasticsearch2.trend1] updating indices.store.throttle.max_bytes_per_sec from [20mb] to [1gb], note, type is [MERGE]
>
> [2014-04-16 12:04:32,669][INFO ][cluster.routing.allocation.decider] [elasticsearch2.trend1] updating [cluster.routing.allocation.node_initial_primaries_recoveries] from [4] to [50]
>
> [2014-04-16 12:04:32,670][INFO ][indices.recovery ] [elasticsearch2.trend1] updating [indices.recovery.max_bytes_per_sec] from [200mb] to [2gb]
>
> [2014-04-16 12:04:32,670][INFO ][cluster.routing.allocation.decider] [elasticsearch2.trend1] updating [cluster.routing.allocation.node_initial_primaries_recoveries] from [4] to [50]
>
> [2014-04-16 12:04:32,670][INFO ][cluster.routing.allocation.decider] [elasticsearch2.trend1] updating [cluster.routing.allocation.node_initial_primaries_recoveries] from [4] to [50]
>
> [2014-04-16 15:25:21,409][WARN ][monitor.jvm  ] [elasticsearch2.trend1] [gc][old][11876][106] duration [1.1m], collections [1]/[1.1m], total [1.1m]/[1.4m], memory [28.7gb]->[22gb]/[29.9gb], all_pools {[young] [67.9mb]->[268.9mb]/[665.6mb]}{[survivor] [60.5mb]->[0b]/[83.1mb]}{[old] [28.6gb]->[21.8gb]/[29.1gb]}
>
> [2014-04-16 16:02:32,523][WARN ][monitor.jvm  ] [elasticsearch2.trend1] [gc][old][13996][144] duration [1.4m], collections [1]/[1.4m], total [1.4m]/[3m], memory [28.8gb]->[23.5gb]/[29.9gb], all_pools {[young] [21.8mb]->[238.2mb]/[665.6mb]}{[survivor] [82.4mb]->[0b]/[83.1mb]}{[old] [28.7gb]->[23.3gb]/[29.1gb]}
>
> [2014-04-16 16:14:12,386][WARN ][monitor.jvm  ] [elasticsearch2.trend1] [gc][old][14603][155] duration [1.3m], collections [2]/[1.3m], total [1.3m]/[4.4m], memory [29.2gb]->[23.9gb]/[29.9gb], all_pools {[young] [289mb]->[161.3mb]/[665.6mb]}{[survivor] [58.3mb]->[0b]/[83.1mb]}{[old] [28.8gb]->[23.8gb]/[29.1gb]}
>
> [2014-04-16 16:17:55,480][WARN ][monitor.jvm  ] [elasticsearch2.trend1] [gc][old][14745][158] duration [1.3m], collections [1]/[1.3m], total [1.3m]/[5.7m], memory [29.7gb]->[24.1gb]/[29.9g

Re: Highlighting field order not preserved in 1.2.1

2014-06-18 Thread Bruce Ritchie
Thanks for the reply. I'll just have to check out the 1.2.1 tag and patch it 
myself until 1.3.0 is out.

Bruce

On Wednesday, June 18, 2014 11:58:21 AM UTC-4, Nikolas Everett wrote:
>
> That'll be fixed in 1.3.0.  I also needed the ordering to be consistent to 
> implement a similar trick to yours in the experimental highlighter.  
>
> Nik
>
>
> On Wed, Jun 18, 2014 at 11:47 AM, Bruce Ritchie wrote:
>
>> All,
>>
>> I noticed what I think is a regression in 1.2.1 from 0.90.9 where the 
>> order of fields for highlighting is not preserved as it was in 0.90.9
>>
>> For example, with a query where highlighting is defined such as
>>
>> "highlight" : {
>>   "require_field_match" : true,
>>   "type" : "mycustomhighlighter",
>>   "options" : {
>>     "number_of_fragments" : 0,
>>     "fragment_size" : 0
>>   },
>>   "fields" : {
>>     "field1" : { },
>>     "field2" : { },
>>     "field3" : { },
>>     "field4" : { },
>>     "field5" : { },
>>     "field6" : { }
>>   }
>> }
>>
>> In 0.90.9 the fields were highlighted in the order field1, field2, 
>> field3, field4, etc. In 1.2.1 the order is arbitrary. I believe the cause 
>> is the use of a HashMap rather than a LinkedHashMap in the constructor 
>> of SearchContextHighlighter.java, which loses the insertion order.
>>
>> The reason I want the order preserved is that I have a custom highlighter 
>> wrapping the FVH that highlights only the first n fields (the ones most 
>> likely to have highlighting) and continues on to the remaining fields 
>> (potentially hundreds) only if nothing was highlighted in the initial set. 
>> This lets searches execute very quickly on average, only occasionally 
>> taking the hit of attempting to highlight everything, and results in an 
>> 8-10x faster query on average.
>>
>>
>> All the best,
>>
>> Bruce Ritchie
>>
>
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4e219003-11c5-41bf-b465-88457cc97843%40googlegroups.com.


Highlighting field order not preserved in 1.2.1

2014-06-18 Thread Bruce Ritchie
All,

I noticed what I think is a regression in 1.2.1 from 0.90.9 where the order 
of fields for highlighting is not preserved as it was in 0.90.9

For example, with a query where highlighting is defined such as

"highlight" : {
  "require_field_match" : true,
  "type" : "mycustomhighlighter",
  "options" : {
    "number_of_fragments" : 0,
    "fragment_size" : 0
  },
  "fields" : {
    "field1" : { },
    "field2" : { },
    "field3" : { },
    "field4" : { },
    "field5" : { },
    "field6" : { }
  }
}

In 0.90.9 the fields were highlighted in the order field1, field2, field3, 
field4, etc. In 1.2.1 the order is arbitrary. I believe the cause is the use 
of a HashMap rather than a LinkedHashMap in the constructor of 
SearchContextHighlighter.java, which loses the insertion order.
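
The difference between the two map types can be seen with plain JDK classes. This is only a sketch of the general behavior, not the Elasticsearch code itself; the class name FieldOrderDemo and its helper method are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldOrderDemo {
    /** Puts field1..fieldN into the given map and returns the iteration order of its keys. */
    public static List<String> iterationOrder(Map<String, Object> fields, int n) {
        for (int i = 1; i <= n; i++) {
            fields.put("field" + i, new Object());
        }
        return new ArrayList<>(fields.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap iterates in insertion order by contract.
        System.out.println("LinkedHashMap: "
                + iterationOrder(new LinkedHashMap<String, Object>(), 12));
        // HashMap makes no ordering guarantee; its order depends on hash codes and table size.
        System.out.println("HashMap:       "
                + iterationOrder(new HashMap<String, Object>(), 12));
    }
}
```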

The reason I want the order preserved is that I have a custom highlighter 
wrapping the FVH that highlights only the first n fields (the ones most 
likely to have highlighting) and continues on to the remaining fields 
(potentially hundreds) only if nothing was highlighted in the initial set. 
This lets searches execute very quickly on average, only occasionally 
taking the hit of attempting to highlight everything, and results in an 
8-10x faster query on average.
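
The fallback logic is roughly the following. This is a simplified sketch rather than the actual highlighter: FieldHighlighter is a hypothetical stand-in for the wrapped FVH, and TieredHighlightDemo and its methods are names invented for the example. It relies on the field list arriving in the order given in the request.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TieredHighlightDemo {
    /** Hypothetical stand-in for the wrapped highlighter: a fragment, or null if no match. */
    public interface FieldHighlighter {
        String highlight(String field);
    }

    /** Highlights the first firstN fields; touches the rest only if none of those matched. */
    public static Map<String, String> highlight(List<String> orderedFields, int firstN,
                                                FieldHighlighter delegate) {
        int split = Math.max(0, Math.min(firstN, orderedFields.size()));
        Map<String, String> hits = highlightAll(orderedFields.subList(0, split), delegate);
        if (hits.isEmpty()) {
            // Slow path: nothing matched in the likely fields, so try everything else.
            hits = highlightAll(orderedFields.subList(split, orderedFields.size()), delegate);
        }
        return hits;
    }

    private static Map<String, String> highlightAll(List<String> fields,
                                                    FieldHighlighter delegate) {
        Map<String, String> hits = new LinkedHashMap<>();
        for (String field : fields) {
            String fragment = delegate.highlight(field);
            if (fragment != null) {
                hits.put(field, fragment);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("field1", "field2", "field3", "field4");
        // Toy delegate: pretend only field3 contains a match.
        FieldHighlighter onlyField3 = field -> field.equals("field3") ? "<em>match</em>" : null;
        System.out.println(highlight(fields, 2, onlyField3));
    }
}
```

If the field order is arbitrary, the "first n" fields are no longer the likely ones and the slow path is taken far more often.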


All the best,

Bruce Ritchie

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/919a7228-0cfe-42a6-a542-be8c83fbe044%40googlegroups.com.


Re: [ANN] Elasticsearch experimental highlighter

2014-05-29 Thread Bruce Ritchie
Hi Nikolas,

I'm likely to test this in the next couple of weeks (I'm still on 0.90.9), 
but I have a question on performance. Does 'It's pretty quick' mean 
comparable performance to the postings highlighter, the fast vector 
highlighter, or just quick enough for your use case?

The reason I'm asking is that highlighting performance is the 
largest issue I currently face. Our documents have hundreds of very short 
fields (well over a thousand if you count the sub-fields in a multi-field 
field), and listing every field/sub-field to highlight makes queries 
10-20x slower than highlighting just a single field (100ms -> 2100ms, for 
example). I can't use the _all field because I need to know the actual 
field that was highlighted, and only the FVH returns the high-quality 
results we need. I'm actually toying with the idea of doing a 
two-phase search, where the first phase highlights only a few fields that 
commonly hit and the second phase searches only the remaining hits that 
didn't highlight on the first pass. That approach may work, but I'd rather 
just have a highlighter that was faster :) 


All the best,

Bruce Ritchie



On Thursday, April 10, 2014 4:04:57 PM UTC-4, Nikolas Everett wrote:
>
> I've been working on a new highlighter on and off for a few weeks and I'd 
> love for other folks to try it out: 
> https://github.com/wikimedia/search-highlighter
>
> You should try it because:
> 1.  It's pretty quick.
> 2.  It supports many of the features of the other highlighters and lets 
> you combine them in new ways.
> 3.  Has a few tricks that none of the other highlighters have.
> 4.  It doesn't require that you store any extra data but will 
> use what it can to speed itself up.
>
> I've installed it on our beta site 
> <http://simple.wikipedia.beta.wmflabs.org/w/index.php?title=Special%3ASearch&profile=default&search=chess+players&fulltext=Search>
> so you can see it in action without installing it.  
>
> Let me expand on my list above:
> It doesn't require any extra data and is nice and fast that way for short 
> fields.  Once fields get longer [0] reanalyzing them starts to take too 
> long so it is best to store offsets in the postings just like the postings 
> highlighter.  It can use term vectors the same way that the fast vector 
> highlighter can but that is slower than postings and takes up more space.
>
> It supports three fragmenters: one that mimics the postings highlighter, 
> one that mimics the fast vector highlighter, and one that always highlights 
> the whole value.
>
> It supports matched_fields, no_match_size, and most everything else in the 
> highlight api.  It doesn't support require_field_match though.
>
> It adds a handful of tricks like returning the top scoring snippets in 
> document order and weighing terms that appear early in the document 
> higher.  Nothing difficult, but still cute tricks.  It's reasonably easy to 
> implement new tricks so if you have any ideas I'd love to hear them.
>
> I don't think it is really ready for production usage yet but I'd like to 
> get there in a week or two.
>
> Thanks for reading,
>
> Nik
>
> [0]: I haven't done the measurements to figure out how long the field has 
> to be before it is faster to use postings than to reanalyze it.  I did the 
> math a few months ago for how long the field has to be before vectors 
> become faster.  It was a couple of KB for my analysis chain but I'm not 
> sure any of that holds true for this highlighter.  It could be more or less.
>  

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7b125714-48dd-4bca-a58d-d56acac94d47%40googlegroups.com.


Re: script sorting issue 0.90.9

2014-03-03 Thread Bruce Ritchie
Binh,

That worked, thanks. I had been going by the fields outlined in 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-scripting.html#_document_fields.
I guess MVEL doesn't quite handle the 'isXYZ' JavaBean convention 
correctly.

Bruce

On Monday, March 3, 2014 2:12:22 PM UTC-5, Binh Ly wrote:
>
> Try:
>
> doc["formatted_name__v.plain"].isEmpty()
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/47ef9357-4961-4b5b-b701-9be3eb8bbdff%40googlegroups.com.


script sorting issue 0.90.9

2014-03-03 Thread Bruce Ritchie
Hi,

I've been attempting to get custom sorting working for a scenario I have 
without success. What I'm trying to do is sort by fieldA if it exists and 
if not then use fieldB for the sort. The first thing I tried was :

requestBuilder.addSort(SortBuilders.fieldSort(fieldA).order(sortOrder).ignoreUnmapped(true));
requestBuilder.addSort(SortBuilders.fieldSort(fieldB).order(sortOrder).ignoreUnmapped(true));

However this doesn't seem to work as expected - rather it sorts results by 
fieldA (A-Z) *then* by fieldB (A-Z). The two fields are mutually exclusive 
- either one exists or the other, never both in the same doc.
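
To make the intent concrete, here is what I'm after expressed as plain client-side Java rather than anything Elasticsearch-specific. It is only an illustrative sketch: the Doc class, the FallbackSortDemo name and the sample values are invented for the example. Each document contributes a single key (fieldA if present, otherwise fieldB) and all documents are interleaved by that one key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class FallbackSortDemo {
    /** Minimal stand-in for a document; a null field means "not present". */
    public static final class Doc {
        final String fieldA;
        final String fieldB;

        public Doc(String fieldA, String fieldB) {
            this.fieldA = fieldA;
            this.fieldB = fieldB;
        }

        /** The single key the sort should use: fieldA if present, else fieldB, else "". */
        public String sortKey() {
            if (fieldA != null) return fieldA;
            if (fieldB != null) return fieldB;
            return "";
        }
    }

    /** Returns the fallback keys of the documents, sorted ascending. */
    public static List<String> sortedKeys(List<Doc> docs) {
        List<Doc> copy = new ArrayList<>(docs);
        Collections.sort(copy, new Comparator<Doc>() {
            @Override
            public int compare(Doc left, Doc right) {
                return left.sortKey().compareTo(right.sortKey());
            }
        });
        List<String> keys = new ArrayList<>();
        for (Doc doc : copy) {
            keys.add(doc.sortKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        // Made-up sample data: each document has exactly one of the two fields.
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("Smith, Jane", null));
        docs.add(new Doc(null, "Acme Corp"));
        docs.add(new Doc("Doe, John", null));
        docs.add(new Doc(null, "Zenith Ltd"));
        System.out.println(sortedKeys(docs));
    }
}
```

With two separate field sorts, the documents lacking fieldA end up grouped together instead of being interleaved like this.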

The next thing I tried was to use a sort script and this is where I've hit 
an issue that I can't seem to overcome.

I'm using the following script:

if (!(doc["formatted_name__v.plain"].empty)) {
    doc["formatted_name__v.plain"].value;
} else if (!(doc["corporate_name__v.plain"].empty)) {
    doc["corporate_name__v.plain"].value;
} else {
    "";
}

being built with the following code:

ScriptSortBuilder sortBuilder = SortBuilders.scriptSort(script.toString(), "string")
        .lang("mvel")
        .order(sortOrder);
requestBuilder.addSort(sortBuilder);

and ES is complaining about a compile error

Caused by: [Error: unexpected token in constructor]
[Near : {... tted_name__v.plain"].empty)) { doc["formatted_name }]
 ^
[Line: 1, Column: 37]
at org.elasticsearch.common.mvel2.ast.TypeDescriptor.updateClassName(TypeDescriptor.java:72)
at org.elasticsearch.common.mvel2.ast.TypeDescriptor.<init>(TypeDescriptor.java:50)
at org.elasticsearch.common.mvel2.compiler.AbstractParser.nextToken(AbstractParser.java:1042)
at org.elasticsearch.common.mvel2.compiler.ExpressionCompiler._compile(ExpressionCompiler.java:128)
at org.elasticsearch.common.mvel2.util.ParseTools.subCompileExpression(ParseTools.java:2115)
at org.elasticsearch.common.mvel2.ast.Negation.<init>(Negation.java:40)
 ^

The arrow in the output seems to point to the 'o' in the { doc[.. part.

Does anyone see anything obviously wrong with my expression?


Bruce


-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/02262056-e7c1-4cc7-81dc-19f0a5bb8514%40googlegroups.com.