Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2015-04-08 Thread Paul Chua
I'm having an issue very similar to this; I'm not sure exactly what you did
to get the array contents. I've made a new post here:
https://groups.google.com/forum/#!topic/elasticsearch/MpOqKthgqtA

-- 
Paul Chua
Data Scientist
317-979-5643

*We help dealers Find, Engage and Market to Automotive Shoppers more
effectively.*

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CANFNKHYEBhZ8%3DDu2RSNSe6kO%3D%2BQEMVs6qc9XpkNpL0UiBPpEzw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Getting contents of org.elasticsearch.hadoop.mr.WritableArrayWritable in scala

2015-04-08 Thread Paul Chua
I'm having an issue very similar to this one, but I'm not sure exactly what 
they did to get the array contents. I can't call toStrings() on the 
ArrayWritable because it's typed as a plain Writable. 
http://grokbase.com/t/gg/elasticsearch/14c2sb14rk/hadoop-getting-specified-fields-from-elasticsearch-as-an-input-to-a-mapreduce-job/14c3dz4san#14c3dz4san

I've got a map from document IDs to another map from fields to their 
contents. I'm interested in getting the contents of the concepts field, 
which is an array of tuples. My goal is to use this in a recommender 
system built on Spark. I'm working in Scala.

Looking up a document ID gives me a map from the field name concepts to the 
contents. I want the contents of the concepts field in the form of a map 
from text to relevance. For efficiency reasons, I'd rather not serialize 
this to a string and parse it back.

scala> map("396974")
warning: there were 1 deprecation warning(s); re-run with -deprecation for 
detai
ls
res34: org.apache.hadoop.io.MapWritable = {concepts=[{text=Jujutsu, 
relevance=0.
953505}, {text=Aikido, relevance=0.87238}, {text=Martial arts, 
relevance=0.56092
8}, {text=Morihei Ueshiba, relevance=0.535638}, {text=Mixed martial arts, 
releva
nce=0.470634}, {text=Grappling, relevance=0.43726}, {text=Karate, 
relevance=0.43
5142}, {text=Brazilian Jiu-Jitsu, relevance=0.403991}]}

scala> map("396974").get(new Text("concepts"))
warning: there were 1 deprecation warning(s); re-run with -deprecation for 
detai
ls
res35: org.apache.hadoop.io.Writable = 
org.elasticsearch.hadoop.mr.WritableArray
Writable@1077a7

My understanding is that this is a Writable reference wrapping a 
WritableArrayWritable object. I want to get the contents of this object as 
a map. Ultimately, what I want is a map from document IDs to the 
corresponding concept map from text to relevance. The thread linked above 
suggests calling toStrings on the WritableArrayWritable object, but this 
doesn't work:

scala> map("396974").get(new Text("concepts")).toStrings
:41: error: value toStrings is not a member of 
org.apache.hadoop.io.Wri
table
  map("396974").get(new Text("concepts")).toStrings
   ^
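The failure above is a static-typing issue: get returns 
org.apache.hadoop.io.Writable, so the compiler cannot see methods of the 
concrete subtype until you downcast. A minimal, self-contained sketch of the 
pattern (stand-in types, since the real Hadoop classes are not on this 
classpath):

```scala
// Stand-ins for org.apache.hadoop.io.Writable and
// org.elasticsearch.hadoop.mr.WritableArrayWritable, showing why
// toStrings is invisible through a supertype reference.
trait Writable
class WritableArrayLike(values: Array[String]) extends Writable {
  def toStrings: Array[String] = values
}

val w: Writable = new WritableArrayLike(Array("Jujutsu", "Aikido"))
// w.toStrings                 // does not compile: not a member of Writable
val strings = w.asInstanceOf[WritableArrayLike].toStrings
println(strings.mkString(", "))   // prints "Jujutsu, Aikido"
```

With the real classes, the analogous (untested) call would presumably be 
map("396974").get(new Text("concepts")).asInstanceOf[WritableArrayWritable].toStrings; 
to get a Map[String, Double] instead of strings, downcast to ArrayWritable, 
call get() for the underlying Writable array, and fold each inner 
MapWritable's text/relevance entries into a map.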

Any advice will be greatly appreciated.

Thanks,
Paul



Is there any documentation on using ctx in scripts?

2015-03-25 Thread Paul G

Hi,

  I'm looking for documentation on using the ctx object in transform 
scripts. I've used ctx._source before, but now I need to do more complex 
transformations and add a script to the MongoDB river. On this page I see 
"ctx.op":
http://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html

  ...and on this page about adding scripts to the MongoDB river, 
"ctx.ignore" and "ctx.deleted" appear (under "script filters" at the bottom 
of the page):
https://github.com/richardwilly98/elasticsearch-river-mongodb/wiki

  Where can I find documentation on the properties of the ctx object? I 
looked through the Java API source to see if I could find something 
helpful, but was unable to find anything.
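For what it's worth, the docs-update page linked above shows the two 
properties most update scripts touch: ctx._source (the document being 
updated) and ctx.op (set to "none" to skip the update, or "delete" to remove 
the document). A small sketch along the lines of that page; the index, type, 
and field names here are made up:

```
curl -XPOST 'localhost:9200/myindex/mytype/1/_update' -d '{
    "script": "if (ctx._source.counter > 10) { ctx.op = \"none\" } else { ctx._source.counter += inc }",
    "params": { "inc": 1 }
}'
```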

Thank you,

Paul




Order of the properties gets changed if "_source":{ "exclude": ["title"] } is used in a query.

2015-03-22 Thread Debashish Paul
Hi,

I am facing a problem using "_source":{ "exclude": ["title"] } in my 
Elasticsearch query. Whenever I use it, the properties in "_source" come 
back in a shuffled order. What am I doing wrong?

Example: 

Without "_source":{ "exclude": ["title"] }
-
"_source":{
  "title" : ...
  "prop1" : ...
  "prop2" : ...
  "prop3" : ...
  "prop4" : ...
}

With "_source":{ "exclude": ["title"] }
-
"_source":{
  "prop3" : ...
  "prop1" : ...
  "prop4" : ...
  "prop2" : ...
}


Expected:
-
"_source":{
  "prop1" : ...
  "prop2" : ...
  "prop3" : ...
  "prop4" : ...
}



Re: EC2 cluster storage question

2015-02-25 Thread Paul Sanwald
Thanks, the rsync to EBS is what I was rolling around in my head, but 
wasn't sure if it was a dumb idea.

We used to use Elastic Block Store, but have gotten incredible performance 
gains from moving to local SSD storage. The ES team doesn't recommend any 
kind of NAS 
<http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/hardware.html>, 
and they reiterated in their recent webinar that they couldn't really 
recommend EBS. This was exactly in line with our experience: it will work, 
but performance is less predictable and certainly degraded compared to 
ephemeral storage.

Sounds like I have two options:
1 - shut down, and restore from snapshot when we start back up.
2 - sync local storage to EBS on shutdown, and the reverse on startup.

Not sure if the juice is going to be worth the squeeze for either of these 
options, but I appreciate everyone's thoughts.

Thanks!

--paul

On Wednesday, February 25, 2015 at 2:15:01 AM UTC-5, Norberto Meijome wrote:
>
> OP points out he is using ephemeral storage...hence shutdown will destroy 
> the data...but it can be rsynced to EBS as part of the shutdown 
> process...and then repeat in reverse when starting things up again...
>
> Though I guess you could let ES take care of it by tagging nodes 
> accordingly and updating the index settings .(hope it makes sense...)
On 25/02/2015 4:58 pm, "Mark Walkom" wrote:
>
>> Why not just shut the cluster down, disable allocation first and then 
>> just gracefully power things off?
>>
>
-- 
*Important Notice:*  The information contained in or attached to this email 
message is confidential and proprietary information of RedOwl Analytics, 
Inc., and by opening this email or any attachment the recipient agrees to 
keep such information strictly confidential and not to use or disclose the 
information other than as expressly authorized by RedOwl Analytics, Inc. 
 If you are not the intended recipient, please be aware that any use, 
printing, copying, disclosure, dissemination, or the taking of any act in 
reliance on this communication or the information contained herein is 
strictly prohibited. If you think that you have received this email message 
in error, please delete it and notify the sender.



EC2 cluster storage question

2015-02-24 Thread Paul Sanwald
More detail below, but the crux of my question is: what's the best way to 
spin an ES cluster on EC2 up and down "on demand" when it uses ephemeral 
local storage? Essentially, I want to run the cluster during the week and 
spin it down over the weekend. Other than brute-force snapshot/restore, is 
there any more creative way to do this, like mirroring local storage to EBS 
or similar?

Some more background:
We run multiple ES clusters on EC2 (we use OpsWorks for deployment 
automation). We started out several years back using EBS because we didn't 
know any better, and have since switched to SSD-based local storage. The 
performance improvements have been unbelievable.

Obviously, using ephemeral local storage comes at a cost: we use 
replication, take frequent snapshots, and store all source data to mitigate 
the risk of data loss. The other thing local storage means is that our 
cluster essentially needs to be up and running 24/7, which I think is 
fairly normal.

I'm investigating ways to save on cost for a large-ish cluster, and one 
observation is that it doesn't necessarily need to run 24/7; specifically, 
we want to turn the cluster off over the weekend. That said, restoring 
terabytes from snapshot doesn't seem like a very efficient way to do this, 
so I want to weigh my options, and I was hoping the community could help me 
identify options I'm missing.

thanks in advance for any thoughts you may have.

--paul




Combining Multiple Queries with 'OR' or 'AND'

2015-02-19 Thread Debashish Paul


Hi,
Question

I am trying to combine two user search queries using AND and OR as 
operations. I am aware of combining queries by merging filters, but I want 
to merge entire queries, like { {BIG Elastic Query1} AND {BIG Elastic Query2} }.
Details

For instance, say a user performs a search for "batman" in type movies with 
filters of Christian and Bale, and another query for "Dark Knight" in type 
tvshows with filters of Christopher Nolan. I want to combine both queries 
so I can look for both batman movies and Dark Knight tvshows, but not Dark 
Knight movies or batman tvshows.

For example, given the queries below, I just want to run Query1 OR Query2 
in Elasticsearch.
Query 1:

{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "Batman",
          "default_operator": "AND",
          "fields": [ "Movies._all" ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "filtered": {
                  "filter": {
                    "and": [
                      { "term": { "cast.firstName": "Christian" } },
                      { "term": { "cast.lastName": "Bale" } }
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}

Query2:

{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "Dark Knight",
          "default_operator": "AND",
          "fields": [ "tvshows._all" ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "filtered": {
                  "filter": {
                    "and": [
                      { "term": { "director.firstName": "Christopher" } },
                      { "term": { "director.lastName": "Nolan" } }
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}
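One approach that seems to fit (an untested sketch in the same 1.x query DSL 
as above): make each complete query a clause of a top-level bool query. 
should clauses give OR semantics; swapping should for must gives AND. The 
ellipses stand for the bodies of the two filtered blocks already shown:

```
{
  "query": {
    "bool": {
      "should": [
        { "filtered": { ... body of Query 1's "filtered" block ... } },
        { "filtered": { ... body of Query 2's "filtered" block ... } }
      ],
      "minimum_should_match": 1
    }
  }
}
```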



Re: php api and self signed certs

2015-02-11 Thread Paul Halliday
That is the hint I needed.

I had it without quotes at first and the page was failing. I just assumed 
that was because of the lack of quotes, since adding them allowed the page 
to load. I should have looked at the error console, which was showing the 
missing include :)

All is good and working as expected.
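For later readers, the settings block that finally worked presumably looks 
like the following (an untested sketch based on Ian's hint): the Guzzle 
class constants and the CURLOPT_* keys are bare constants, not quoted 
strings. The cacert path is taken from the earlier snippet:

```php
$clientparams['guzzleOptions'] = array(
    // Bare class constants, not the strings '\Guzzle\Http\Client::...'
    \Guzzle\Http\Client::SSL_CERT_AUTHORITY => 'system',
    \Guzzle\Http\Client::CURL_OPTIONS => array(
        // Bare cURL constants, not the strings 'CURLOPT_...'
        CURLOPT_SSL_VERIFYPEER => true,
        CURLOPT_SSL_VERIFYHOST => 2,
        CURLOPT_CAINFO => '.inc/cacert.pem',
        CURLOPT_SSLCERTTYPE => 'PEM',
    ),
);
```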

On Tuesday, February 10, 2015 at 11:17:20 PM UTC-4, Ian MacLennan wrote:
>
> I suspect that it is because you put your constant names in quotes, and as 
> a result the keys are going to be wrong.  i.e. you 
> have 'CURLOPT_SSL_VERIFYPEER' => true, while it should probably be 
> CURLOPT_SSL_VERIFYPEER => true as shown in the link you provided.
>
> Ian
>
> On Tuesday, February 10, 2015 at 10:42:05 AM UTC-5, Paul Halliday wrote:
>>
>> Well, I got it working but had to make the declarations in CurlHandle.php
>>
>> I added:
>>
>> $curlOptions[CURLOPT_SSL_VERIFYPEER] = true;
>> $curlOptions[CURLOPT_CAINFO] = '/full/path/to/cacert.pem';
>>
>> directly above:
>>
>> curl_setopt_array($handle, $curlOptions);
>> return new static($handle, $curlOptions);
>> ...
>>
>> What's strange is that I also tried placing them in the default 
>> $curlOptions but those seemed to get dumped somewhere prior to 
>> curl_setopt_array()
>>
>> Yucky, but working.
>>
>>
>> On Monday, February 9, 2015 at 7:05:34 PM UTC-4, Paul Halliday wrote:
>>>
>>> Hi,
>>>
>>> I am trying to get the configuration example from here to work:  
>>>
>>>
>>> http://www.elasticsearch.org/guide/en/elasticsearch/client/php-api/current/_security.html#_guzzleconnection_self_signed_certificate
>>>
>>> My settings look like this:
>>>  
>>>  19 // Elasticsearch
>>>  20 $clientparams = array();
>>>  21 $clientparams['hosts'] = array(
>>>  22 'https://host01:443'
>>>  23 );
>>>  24 
>>>  25 $clientparams['guzzleOptions'] = array(
>>>  26 '\Guzzle\Http\Client::SSL_CERT_AUTHORITY' => 'system',
>>>  27 '\Guzzle\Http\Client::CURL_OPTIONS' => [
>>>  28'CURLOPT_SSL_VERIFYPEER' => true,
>>>  29 'CURLOPT_SSL_VERIFYHOST' => 2,
>>>  30 'CURLOPT_CAINFO' => '.inc/cacert.pem',
>>>  31 'CURLOPT_SSLCERTTYPE' => 'PEM',
>>>  32 ]
>>>  33 );
>>>
>>> The error:
>>>
>>> PHP Fatal error:  Uncaught exception 
>>> 'Elasticsearch\\Common\\Exceptions\\TransportException' with message 'SSL 
>>> certificate problem: self signed certificate'
>>>
>>> What did I miss?
>>>
>>> Thanks!
>>>
>>>
>>>



Re: php api and self signed certs

2015-02-10 Thread Paul Halliday
Well, I got it working but had to make the declarations in CurlHandle.php

I added:

$curlOptions[CURLOPT_SSL_VERIFYPEER] = true;
$curlOptions[CURLOPT_CAINFO] = '/full/path/to/cacert.pem';

directly above:

curl_setopt_array($handle, $curlOptions);
return new static($handle, $curlOptions);
...

What's strange is that I also tried placing them in the default 
$curlOptions but those seemed to get dumped somewhere prior to 
curl_setopt_array()

Yucky, but working.


On Monday, February 9, 2015 at 7:05:34 PM UTC-4, Paul Halliday wrote:
>
> Hi,
>
> I am trying to get the configuration example from here to work:  
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/client/php-api/current/_security.html#_guzzleconnection_self_signed_certificate
>
> My settings look like this:
>  
>  19 // Elasticsearch
>  20 $clientparams = array();
>  21 $clientparams['hosts'] = array(
>  22 'https://host01:443'
>  23 );
>  24 
>  25 $clientparams['guzzleOptions'] = array(
>  26 '\Guzzle\Http\Client::SSL_CERT_AUTHORITY' => 'system',
>  27 '\Guzzle\Http\Client::CURL_OPTIONS' => [
>  28'CURLOPT_SSL_VERIFYPEER' => true,
>  29 'CURLOPT_SSL_VERIFYHOST' => 2,
>  30 'CURLOPT_CAINFO' => '.inc/cacert.pem',
>  31 'CURLOPT_SSLCERTTYPE' => 'PEM',
>  32 ]
>  33 );
>
> The error:
>
> PHP Fatal error:  Uncaught exception 
> 'Elasticsearch\\Common\\Exceptions\\TransportException' with message 'SSL 
> certificate problem: self signed certificate'
>
> What did I miss?
>
> Thanks!
>
>
>



php api and self signed certs

2015-02-09 Thread Paul Halliday
Hi,

I am trying to get the configuration example from here to work:  

http://www.elasticsearch.org/guide/en/elasticsearch/client/php-api/current/_security.html#_guzzleconnection_self_signed_certificate

My settings look like this:

// Elasticsearch
$clientparams = array();
$clientparams['hosts'] = array(
    'https://host01:443'
);

$clientparams['guzzleOptions'] = array(
    '\Guzzle\Http\Client::SSL_CERT_AUTHORITY' => 'system',
    '\Guzzle\Http\Client::CURL_OPTIONS' => [
        'CURLOPT_SSL_VERIFYPEER' => true,
        'CURLOPT_SSL_VERIFYHOST' => 2,
        'CURLOPT_CAINFO' => '.inc/cacert.pem',
        'CURLOPT_SSLCERTTYPE' => 'PEM',
    ]
);

The error:

PHP Fatal error:  Uncaught exception 
'Elasticsearch\\Common\\Exceptions\\TransportException' with message 'SSL 
certificate problem: self signed certificate'

What did I miss?

Thanks!




Re: Possible? Wildcard template for a collection of fields to solve some dynamic mapping woes

2015-02-09 Thread Paul Kavanagh
I think you have something there. I have come up with this:

curl -XPUT localhost:9200/_template/template_1 -d '
{
"template" : "logstash-*",
"order" : 0,
"settings" : {
"number_of_shards" : 15
},
"mappings" : {
  "dynamic_templates":[
{"apiservice_logstash":{
"match":"apiservice.logstash.@fields.parameters.*",
"match_mapping_type":"dateOptionalTime",
"mapping":{
  "type":"string",
  "analyzer":"english"
}
  }
}
  ]
}
}
'

However... When I try to post it, Elasticsearch throws:
{"error":"ElasticsearchIllegalArgumentException[Malformed mappings section 
for type [dynamic_templates], should include an inner object describing the 
mapping]","status":400}

I've tried a few things, but it doesn't seem to like my mappings block for 
some reason.

Any idea why?
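For later readers, a likely cause suggested by the error message: 
dynamic_templates must sit inside a type mapping (for example the _default_ 
mapping, which applies to all types), not directly under "mappings". Also, 
"match" matches only the field name, so a dotted path needs "path_match", 
and "match_mapping_type" expects a core type such as "string" or "date" 
rather than "dateOptionalTime". A hedged, untested rework of the template 
above (here the type-matching clause is simply dropped, since path_match is 
enough to force these parameter fields to strings):

```
curl -XPUT localhost:9200/_template/template_1 -d '
{
  "template": "logstash-*",
  "order": 0,
  "settings": {
    "number_of_shards": 15
  },
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "apiservice_logstash": {
            "path_match": "apiservice.logstash.@fields.parameters.*",
            "mapping": {
              "type": "string",
              "analyzer": "english"
            }
          }
        }
      ]
    }
  }
}
'
```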


On Friday, February 6, 2015 at 11:41:49 AM UTC, Itamar Syn-Hershko wrote:
>
> You mean something like dynamic templates? 
> http://code972.com/blog/2015/02/81-elasticsearch-one-tip-a-day-using-dynamic-templates-to-avoid-rigorous-mappings
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Lucene.NET committer and PMC member
>
> On Fri, Feb 6, 2015 at 1:39 PM, Paul Kavanagh wrote:
>
>> Hi all,
>> We're having a MapperParsingException problem with some field values that 
>> we get when we use the JSON Filter for Logstash to explode a JSON 
>> document into Elasticsearch fields.
>>
>> In 99.9% of cases, certain of these fields are either blank or contain 
>> dates in the format yyyy-mm-dd. This allows ES to dynamically map this 
>> field to type dateOptionalTime.
>>
>> However, we occasionally see non-standard date formats in these fields, 
>> which our main service can handle fine, but which throws a 
>> MapperParsingException in Elasticsearch - such are here:
>>
>>
>>
>> [2015-02-06 10:46:50,679][WARN ][cluster.action.shard ] [logging-
>> production-elasticsearch-ip-xxx-xxx-xxx-148] [logstash-2015.02.06][2] 
>> received shard failed for [logstash-2015.02.06][2], node[
>> GZpltBjAQUqGyp2B1SLz_g], [R], s[INITIALIZING], indexUUID [BEdTwj-
>> QRuOZB713YAQwvA], reason [Failed to start shard, message [
>> RecoveryFailedException[[logstash-2015.02.06][2]: Recovery failed from [
>> logging-production-elasticsearch-ip-xxx-xxx-xxx-82][IALW-92RReiLffQjSL3I-
>> g][logging-production-elasticsearch-ip-xxx-xxx-xxx-82][inet[ip-xxx-xxx-
>> xxx-82.ec2.internal/xxx.xxx.xxx.82:9300]]{max_local_storage_nodes=1, 
>> aws_availability_zone=us-east-1e, aws_az=us-east-1e} into [logging-
>> production-elasticsearch-ip-xxx-xxx-xxx-148][GZpltBjAQUqGyp2B1SLz_g][
>> logging-production-elasticsearch-ip-xxx-xxx-xxx-148][inet[ip-xxx.xxx.xxx.
>> 148.ec2.internal/xxx.xxx.xxx.148:9300]]{max_local_storage_nodes=1, 
>> aws_availability_zone=us-east-1c, aws_az=us-east-1c}]; nested: 
>> RemoteTransportException[[logging-production-elasticsearch-ip-xxx-xxx-xxx
>> -82][inet[/xxx.xxx.xxx.82:9300]][internal:index/shard/recovery/
>> start_recovery]]; nested: RecoveryEngineException[[logstash-2015.02.06][2
>> ] Phase[2] Execution failed]; nested: RemoteTransportException[[logging-
>> production-elasticsearch-ip-xxx-xxx-xxx-148][inet[/xxx.xxx.xxx.148:9300
>> ]][internal:index/shard/recovery/translog_ops]]; nested: 
>> MapperParsingException[failed to parse [apiservice.logstash.@fields.
>> parameters.start_time]]; nested: MapperParsingException[failed to parse 
>> date field [Feb 5 2015 12:00 AM], tried both date format [
>> dateOptionalTime], and timestamp number with locale []]; nested: 
>> IllegalArgumentException[Invalid format: "Feb 5 2015 12:00 AM"]; ]]
>>
>> 2015-02-06 10:46:53,685][WARN ][cluster.action.shard ] [logging-
>> production-elasticsearch-ip-xxx-xxx-xxx-148] [logstash-2015.02.06][2] 
>> received shard failed for [logstash-2015.02.06][2], node[
>> GZpltBjAQUqGyp2B1SLz_g], [R], s[INITIALIZING], indexUUID [BEdTwj-
>> QRuOZB713YAQwvA], reason [master [logging-production-elasticsearch-ip-xxx
>> -xxx-xxx-148][GZpltBjAQUqGyp2B1SLz_g][logging-production-elasticsearch-ip
>> -xxx-xxx-xxx-148][inet[ip-xxx-xxx-xxx-148.ec2.internal/xxx.xxx.xxx.148:
>> 9300]]{max_local_storage_nodes=1, aws_availability_zone=us-east-1c, 
>> aws_az=us-east-1c} marked shard as initializing, but shard is marke

Re: Possible? Wildcard template for a collection of fields to solve some dynamic mapping woes

2015-02-06 Thread Paul Kavanagh
As we create a new index every day, we're not concerned with retroactively 
applying the fix to existing indices. So it seems templates are the way to 
go here.

On Friday, February 6, 2015 at 2:52:49 PM UTC, John Smith wrote:
>
> A template won't help you here. I mean it's good to use and you should use 
> them. But once the schema is defined you can't change it. This is no 
> different than any database.
>
> Your best bet here is to do a bit of data cleansing/normalizing.
>
> If you know that the field is a date field and sometimes the date format is 
> different, then you have to try to convert it to a proper date format before 
> inserting. Especially if you are trying to push it all into one field.
>
> Even if you use wildcards in templates like suggested above, you would 
> have to know that the date is different to have it pushed to another field.
>
> On Friday, 6 February 2015 06:41:49 UTC-5, Itamar Syn-Hershko wrote:
>>
>> You mean something like dynamic templates? 
>> http://code972.com/blog/2015/02/81-elasticsearch-one-tip-a-day-using-dynamic-templates-to-avoid-rigorous-mappings
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>> Freelance Developer & Consultant
>> Lucene.NET committer and PMC member
>>
>> On Fri, Feb 6, 2015 at 1:39 PM, Paul Kavanagh wrote:
>>
>>> Hi all,
>>> We're having a MapperParsingException problem with some field values that 
>>> we get when we use the JSON Filter for Logstash to explode a JSON 
>>> document into Elasticsearch fields.
>>>
>>> In 99.9% of cases, certain of these fields are either blank or contain 
>>> dates in the format yyyy-mm-dd. This allows ES to dynamically map this 
>>> field to type dateOptionalTime.
>>>
>>> However, we occasionally see non-standard date formats in these fields, 
>>> which our main service can handle fine, but which throws a 
>>> MapperParsingException in Elasticsearch - such are here:
>>>
>>>
>>>
>>> [2015-02-06 10:46:50,679][WARN ][cluster.action.shard ] [logging-
>>> production-elasticsearch-ip-xxx-xxx-xxx-148] [logstash-2015.02.06][2] 
>>> received shard failed for [logstash-2015.02.06][2], node[
>>> GZpltBjAQUqGyp2B1SLz_g], [R], s[INITIALIZING], indexUUID [BEdTwj-
>>> QRuOZB713YAQwvA], reason [Failed to start shard, message [
>>> RecoveryFailedException[[logstash-2015.02.06][2]: Recovery failed from [
>>> logging-production-elasticsearch-ip-xxx-xxx-xxx-82][IALW-92RReiLffQjSL3I
>>> -g][logging-production-elasticsearch-ip-xxx-xxx-xxx-82][inet[ip-xxx-xxx-
>>> xxx-82.ec2.internal/xxx.xxx.xxx.82:9300]]{max_local_storage_nodes=1, 
>>> aws_availability_zone=us-east-1e, aws_az=us-east-1e} into [logging-
>>> production-elasticsearch-ip-xxx-xxx-xxx-148][GZpltBjAQUqGyp2B1SLz_g][
>>> logging-production-elasticsearch-ip-xxx-xxx-xxx-148][inet[ip-xxx.xxx.xxx
>>> .148.ec2.internal/xxx.xxx.xxx.148:9300]]{max_local_storage_nodes=1, 
>>> aws_availability_zone=us-east-1c, aws_az=us-east-1c}]; nested: 
>>> RemoteTransportException[[logging-production-elasticsearch-ip-xxx-xxx-
>>> xxx-82][inet[/xxx.xxx.xxx.82:9300]][internal:index/shard/recovery/
>>> start_recovery]]; nested: RecoveryEngineException[[logstash-2015.02.06][
>>> 2] Phase[2] Execution failed]; nested: RemoteTransportException[[logging
>>> -production-elasticsearch-ip-xxx-xxx-xxx-148][inet[/xxx.xxx.xxx.148:9300
>>> ]][internal:index/shard/recovery/translog_ops]]; nested: 
>>> MapperParsingException[failed to parse [apiservice.logstash.@fields.
>>> parameters.start_time]]; nested: MapperParsingException[failed to parse 
>>> date field [Feb 5 2015 12:00 AM], tried both date format [
>>> dateOptionalTime], and timestamp number with locale []]; nested: 
>>> IllegalArgumentException[Invalid format: "Feb 5 2015 12:00 AM"]; ]]
>>>
>>> 2015-02-06 10:46:53,685][WARN ][cluster.action.shard ] [logging-
>>> production-elasticsearch-ip-xxx-xxx-xxx-148] [logstash-2015.02.06][2] 
>>> received shard failed for [logstash-2015.02.06][2], node[
>>> GZpltBjAQUqGyp2B1SLz_g], [R], s[INITIALIZING], indexUUID [BEdTwj-
>>> QRuOZB713YAQwvA], reason [master [logging-production-elasticsearch-ip-
>>> xxx-xxx-xxx-148][GZpltBjAQUqGyp2B1SLz_g][logging-production-
>>> elasticsearch-ip-xxx-xxx-xxx-148][inet[ip-xxx-xxx-xxx-148.ec2.internal/
>>> xxx.xxx.xxx.148:9300]]{max_local_storage_nodes=1, aws_availability_zone=
>>> us-east-1c, aws_az=us-east-1c} marked 

Re: Possible? Wildcard template for a collection of fields to solve some dynamic mapping woes

2015-02-06 Thread Paul Kavanagh
I think you have something there. I have come up with this:

curl -XPUT localhost:9200/_template/template_1 -d '
{
"template" : "logstash-*",
"order" : 0,
"settings" : {
"number_of_shards" : 15
},
"mappings" : {
  "dynamic_templates":[
{"apiservice_logstash":{
"match":"apiservice.logstash.@fields.parameters.*",
"match_mapping_type":"dateOptionalTime",
"mapping":{
  "type":"string",
  "analyzer":"english"
}
  }
}
  ]
}
}
'

However... When I try to post it, Elasticsearch throws:
{"error":"ElasticsearchIllegalArgumentException[Malformed mappings section 
for type [dynamic_templates], should include an inner object describing the 
mapping]","status":400}

i've tried a few things, but it doesn't seem to like my mappings block for 
some reason.

Any idea why?

On Friday, February 6, 2015 at 11:41:49 AM UTC, Itamar Syn-Hershko wrote:
>
> You mean something like dynamic templates? 
> http://code972.com/blog/2015/02/81-elasticsearch-one-tip-a-day-using-dynamic-templates-to-avoid-rigorous-mappings
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Lucene.NET committer and PMC member
>
> On Fri, Feb 6, 2015 at 1:39 PM, Paul Kavanagh  > wrote:
>
>> Hi all,
>> We're having a MapperParsingException problem with some field values when 
>> we get when we use the JSON Filter for Logstash to explode out a JSON 
>> document to Elasticsearch fields.
>>
>> In 99.9% of cases, certain of these fields are either blank, or contain 
>> dates in the format yyyy-mm-dd. This allows ES to dynamically map this 
>> field to type dateOptionalTime.
>>
>> However, we occasionally see non-standard date formats in these fields, 
>> which our main service can handle fine, but which throw a 
>> MapperParsingException in Elasticsearch - such as here:
>>
>>
>>
>> [2015-02-06 10:46:50,679][WARN ][cluster.action.shard ] [logging-
>> production-elasticsearch-ip-xxx-xxx-xxx-148] [logstash-2015.02.06][2] 
>> received shard failed for [logstash-2015.02.06][2], node[
>> GZpltBjAQUqGyp2B1SLz_g], [R], s[INITIALIZING], indexUUID [BEdTwj-
>> QRuOZB713YAQwvA], reason [Failed to start shard, message [
>> RecoveryFailedException[[logstash-2015.02.06][2]: Recovery failed from [
>> logging-production-elasticsearch-ip-xxx-xxx-xxx-82][IALW-92RReiLffQjSL3I-
>> g][logging-production-elasticsearch-ip-xxx-xxx-xxx-82][inet[ip-xxx-xxx-
>> xxx-82.ec2.internal/xxx.xxx.xxx.82:9300]]{max_local_storage_nodes=1, 
>> aws_availability_zone=us-east-1e, aws_az=us-east-1e} into [logging-
>> production-elasticsearch-ip-xxx-xxx-xxx-148][GZpltBjAQUqGyp2B1SLz_g][
>> logging-production-elasticsearch-ip-xxx-xxx-xxx-148][inet[ip-xxx.xxx.xxx.
>> 148.ec2.internal/xxx.xxx.xxx.148:9300]]{max_local_storage_nodes=1, 
>> aws_availability_zone=us-east-1c, aws_az=us-east-1c}]; nested: 
>> RemoteTransportException[[logging-production-elasticsearch-ip-xxx-xxx-xxx
>> -82][inet[/xxx.xxx.xxx.82:9300]][internal:index/shard/recovery/
>> start_recovery]]; nested: RecoveryEngineException[[logstash-2015.02.06][2
>> ] Phase[2] Execution failed]; nested: RemoteTransportException[[logging-
>> production-elasticsearch-ip-xxx-xxx-xxx-148][inet[/xxx.xxx.xxx.148:9300
>> ]][internal:index/shard/recovery/translog_ops]]; nested: 
>> MapperParsingException[failed to parse [apiservice.logstash.@fields.
>> parameters.start_time]]; nested: MapperParsingException[failed to parse 
>> date field [Feb 5 2015 12:00 AM], tried both date format [
>> dateOptionalTime], and timestamp number with locale []]; nested: 
>> IllegalArgumentException[Invalid format: "Feb 5 2015 12:00 AM"]; ]]
>>
>> 2015-02-06 10:46:53,685][WARN ][cluster.action.shard ] [logging-
>> production-elasticsearch-ip-xxx-xxx-xxx-148] [logstash-2015.02.06][2] 
>> received shard failed for [logstash-2015.02.06][2], node[
>> GZpltBjAQUqGyp2B1SLz_g], [R], s[INITIALIZING], indexUUID [BEdTwj-
>> QRuOZB713YAQwvA], reason [master [logging-production-elasticsearch-ip-xxx
>> -xxx-xxx-148][GZpltBjAQUqGyp2B1SLz_g][logging-production-elasticsearch-ip
>> -xxx-xxx-xxx-148][inet[ip-xxx-xxx-xxx-148.ec2.internal/xxx.xxx.xxx.148:
>> 9300]]{max_local_storage_nodes=1, aws_availability_zone=us-east-1c, 
>> aws_az=us-east-1c} marked shard as initializing, but shard is marked a

Possible? Wildcard template for a collection of fields to solve some dynamic mapping woes

2015-02-06 Thread Paul Kavanagh
Hi all,
We're having a MapperParsingException problem with some field values we 
get when we use the JSON Filter for Logstash to explode out a JSON 
document to Elasticsearch fields.

In 99.9% of cases, certain of these fields are either blank, or contain 
dates in the format yyyy-mm-dd. This allows ES to dynamically map this 
field to type dateOptionalTime.

However, we occasionally see non-standard date formats in these fields, 
which our main service can handle fine, but which throw a 
MapperParsingException in Elasticsearch - such as here:



[2015-02-06 10:46:50,679][WARN ][cluster.action.shard ] [logging-
production-elasticsearch-ip-xxx-xxx-xxx-148] [logstash-2015.02.06][2] 
received shard failed for [logstash-2015.02.06][2], node[
GZpltBjAQUqGyp2B1SLz_g], [R], s[INITIALIZING], indexUUID [BEdTwj-
QRuOZB713YAQwvA], reason [Failed to start shard, message [
RecoveryFailedException[[logstash-2015.02.06][2]: Recovery failed from [
logging-production-elasticsearch-ip-xxx-xxx-xxx-82][IALW-92RReiLffQjSL3I-g][
logging-production-elasticsearch-ip-xxx-xxx-xxx-82][inet[ip-xxx-xxx-xxx-
82.ec2.internal/xxx.xxx.xxx.82:9300]]{max_local_storage_nodes=1, 
aws_availability_zone=us-east-1e, aws_az=us-east-1e} into [logging-
production-elasticsearch-ip-xxx-xxx-xxx-148][GZpltBjAQUqGyp2B1SLz_g][logging
-production-elasticsearch-ip-xxx-xxx-xxx-148][inet[ip-xxx.xxx.xxx.
148.ec2.internal/xxx.xxx.xxx.148:9300]]{max_local_storage_nodes=1, 
aws_availability_zone=us-east-1c, aws_az=us-east-1c}]; nested: 
RemoteTransportException[[logging-production-elasticsearch-ip-xxx-xxx-xxx-82
][inet[/xxx.xxx.xxx.82:9300]][internal:index/shard/recovery/start_recovery
]]; nested: RecoveryEngineException[[logstash-2015.02.06][2] Phase[2] 
Execution failed]; nested: RemoteTransportException[[logging-production-
elasticsearch-ip-xxx-xxx-xxx-148][inet[/xxx.xxx.xxx.148:9300]][internal:
index/shard/recovery/translog_ops]]; nested: MapperParsingException[failed 
to parse [apiservice.logstash.@fields.parameters.start_time]]; nested: 
MapperParsingException[failed to parse date field [Feb 5 2015 12:00 AM], 
tried both date format [dateOptionalTime], and timestamp number with locale 
[]]; nested: IllegalArgumentException[Invalid format: "Feb 5 2015 12:00 AM"
]; ]]

2015-02-06 10:46:53,685][WARN ][cluster.action.shard ] [logging-
production-elasticsearch-ip-xxx-xxx-xxx-148] [logstash-2015.02.06][2] 
received shard failed for [logstash-2015.02.06][2], node[
GZpltBjAQUqGyp2B1SLz_g], [R], s[INITIALIZING], indexUUID [BEdTwj-
QRuOZB713YAQwvA], reason [master [logging-production-elasticsearch-ip-xxx-
xxx-xxx-148][GZpltBjAQUqGyp2B1SLz_g][logging-production-elasticsearch-ip-xxx
-xxx-xxx-148][inet[ip-xxx-xxx-xxx-148.ec2.internal/xxx.xxx.xxx.148:9300]]{
max_local_storage_nodes=1, aws_availability_zone=us-east-1c, aws_az=us-east-
1c} marked shard as initializing, but shard is marked as failed, resend 
shard failure]


Our planned solution was to create a template for Logstash indices that 
will set these fields to string. But as the field above isn't the only 
culprit, and more may be added over time, it makes more sense to create a 
template to map all fields under apiservice.logstash.@fields.parameters.* 
to string. (We never need to query on user-entered data, but it's great 
to have it logged for debugging.)

Is it possible to do this with a template? I could not find a way to do 
this via the template documentation on the ES site. 

Any guidance would be great!

Thanks,
-Paul

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6ca4030f-b6bb-4907-b2fc-e3166fa2a6af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Searching by Temporal Proximity

2015-01-12 Thread Paul Harvey

I am quite new to ElasticSearch; most of what I need seems to be easily 
supported, however I have hit one problem - specifically, searching by 
temporal proximity.

What do I mean by this?  I'll explain a simplified scenario.

I have indexed events each with an associated date-time.  I need to 
discover those events that occur within a given time period *of each other*. 
  Ideally this time period would be arbitrary and specified at search time.

The "of each other" part is key here.   I cannot just, say:

   - Aggregate by week or month, as an event that occurs at the end of one 
   month is within a month of one that occurs at the start of the next month, 
   but the two would fall into different buckets.
   - Do a date-range-based search, as I do not have a fixed range to search.

To describe what I need in a different way, in SQL I'd do something like:

SELECT eventid, eventdate FROM Events t1 WHERE EXISTS(
    SELECT 1 FROM Events t2
    WHERE t1.eventid <> t2.eventid
      AND t1.eventdate BETWEEN dateadd( day, -30, t2.eventdate )
                           AND dateadd( day,  30, t2.eventdate )
);


The actual scenario is more complex - each event has a type, and what I need 
ultimately is to be able to answer questions akin to:

"Find events of type X that occur within 2 days of an event of type Y"


and even:

"Find events of type X that occur within 2 days of an event of type Y and 
of type Z" 


Each event will be nested in, or a child of, a parent record, and I am only 
interested in the temporal proximity of events with the same parent. The 
database has a total of 10^9 events, and each parent may have on the order 
of 10^3 associated events. The use case is search-heavy, with ingests of 
deltas approximately weekly.

I can munge the data on import in any way that would help. I have had a 
couple of ideas on how to tackle the problem but neither are satisfactory. 

I wondered whether there is a standard way to tackle this kind of 
requirement in ElasticSearch and whether anyone else had run up against it.
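Since the data can be munged on import, one workable approach is to precompute, for each event, the minimum time gap to its nearest sibling under the same parent, index that as an extra field, and turn the proximity search into a plain range filter. A sketch of the precompute step (not ES-specific; the event IDs and dates are made up):

```python
from datetime import datetime, timedelta

def min_neighbor_gap(events):
    # For each (event_id, timestamp) pair, find the smallest time delta to
    # any other event of the same parent. After sorting by time, the
    # nearest other event is always an adjacent element, so this is
    # O(n log n) per parent.
    ordered = sorted(events, key=lambda e: e[1])
    gaps = {}
    for i, (eid, ts) in enumerate(ordered):
        candidates = []
        if i > 0:
            candidates.append(ts - ordered[i - 1][1])
        if i + 1 < len(ordered):
            candidates.append(ordered[i + 1][1] - ts)
        gaps[eid] = min(candidates)
    return gaps

# Hypothetical events belonging to one parent record.
events = [
    ("a", datetime(2015, 1, 1)),
    ("b", datetime(2015, 1, 20)),
    ("c", datetime(2015, 2, 25)),
]
gaps = min_neighbor_gap(events)

# With the gap indexed alongside each event, "events within 30 days of
# each other" becomes a range filter on the precomputed field.
close = sorted(eid for eid, g in gaps.items() if g <= timedelta(days=30))
print(close)  # -> ['a', 'b']
```

For the type-conditioned variants ("type X within 2 days of type Y") the same idea applies, but you would precompute one gap field per neighbouring type of interest rather than a single gap to any event.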

Thanks,

Paul.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/31bc7955-2043-49dd-865d-1dbc048a6dde%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Marvel Sense GET/POST

2015-01-05 Thread Paul Scott
Regarding the behaviour of Sense automatically choosing POST regardless of 
the user writing GET:

I was recently rewriting a query I had written in the Sense UI from 
a search query to a request for a raw document by ID, using the GET API. In 
the process of changing from

GET /index/type/_search
{ ... }


to

GET /index/type/id


I submitted

GET /index/type/id
{... }


which Sense helpfully submitted as a POST request, over-writing the 
document with the search query. In production.

To say that this behaviour was unexpected and unhelpful would be an 
understatement. I would consider it irresponsible and unsafe, and I highly 
recommend the behaviour be disabled.

Please try to read past my obvious irritation and the part of this mistake 
which falls on my head, to the part in this mistake that the Sense UI 
played too.

All the best,

Paul

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f2588751-a008-4b59-9041-b63c098580dd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: zen disco socket usage and port

2014-12-16 Thread Paul Baclace
Now that you've pointed out the additional "es." prefix needed when 
specifying settings on the command line, I can see that:
  
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html
shows it via color in an example and with "instead of". Apparently, that 
was not obvious enough for me!

Thanks for the transport.netty.worker_count tip.
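For anyone landing here later, the two equivalent spellings (9301 is just an example value):

```yaml
# elasticsearch.yml - no "es." prefix needed inside the config file:
transport.tcp.port: 9301
# command-line equivalent (the "es." prefix is required here):
#   bin/elasticsearch -Des.transport.tcp.port=9301
```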

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c5813aa7-a2d8-430f-a801-bca56aecc325%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: zen disco socket usage and port

2014-12-16 Thread Paul Baclace
More info:  I tried to set the transport port like this: 
 -Dtransport.tcp.port= on the elasticsearch command line, but it still 
uses port 9300. 


On Tuesday, December 16, 2014 12:42:12 PM UTC-8, Paul Baclace wrote:
>
> Is it normal for a single node elasticsearch process to open 13 sockets to 
> itself? This seems like an excessive zen disco party and inexplicable. I am 
> trying out v1.4. 
>
> Is it possible to set the transport protocol port? 
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/27ea7a90-af23-4989-8d82-c2ba5ca27f75%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


zen disco socket usage and port

2014-12-16 Thread Paul Baclace
Is it normal for a single node elasticsearch process to open 13 sockets to 
itself? This seems like an excessive zen disco party and inexplicable. I am 
trying out v1.4. 

Is it possible to set the transport protocol port? 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0db13e63-9d6d-4d60-b05b-a4c0a9bd490a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-09-16 Thread SAURAV PAUL
Oh. Sorry :-)

On Mon, Sep 15, 2014 at 3:27 AM, Mark Walkom 
wrote:

> You probably want to put this in your own thread :)
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
> On 15 September 2014 06:55, SAURAV PAUL  wrote:
>
>> Hi,
>>
>> I am trying to use Spark and ElasticSearch.
>>
>> Currently, the RDD contains pipe delimited records.
>>
>> parsedRDD.saveAsNewAPIHadoopFile(outputLocation,
>>NullWritable.class,
>>Text.class,
>>CustomTextOutputFormat.class,
>>job.getConfiguration());
>>
>> Right now I am storing the output in HDFS. Instead, I now want to create
>> an index, store the output there, and use Kibana to do some
>> analysis.
>>
>> What do I need to change so that I can push into ElasticSearch? Is it
>> ESOutputFormat?
>>
>>
>>
>> On Monday, July 7, 2014 11:14:47 PM UTC+5:30, Costin Leau wrote:
>>>
>>> Thanks for the analysis. It looks like Hadoop 1.0.4 has an invalid
>>> POM - though it uses Jackson 1.8.8 (see the distro), the POM declares
>>> version 1.0.1 for some reason. Hadoop version 1.2 (the latest stable) and
>>> higher have this fixed.
>>>
>>> We don't mark the jackson version within our POM since it's already
>>> available at runtime - we can probably do so going forward in the Spark
>>> integration.
>>>
>>>
>>> On Mon, Jul 7, 2014 at 6:39 PM, Brian Thomas 
>>> wrote:
>>>
>>>> Here is the gradle build I was using originally:
>>>>
>>>> apply plugin: 'java'
>>>> apply plugin: 'eclipse'
>>>>
>>>> sourceCompatibility = 1.7
>>>> version = '0.0.1'
>>>> group = 'com.spark.testing'
>>>>
>>>> repositories {
>>>> mavenCentral()
>>>> }
>>>>
>>>> dependencies {
>>>> compile 'org.apache.spark:spark-core_2.10:1.0.0'
>>>>  compile 'edu.stanford.nlp:stanford-corenlp:3.3.1'
>>>> compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version:
>>>> '3.3.1', classifier:'models'
>>>>  compile files('lib/elasticsearch-hadoop-2.0.0.jar')
>>>> testCompile 'junit:junit:4.+'
>>>> testCompile group: "com.github.tlrx", name: "elasticsearch-test",
>>>> version: "1.2.1"
>>>> }
>>>>
>>>>
>>>> When I ran dependencyInsight on jackson, I got the following output:
>>>>
>>>> C:\dev\workspace\SparkProject>gradle dependencyInsight --dependency
>>>> jackson-core
>>>>
>>>> :dependencyInsight
>>>> com.fasterxml.jackson.core:jackson-core:2.3.0
>>>> \--- com.fasterxml.jackson.core:jackson-databind:2.3.0
>>>>  +--- org.json4s:json4s-jackson_2.10:3.2.6
>>>>  |\--- org.apache.spark:spark-core_2.10:1.0.0
>>>>  | \--- compile
>>>>  \--- com.codahale.metrics:metrics-json:3.0.0
>>>>   \--- org.apache.spark:spark-core_2.10:1.0.0 (*)
>>>>
>>>> org.codehaus.jackson:jackson-core-asl:1.0.1
>>>> \--- org.codehaus.jackson:jackson-mapper-asl:1.0.1
>>>>  \--- org.apache.hadoop:hadoop-core:1.0.4
>>>>   \--- org.apache.hadoop:hadoop-client:1.0.4
>>>>\--- org.apache.spark:spark-core_2.10:1.0.0
>>>> \--- compile
>>>>
>>>> Version 1.0.1 of jackson-core-asl does not have the field
>>>> ALLOW_UNQUOTED_FIELD_NAMES, but later versions of it do.
>>>>
>>>> On Sunday, July 6, 2014 4:28:56 PM UTC-4, Costin Leau wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Glad to see you sorted out the problem. Out of curiosity what version
>>>>> of jackson were you using and what was pulling it in? Can you share you
>>>>> maven pom/gradle build?
>>>>>
>>>>>
>>>>> On Sun, Jul 6, 2014 at 10:27 PM, Brian Thomas 
>>>>> wrote:
>>>>>
>>>>>>  I figured it out, dependency issue in my classpath.  Maven was
>>>>>> pulling down a very old version of the jackson jar.  I added the 
>>>>>> following
>>>>>> line to my 

Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-09-14 Thread SAURAV PAUL
Hi,

I am trying to use Spark and ElasticSearch.

Currently, the RDD contains pipe delimited records.

parsedRDD.saveAsNewAPIHadoopFile(outputLocation,
   NullWritable.class, 
   Text.class, 
   CustomTextOutputFormat.class,
   job.getConfiguration());

Right now I am storing the output in HDFS. Instead, I now want to create an 
index, store the output there, and use Kibana to do some analysis. 

What do I need to change so that I can push into ElasticSearch? Is it 
ESOutputFormat?



On Monday, July 7, 2014 11:14:47 PM UTC+5:30, Costin Leau wrote:
>
> Thanks for the analysis. It looks like Hadoop 1.0.4 has an invalid POM 
> - though it uses Jackson 1.8.8 (see the distro), the POM declares version 
> 1.0.1 for some reason. Hadoop version 1.2 (the latest stable) and higher 
> have this fixed.
>
> We don't mark the jackson version within our POM since it's already 
> available at runtime - we can probably do so going forward in the Spark 
> integration.
>
>
> On Mon, Jul 7, 2014 at 6:39 PM, Brian Thomas  > wrote:
>
>> Here is the gradle build I was using originally:
>>
>> apply plugin: 'java'
>> apply plugin: 'eclipse'
>>
>> sourceCompatibility = 1.7
>> version = '0.0.1'
>> group = 'com.spark.testing'
>>
>> repositories {
>> mavenCentral()
>> }
>>
>> dependencies {
>> compile 'org.apache.spark:spark-core_2.10:1.0.0'
>>  compile 'edu.stanford.nlp:stanford-corenlp:3.3.1'
>> compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: 
>> '3.3.1', classifier:'models'
>>  compile files('lib/elasticsearch-hadoop-2.0.0.jar')
>> testCompile 'junit:junit:4.+'
>> testCompile group: "com.github.tlrx", name: "elasticsearch-test", 
>> version: "1.2.1"
>> }
>>
>>
>> When I ran dependencyInsight on jackson, I got the following output:
>>
>> C:\dev\workspace\SparkProject>gradle dependencyInsight --dependency 
>> jackson-core
>>
>> :dependencyInsight
>> com.fasterxml.jackson.core:jackson-core:2.3.0
>> \--- com.fasterxml.jackson.core:jackson-databind:2.3.0
>>  +--- org.json4s:json4s-jackson_2.10:3.2.6
>>  |\--- org.apache.spark:spark-core_2.10:1.0.0
>>  | \--- compile
>>  \--- com.codahale.metrics:metrics-json:3.0.0
>>   \--- org.apache.spark:spark-core_2.10:1.0.0 (*)
>>
>> org.codehaus.jackson:jackson-core-asl:1.0.1
>> \--- org.codehaus.jackson:jackson-mapper-asl:1.0.1
>>  \--- org.apache.hadoop:hadoop-core:1.0.4
>>   \--- org.apache.hadoop:hadoop-client:1.0.4
>>\--- org.apache.spark:spark-core_2.10:1.0.0
>> \--- compile
>>
>> Version 1.0.1 of jackson-core-asl does not have the field 
>> ALLOW_UNQUOTED_FIELD_NAMES, but later versions of it do.
>>
>> On Sunday, July 6, 2014 4:28:56 PM UTC-4, Costin Leau wrote:
>>
>>> Hi,
>>>
>>> Glad to see you sorted out the problem. Out of curiosity what version of 
>>> jackson were you using and what was pulling it in? Can you share you maven 
>>> pom/gradle build?
>>>
>>>
>>> On Sun, Jul 6, 2014 at 10:27 PM, Brian Thomas  
>>> wrote:
>>>
  I figured it out, dependency issue in my classpath.  Maven was 
 pulling down a very old version of the jackson jar.  I added the following 
 line to my dependencies and the error went away:

 compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'
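An alternative to pinning a direct dependency - a sketch assuming Gradle's resolutionStrategy (untested against this exact build):

```groovy
configurations.all {
    resolutionStrategy {
        // Override the stale 1.0.1 that hadoop-core's POM drags in.
        force 'org.codehaus.jackson:jackson-core-asl:1.9.13',
              'org.codehaus.jackson:jackson-mapper-asl:1.9.13'
    }
}
```

Forcing the version this way also covers any other configuration that transitively pulls the old artifact, rather than relying on the newer direct dependency winning conflict resolution.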


 On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote:
>
>  I am trying to test querying elasticsearch using Apache Spark using 
> elasticsearch-hadoop.  I am just trying to do a query to the 
> elasticsearch 
> server and return the count of results.
>
> Below is my test class using the Java API:
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.io.MapWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaPairRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.serializer.KryoSerializer;
> import org.elasticsearch.hadoop.mr.EsInputFormat;
>
> import scala.Tuple2;
>
> public class ElasticsearchSparkQuery{
>
> public static int query(String masterUrl, String 
> elasticsearchHostPort) {
> SparkConf sparkConfig = new SparkConf().setAppName("ESQuer
> y").setMaster(masterUrl);
> sparkConfig.set("spark.serializer", 
> KryoSerializer.class.getName());
> JavaSparkContext sparkContext = new 
> JavaSparkContext(sparkConfig);
>
> Configuration conf = new Configuration();
> conf.setBoolean("mapred.map.tasks.speculative.execution", 
> false);
> conf.setBoolean("mapred.reduce.tasks.speculative.execution", 
> false);
> conf.set("es.nodes", elasticsearchHostPort);
> conf.set("es.resource", "media/docs");
> conf.set("es.query", "?q=*");
>
> JavaPairRDD<Text, MapWritable> esRDD = 
> sparkContext.newAPIHadoopRDD(conf, EsInputFormat.class,

Re: Replica Shard inconsistencies & disabling compression don't appear to help

2014-08-21 Thread Paul Smith
No, I haven't looked at trying to manually compare the shard segments. We
would have to snapshot the data manually (it's constantly under change).

And no, I wasn't sure our issue is related to the compression bug, but it
sure sounded like it.

What we see is a small volume of changes relating to Deletes that do not
properly delete on the replica. The records are being deleted from the
primary but sometimes the replica does not seem to get this change.

Interesting that the other report by another ES user shows the 1.3.1
release does fix the issue while the setting change doesn't. Since the new
release is mostly limited to this library change, the problem seems
restricted to this area.

Only real way is for us to upgrade, which is sadly not straight forward
right now.

Paul
On Thursday, 21 August 2014, joergpra...@gmail.com 
wrote:

> Do you observe the replica shard inconsistency only by checksum after
> network transport?
>
> In other words, are you sure the inconsistency you observe is caused by a
> compression issue in LZF?
>
> Jörg
>
>
> On Thu, Aug 21, 2014 at 5:52 AM, Paul Smith  > wrote:
>
>> Hi all,
>>
>> The recent ES 1.3.1 announcement around the ning/compression library bug
>> had me excited because we have had a lingering replica shard inconsistency
>> issue for a long time (on a very old ES version, which we desperately
>> desire to upgrade, but have reasons why we can't just yet).
>>
>> Anyway, we've tried the Disabling Compression on the recovery setting
>> trick and soaked the change for a few days but continue to see replica
>> issues, and wanted to report this back.
>>
>> We regularly 'clean' these by shunting replica shards around to have them
>> rebuilt, so are confident they were clean for a day, and our check tool
>> finds small numbers still coming through each day.
>>
>> I heard from another ES member privately around this same change, and the
>> 1.3.1 does fix the issue for him, but the recovery setting trick doesn't
>> help either.
>>
>> I'm not sure if a full cluster start/stop is required for the recovery
>> channel to switch to non-compressed, reading the release notes it seemed to
>> read like a runtime setting that could be changed immediately, but it's not
>> helping here for these types of issues.
>>
>> I can't yet report if ES 1.3.1 does fix the underlying issue though
>> because of said upgrade issues (all our side, not an ES problem).
>>
>> anyway, thought I would pass that data point on.
>>
>> regards,
>>
>> Paul Smith
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com
>> 
>> .
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAHfYWB6FpH-i7kDs9m_v4HgwPK-DWK75mATO_7DZNQjpDh6B%3Dw%40mail.gmail.com
>> <https://groups.google.com/d/msgid/elasticsearch/CAHfYWB6FpH-i7kDs9m_v4HgwPK-DWK75mATO_7DZNQjpDh6B%3Dw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com
> 
> .
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEMDArRwjXw7Qv6jfD4Sa%3DiXGu-Aez6240kg3wS%3DyGT_Q%40mail.gmail.com
> <https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEMDArRwjXw7Qv6jfD4Sa%3DiXGu-Aez6240kg3wS%3DyGT_Q%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHfYWB7jQQGHn8oTN0EK80hjDQ1PjAWRDy6JWZJ_rSdBGVsvrQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Replica Shard inconsistencies & disabling compression don't appear to help

2014-08-20 Thread Paul Smith
Hi all,

The recent ES 1.3.1 announcement around the ning/compression library bug
had me excited because we have had a lingering replica shard inconsistency
issue for a long time (on a very old ES version, which we desperately
desire to upgrade, but have reasons why we can't just yet).

Anyway, we've tried the Disabling Compression on the recovery setting trick
and soaked the change for a few days but continue to see replica issues,
and wanted to report this back.

We regularly 'clean' these by shunting replica shards around to have them
rebuilt, so are confident they were clean for a day, and our check tool
finds small numbers still coming through each day.

I heard from another ES member privately around this same change, and the
1.3.1 does fix the issue for him, but the recovery setting trick doesn't
help either.

I'm not sure if a full cluster start/stop is required for the recovery
channel to switch to non-compressed, reading the release notes it seemed to
read like a runtime setting that could be changed immediately, but it's not
helping here for these types of issues.

I can't yet report if ES 1.3.1 does fix the underlying issue though because
of said upgrade issues (all our side, not an ES problem).

anyway, thought I would pass that data point on.

regards,

Paul Smith

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHfYWB6FpH-i7kDs9m_v4HgwPK-DWK75mATO_7DZNQjpDh6B%3Dw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Shard data inconsistencies

2014-08-13 Thread Paul Smith
Hi Aaron, late to this party for sure, sorry.  I feel your pain, this is
happening for us, and I've seen reports of it occurring across versions,
but with very little information to go on I don't think progress has been
made.  I actually don't think there's an issue raised for it.  Perhaps that
should be a first step.

We call this problem a "Flappy Item" because the item appears, disappears
in search results depending on whether the search hits the primary or
replica shard.  Flaps back and forth.

The only way to repair the problem is to rebuild the replica shard.  You
can disable _all_ replicas and then re-enable them, and the primary shard
will be used as the source and it will work.  That's if you can live with
the lack of redundancy for that length of time
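The drop/re-add itself is just an index settings update; a sketch with a hypothetical index name ("myindex"):

```
curl -XPUT 'localhost:9200/myindex/_settings' -d '{"index":{"number_of_replicas":0}}'
# ...wait for the cluster to settle, then restore:
curl -XPUT 'localhost:9200/myindex/_settings' -d '{"index":{"number_of_replicas":1}}'
```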

Alternatively we have found that issuing a Move command to relocate the
replica shard off the current host and on to another, also causes ES to
generate a new replica shard using the primary as the source, and that
corrects the problem.

A caveat we've found with this approach at least with the old version of ES
we're sadly still using (0.19... hmm) that after the move, the cluster will
likely want to rebalance, and the shard allocation after rebalance can from
time to time put the replica back where it was.  ES on that original node
then goes "Oh look, here's the same shard I had earlier, let's use that".
 Which means you're back to square one.  You can force _all_ replica
shards to move by coming up with a move command that shuffles them around,
and that definitely does work, but obviously takes longer for large
clusters.
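The move itself goes through the cluster reroute API; the index, shard number, and node names below are placeholders:

```
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {
      "move": {
        "index": "myindex", "shard": 0,
        "from_node": "node_with_suspect_replica", "to_node": "some_other_node"
      }
    }
  ]
}'
```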

In terms of tooling around this, I offer you these:

Scrutineer - https://github.com/Aconex/scrutineer - can detect differences
between your source of truth (db?) and your index (ES).  This does pick up
the case where the replica is reporting an item that should have been
deleted.

Flappy Item Detector - https://github.com/Aconex/es-flappyitem-detector -
given a set of suspect IDs can check the primary vs replica to confirm/deny
it being one of these cases.  There is also support to issue basic move
commands with some simple logic to attempt to rebuild that replica.

Hope that helps.

cheers,

Paul Smith


On 8 August 2014 01:14, aaron  wrote:

> I've noticed on a few of my clusters that some shard replicas will be
> perpetually inconsistent w/ other shards. Even when all of my writes are
> successful and use write_consistency = ALL and replication = SYNC.
>
> A GET by id will return 404/missing for one replica but return the
> document for the other two replicas. Even after refresh, the shard is never
> "repaired".
>
> Using ES 0.90.7.
>
> Is this a known defect? Is there a means to detect, prevent, or at least
> detect & repair when this occurs?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/164c3362-1ed4-4e90-8bb6-283543a20cf9%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/164c3362-1ed4-4e90-8bb6-283543a20cf9%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHfYWB7XhaB-ZkJqE8%3DfNu%2BZdNzGC%2BPx%3Dv61OGTzQABbHNZfSg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Dealing with spam in this forum

2014-07-01 Thread Paul Brown
Hi, Clinton --

May I suggest:

- Some users (e.g., me) who read this list via an email subscription regard
ANY spam on the list as an unacceptable state of affairs.  This is not a
problem with Apache lists, for example, so I would point the finger of
blame at Google Groups.

- Having N longstanding members who are willing to help ban spammers is
equivalent to having N longstanding members who are willing to quickly
admit new users.  (And you're welcome to add me as N+1.)

- Banning is ineffective.  Spammers will continuously sign up with new
accounts.

-- Paul


—
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


On Tue, Jul 1, 2014 at 11:36 AM, Clinton Gormley <
clinton.gorm...@elasticsearch.com> wrote:

> Hi all
>
> Recently we've had a few spam emails that have made it through Google's
> filters, and there have been calls for us to change to a
> moderate-first-post policy. I am reluctant to adopt this policy for the
> following reasons:
>
> We get about 30 new users every day from all over the world, many of whom
> are early in their learning phase and are quite stuck - they need help as
> soon as possible. Fortunately this list is very active and helpful. In
> contrast, we've only ever banned 34 users from the list for spamming.  So
> making new users wait for timezones to swing their way feels like a heavy
> handed solution to a small problem. Yes, spammers are annoying but they are
> a small minority on this list.
>
> Instead, we have asked 10 of our long standing members to help us with
> banning spammers.  This way we have Spam Guardians active around the globe,
> who only need to do something if a spammer raises their ugly head above the
> parapet. One or two spam emails may get through, but hopefully somebody
> will leap into action and stop their activity before it becomes too
> tiresome.
>
> This isn't an exclusive list. If you would like to be on it, feel free to
> email me.  Note: I expect you to be a long standing and currently active
> member of this list to be included.
>
> If this solution doesn't solve the problem, then we can reconsider
> moderate-first-post, but we've managed to go 5 years without requiring it,
> and I'd prefer to keep things as easy as possible for new users.
>
> Clint
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/c9af5a09-0295-42e3-bc20-52471828aa96%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/c9af5a09-0295-42e3-bc20-52471828aa96%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CACArsZ_92BQQSdLgYjKU2PsoQO5%2BFyFafyrFZTyeY720xGgMww%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Very frequent ES OOM's & potential segment merge problems

2014-06-20 Thread Paul Sabou
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; 
cannot complete merge
at 
org.apache.lucene.index.IndexWriter.commitMerge(IndexWriter.java:3546)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4272)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3728)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at 
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:106)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)


On Thursday, June 19, 2014 10:35:28 PM UTC+2, Paul Sabou wrote:
>
> Hi,
>
> *Situation:*
> We are using ES 1.2.1 on a machine with 32GB RAM, fast SSD and 12 cores. The 
> machine runs Ubuntu 14.0.x LTS.
> The ES process has 12GB of RAM allocated.
>
> We have an index in which we inserted 105 million small documents so the 
> ES data folder is around 50GB in size
> (we see this by using du -h . on the folder)
>
> The new document insertion rate is rather small (ie. 100-300 small docs 
> per second).
>
> *The problem:*
>
> We experienced rather frequent ES OOM (Out of Memory) at a rate of around 
> one every 15 mins. To lower the load on the index
> we deleted 104+ million docs (ie. mostly small log entries) by deleting 
> everything in one type :
> curl -XDELETE http://localhost:9200/index_xx/type_yy
>
> so that we ended up with an ES index with several thousands docs. 
> After this we started to experience massive disk IO (10-20Mbs reads and 
> 1MBs writes) and more frequent OOM's (at a rate of around
> one every 7 minutes). We restart ES after every OOM and kept monitoring 
> the data folder size. Over the next hour the size went down
> to around 36GB but now it's stuck there (doesn't go down in size even 
> after several hours).
>
> *Questions* : 
> Is this a problem related to segment merging running out of memory? If so 
> how can be solved? 
> If not, what could be the problem? 
>
>
> Thanks
> Paul.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/db4e6c34-2d6b-4623-aa9c-c6fbf9083ea9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Very frequent ES OOM's & potential segment merge problems

2014-06-19 Thread Paul Sabou
Hi,

*Situation:*
We are using ES 1.2.1 on a machine with 32GB RAM, fast SSD and 12 cores. The 
machine runs Ubuntu 14.0.x LTS.
The ES process has 12GB of RAM allocated.

We have an index in which we inserted 105 million small documents so the ES 
data folder is around 50GB in size
(we see this by using du -h . on the folder)

The new document insertion rate is rather small (ie. 100-300 small docs per 
second).

*The problem:*

We experienced rather frequent ES OOMs (Out of Memory errors) at a rate of
around one every 15 minutes. To lower the load on the index, we deleted 104+
million docs (i.e. mostly small log entries) by deleting everything in one type:
curl -XDELETE http://localhost:9200/index_xx/type_yy

so that we ended up with an ES index containing several thousand docs.
After this we started to experience massive disk IO (10-20MB/s reads and
1MB/s writes) and more frequent OOMs (at a rate of around one every 7
minutes). We restarted ES after every OOM and kept monitoring the data
folder size. Over the next hour the size went down to around 36GB, but now
it's stuck there (it doesn't go down in size even after several hours).

*Questions*:
Is this a problem related to segment merging running out of memory? If so,
how can it be solved?
If not, what could be the problem?


Thanks
Paul.
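One avenue for the stuck-at-36GB question above: on ES 1.x, deleted
documents only disappear from disk once their segments are merged away, and
the optimize API can be asked to expunge them explicitly. A rough sketch of
constructing that request (index name taken from the post; whether this
resolves the OOMs depends on heap sizing, so treat it as a diagnostic step,
not a fix):

```python
# Sketch: build the ES 1.x optimize URL that expunges deleted documents
# from an index. Sending it requires a running cluster, so we only
# construct and print the URL here.
def expunge_deletes_url(host, index):
    """URL for POSTing an only_expunge_deletes optimize on one index."""
    return "http://%s/%s/_optimize?only_expunge_deletes=true" % (host, index)

url = expunge_deletes_url("localhost:9200", "index_xx")
print(url)
```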

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/695c92a3-f77a-46bd-9041-79421a0bf1be%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Need help setting up autocomplete suggestions using phrase suggester

2014-06-06 Thread Paul Bormans
I just wanted to let anyone know i didn't succeed with the phrase suggester 
and i switched to the completion suggester, that actually works very well 
to my purpose.
Paul

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e0410cb8-60df-428c-8225-6ea00146630f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES OutOfMemory on a 30GB index

2014-05-29 Thread Paul Sanwald
We've narrowed the problem down to a multi_match clause in our query:
 {"multi_match":{"fields":["attachments.*.bodies"], "query":"foobar"}}

This has to do with the way we've structured our index. We are searching an
index that contains emails, and we are indexing attachments in the
attachments.*.bodies fields. For example, attachments.1.bodies would
contain the text body of an attachment.

This structure is clearly sub-optimal in terms of multi_match queries, but
I need to structure our index in some way that lets us search the contents
of an email and the parsed contents of its attachments, and get back the
email as a result.

From reading the docs, it seems like the better way to solve this is with
nested types?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html
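A nested-type version of the attachments field might look roughly like the
mapping below. This is an untested sketch: the field names ("body",
"attachments") are illustrative, not taken from the actual index, and
querying it would require a nested query rather than plain multi_match.

```python
import json

# Sketch: map attachments as a nested type so each attachment body is
# indexed as its own nested document, queryable via one field name
# ("attachments.body") instead of attachments.*.bodies wildcards.
mapping = {
    "email": {
        "properties": {
            "body": {"type": "string"},
            "attachments": {
                "type": "nested",
                "properties": {
                    "body": {"type": "string"}
                }
            }
        }
    }
}
print(json.dumps(mapping))
```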

--paul

On Wednesday, May 28, 2014 7:11:05 PM UTC-4, Paul Sanwald wrote:
>
> Sorry, it's Java 7:
>
> jvm: {
> pid: 20424
> version: 1.7.0_09-icedtea
> vm_name: OpenJDK 64-Bit Server VM
> vm_version: 23.7-b01
> vm_vendor: Oracle Corporation
> start_time: 1401309063644
> mem: {
> heap_init_in_bytes: 1073741824
> heap_max_in_bytes: 10498867200
> non_heap_init_in_bytes: 24313856
> non_heap_max_in_bytes: 318767104
> direct_max_in_bytes: 10498867200
> }
> gc_collectors: [
> PS Scavenge
> PS MarkSweep
> ]
> memory_pools: [
> Code Cache
> PS Eden Space
> PS Survivor Space
> PS Old Gen
> PS Perm Gen
> ]
>
> On Wednesday, May 28, 2014 6:58:26 PM UTC-4, Mark Walkom wrote:
>>
>> What java version are you running, it's not in the stats gist.
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>  
>>
>> On 29 May 2014 08:33, Paul Sanwald  wrote:
>>
>>> I apologize about the signature, it's automatic. I've created a gist 
>>> with the cluster node stats:
>>> https://gist.github.com/pcsanwald/e11ba02ac591757c8d92
>>>
>>> We are using 1.1.0, using aggregations a lot but nothing crazy. We run 
>>> our app on much much larger indices successfully. But, the problem seems to 
>>> be present itself on even basic search cases. The one thing that's 
>>> different about this dataset is a lot of it is in spanish.
>>>
>>> thanks for your help!
>>>
>>> On Wednesday, May 28, 2014 6:22:59 PM UTC-4, Mark Walkom wrote:
>>>>
>>>> Can you provide some specs on your cluster, OS, RAM, heap, disk, java 
>>>> and ES versions?
>>>> Are you using parent/child relationships, TTLs, large facet or other 
>>>> queries?
>>>>
>>>>
>>>> (Also, your elaborate legalese signature is kind of moot given you're 
>>>> posting to a public mailing list :p)
>>>>
>>>> Regards,
>>>> Mark Walkom
>>>>
>>>> Infrastructure Engineer
>>>> Campaign Monitor
>>>> email: ma...@campaignmonitor.com
>>>> web: www.campaignmonitor.com
>>>>  
>>>>
>>>> On 29 May 2014 07:27, Paul Sanwald  wrote:
>>>>
>>>>> Hi Everyone,
>>>>>We are seeing continual OOM exceptions on one of our 1.1.0 
>>>>> elasticsearch clusters, the index is ~30GB, quite small. I'm trying to 
>>>>> work 
>>>>> out the root cause via heap dump analysis, but not having a lot of luck. 
>>>>> I 
>>>>> don't want to include a bunch of unnecessary info, but the stacktrace 
>>>>> we're 
>>>>> seeing is pasted below. Has anyone seen this before? I've been using the 
>>>>> cluster stats and node stats APIs to try and find a smoking gun, but I'm 
>>>>> not seeing anything that looks out of the ordinary.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> 14/05/27 20:37:08 WARN transport.netty: [Strongarm] Failed to send 
>>>>> error message back to client for action [search/phase/query]
>>>>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>> 14/05/27 20:37:08 WARN transport.netty: [Strongarm] Actual Exception
>>>>> org.elasticsearch.search.query.QueryPhaseExecutionException: 
>>>>> [eventdata][2]: q
>>>>> uery[ConstantScore(*:*)],from[0],size[0]: Query Failed [Failed to 
>>>>> execute main
>>>>>  query]
>>>>> 

Re: ES OutOfMemory on a 30GB index

2014-05-28 Thread Paul Sanwald
Sorry, it's Java 7:

jvm: {
pid: 20424
version: 1.7.0_09-icedtea
vm_name: OpenJDK 64-Bit Server VM
vm_version: 23.7-b01
vm_vendor: Oracle Corporation
start_time: 1401309063644
mem: {
heap_init_in_bytes: 1073741824
heap_max_in_bytes: 10498867200
non_heap_init_in_bytes: 24313856
non_heap_max_in_bytes: 318767104
direct_max_in_bytes: 10498867200
}
gc_collectors: [
PS Scavenge
PS MarkSweep
]
memory_pools: [
Code Cache
PS Eden Space
PS Survivor Space
PS Old Gen
PS Perm Gen
]

On Wednesday, May 28, 2014 6:58:26 PM UTC-4, Mark Walkom wrote:
>
> What java version are you running, it's not in the stats gist.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
> On 29 May 2014 08:33, Paul Sanwald 
> > wrote:
>
>> I apologize about the signature, it's automatic. I've created a gist with 
>> the cluster node stats:
>> https://gist.github.com/pcsanwald/e11ba02ac591757c8d92
>>
>> We are using 1.1.0, using aggregations a lot but nothing crazy. We run 
>> our app on much much larger indices successfully. But, the problem seems to 
>> be present itself on even basic search cases. The one thing that's 
>> different about this dataset is a lot of it is in spanish.
>>
>> thanks for your help!
>>
>> On Wednesday, May 28, 2014 6:22:59 PM UTC-4, Mark Walkom wrote:
>>>
>>> Can you provide some specs on your cluster, OS, RAM, heap, disk, java 
>>> and ES versions?
>>> Are you using parent/child relationships, TTLs, large facet or other 
>>> queries?
>>>
>>>
>>> (Also, your elaborate legalese signature is kind of moot given you're 
>>> posting to a public mailing list :p)
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>  
>>>
>>> On 29 May 2014 07:27, Paul Sanwald  wrote:
>>>
>>>> Hi Everyone,
>>>>We are seeing continual OOM exceptions on one of our 1.1.0 
>>>> elasticsearch clusters, the index is ~30GB, quite small. I'm trying to 
>>>> work 
>>>> out the root cause via heap dump analysis, but not having a lot of luck. I 
>>>> don't want to include a bunch of unnecessary info, but the stacktrace 
>>>> we're 
>>>> seeing is pasted below. Has anyone seen this before? I've been using the 
>>>> cluster stats and node stats APIs to try and find a smoking gun, but I'm 
>>>> not seeing anything that looks out of the ordinary.
>>>>
>>>> Any ideas?
>>>>
>>>> 14/05/27 20:37:08 WARN transport.netty: [Strongarm] Failed to send 
>>>> error message back to client for action [search/phase/query]
>>>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>> 14/05/27 20:37:08 WARN transport.netty: [Strongarm] Actual Exception
>>>> org.elasticsearch.search.query.QueryPhaseExecutionException: 
>>>> [eventdata][2]: q
>>>> uery[ConstantScore(*:*)],from[0],size[0]: Query Failed [Failed to 
>>>> execute main
>>>>  query]
>>>> at org.elasticsearch.search.query.QueryPhase.execute(
>>>> QueryPhase.java:1
>>>> 27)
>>>> at org.elasticsearch.search.SearchService.executeQueryPhase(
>>>> SearchService.java:257)
>>>> at org.elasticsearch.search.action.
>>>> SearchServiceTransportAction$SearchQueryTransportHandler.
>>>> messageReceived(SearchServiceTransportAction.java:623)
>>>> at org.elasticsearch.search.action.
>>>> SearchServiceTransportAction$SearchQueryTransportHandler.
>>>> messageReceived(SearchServiceTransportAction.java:612)
>>>> at org.elasticsearch.transport.netty.MessageChannelHandler$
>>>> RequestHandler.run(MessageChannelHandler.java:270)
>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(
>>>> ThreadPoolExecutor.java:1145)
>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>>> ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:722)
>>>> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>  

Re: ES OutOfMemory on a 30GB index

2014-05-28 Thread Paul Sanwald
I apologize about the signature, it's automatic. I've created a gist with 
the cluster node stats:
https://gist.github.com/pcsanwald/e11ba02ac591757c8d92

We are using 1.1.0, using aggregations a lot but nothing crazy. We run our
app on much, much larger indices successfully. But the problem seems to
present itself even on basic search cases. The one thing that's different
about this dataset is that a lot of it is in Spanish.

thanks for your help!

On Wednesday, May 28, 2014 6:22:59 PM UTC-4, Mark Walkom wrote:
>
> Can you provide some specs on your cluster, OS, RAM, heap, disk, java and 
> ES versions?
> Are you using parent/child relationships, TTLs, large facet or other 
> queries?
>
>
> (Also, your elaborate legalese signature is kind of moot given you're 
> posting to a public mailing list :p)
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
> On 29 May 2014 07:27, Paul Sanwald 
> > wrote:
>
>> Hi Everyone,
>>We are seeing continual OOM exceptions on one of our 1.1.0 
>> elasticsearch clusters, the index is ~30GB, quite small. I'm trying to work 
>> out the root cause via heap dump analysis, but not having a lot of luck. I 
>> don't want to include a bunch of unnecessary info, but the stacktrace we're 
>> seeing is pasted below. Has anyone seen this before? I've been using the 
>> cluster stats and node stats APIs to try and find a smoking gun, but I'm 
>> not seeing anything that looks out of the ordinary.
>>
>> Any ideas?
>>
>> 14/05/27 20:37:08 WARN transport.netty: [Strongarm] Failed to send error 
>> message back to client for action [search/phase/query]
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>> 14/05/27 20:37:08 WARN transport.netty: [Strongarm] Actual Exception
>> org.elasticsearch.search.query.QueryPhaseExecutionException: 
>> [eventdata][2]: q
>> uery[ConstantScore(*:*)],from[0],size[0]: Query Failed [Failed to execute 
>> main
>>  query]
>> at 
>> org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:1
>> 27)
>> at 
>> org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:257)
>> at 
>> org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:623)
>> at 
>> org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:612)
>> at 
>> org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:722)
>> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>  
>> *Important Notice:*  The information contained in or attached to this 
>> email message is confidential and proprietary information of RedOwl 
>> Analytics, Inc., and by opening this email or any attachment the recipient 
>> agrees to keep such information strictly confidential and not to use or 
>> disclose the information other than as expressly authorized by RedOwl 
>> Analytics, Inc.  If you are not the intended recipient, please be aware 
>> that any use, printing, copying, disclosure, dissemination, or the taking 
>> of any act in reliance on this communication or the information contained 
>> herein is strictly prohibited. If you think that you have received this 
>> email message in error, please delete it and notify the sender. 
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/e6634ad4-619f-4f24-8287-d3bc97722a88%40googlegroups.com<https://groups.google.com/d/msgid/elasticsearch/e6634ad4-619f-4f24-8287-d3bc97722a88%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

ES OutOfMemory on a 30GB index

2014-05-28 Thread Paul Sanwald
Hi Everyone,
   We are seeing continual OOM exceptions on one of our 1.1.0 elasticsearch 
clusters, the index is ~30GB, quite small. I'm trying to work out the root 
cause via heap dump analysis, but not having a lot of luck. I don't want to 
include a bunch of unnecessary info, but the stacktrace we're seeing is 
pasted below. Has anyone seen this before? I've been using the cluster 
stats and node stats APIs to try and find a smoking gun, but I'm not seeing 
anything that looks out of the ordinary.

Any ideas?
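As an aside, the heap-pressure check described above (scanning cluster and
node stats) can be scripted. A hedged sketch, assuming the standard
_nodes/stats JVM response shape; the sample numbers below are fabricated
for illustration:

```python
# Sketch: flag nodes whose JVM heap usage exceeds a threshold, given a
# parsed _nodes/stats response. In practice the dict would come from
# GET /_nodes/stats/jvm; here we use a made-up fragment.
def hot_nodes(stats, threshold_pct=85):
    """Return names of nodes whose heap usage is at or above threshold_pct."""
    hot = []
    for node in stats.get("nodes", {}).values():
        if node["jvm"]["mem"]["heap_used_percent"] >= threshold_pct:
            hot.append(node["name"])
    return hot

sample = {"nodes": {
    "abc": {"name": "Strongarm", "jvm": {"mem": {"heap_used_percent": 97}}},
    "def": {"name": "node-2", "jvm": {"mem": {"heap_used_percent": 40}}},
}}
print(hot_nodes(sample))  # the node near OOM shows up here
```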

14/05/27 20:37:08 WARN transport.netty: [Strongarm] Failed to send error 
message back to client for action [search/phase/query]
java.lang.OutOfMemoryError: GC overhead limit exceeded
14/05/27 20:37:08 WARN transport.netty: [Strongarm] Actual Exception
org.elasticsearch.search.query.QueryPhaseExecutionException: 
[eventdata][2]: q
uery[ConstantScore(*:*)],from[0],size[0]: Query Failed [Failed to execute 
main
 query]
at 
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:1
27)
at 
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:257)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:623)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:612)
at 
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

-- 
*Important Notice:*  The information contained in or attached to this email 
message is confidential and proprietary information of RedOwl Analytics, 
Inc., and by opening this email or any attachment the recipient agrees to 
keep such information strictly confidential and not to use or disclose the 
information other than as expressly authorized by RedOwl Analytics, Inc. 
 If you are not the intended recipient, please be aware that any use, 
printing, copying, disclosure, dissemination, or the taking of any act in 
reliance on this communication or the information contained herein is 
strictly prohibited. If you think that you have received this email message 
in error, please delete it and notify the sender.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e6634ad4-619f-4f24-8287-d3bc97722a88%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Need help setting up autocomplete suggestions using phrase suggester

2014-05-21 Thread Paul Bormans
After days of reading ... I'm still confused and need some help setting up
an autocomplete function.

My intention is to provide search suggestions (as-you-type) for a Twitter
typeahead search box. This means I don't want to return actual documents
but ranked suggestions instead. The three suggesters seem to target exactly
that, so I thought to try the phrase suggester.

{
  "accoms_unittest" : {
"settings" : {
  "index" : {
"uuid" : "t6hW60RoSmyS7sf8_GBrwg",
"analysis" : {
  "filter" : {
"filter_shingle" : {
  "type" : "shingle",
  "min_shingle_size" : "1",
  "max_shingle_size" : "4",
  "output_unigrams" : "true"
}
  },
  "analyzer" : {
"shingle_analyzer" : {
  "type" : "custom",
  "filter" : [ "standard", "lowercase", "filter_shingle" ],
  "tokenizer" : "standard"
}
  }
},
"number_of_replicas" : "1",
"number_of_shards" : "5",
"version" : {
  "created" : "1000299"
}
  }
}
  }
}


{
  "accoms_unittest" : {
"mappings" : {
  "modelresult" : {
"_boost" : {
  "name" : "boost",
  "null_value" : 1.0
},
"properties" : {

<...>
  "text_nl" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets",
"analyzer" : "snowball"
  },
  "text_suggest_nl" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets",
"analyzer" : "shingle_analyzer"
  }
}
  }
}
  }
}


GET accoms_unittest/_search
{
  "query": {
"match_all": {}
  },
  "suggest": {
"text": "levendige kleu",
"simple_phrase" : {
  "phrase": {
"field": "text_suggest_nl",
"size": 5,
"analyzer": "standard",
"real_word_error_likelihood": 0.95,
"max_errors": 1,
"gram_size": 4,
"direct_generator" : [ {
  "field" : "text_nl",
  "suggest_mode" : "always",
  "min_word_len" : 1
} ]
  }
  
  
}
  },
  "from": 0,
  "size": 0
}

{
   "took": 8,
   "timed_out": false,
   "_shards": {
  "total": 5,
  "successful": 5,
  "failed": 0
   },
   "hits": {
  "total": 12,
  "max_score": 0,
  "hits": []
   },
   "suggest": {
  "simple_phrase": [
 {
"text": "levendige kleu",
"offset": 0,
"length": 14,
"options": []
 }
  ]
   }
}


I know one of the objects has the following input text: "De levendige kleuren en de 
etc...", so I ran the following search experiments:

"text": "levendige kleure" yields a suggestion "levendige kleuren" with score: 
2854.164

"text": "levendig kleure" yields no results?

"text": "levendi" also yields no suggestions?


Obviously, when a user starts typing I would like to see a suggestion like 
"levendige" based on the input search text "lev".

What am I missing here? Is this the way to go?


Paul Bormans
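For comparison, the completion suggester (which a later follow-up in this
thread settled on) is built for exactly this prefix-style as-you-type case,
since it works from an FST built at index time rather than from indexed
shingles. A rough, untested sketch of the mapping and query; field and
suggester names mirror the post, everything else is illustrative:

```python
import json

# Sketch: completion-suggester mapping plus a prefix request for "lev".
# Short prefixes return suggestions directly, unlike the phrase suggester.
mapping = {
    "modelresult": {
        "properties": {
            "text_suggest_nl": {"type": "completion"}
        }
    }
}

query = {
    "my_suggest": {
        "text": "lev",
        "completion": {"field": "text_suggest_nl", "size": 5}
    }
}
print(json.dumps(mapping))
print(json.dumps(query))
```

The query body would be POSTed to the index's _suggest endpoint.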


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2ed471f3-bf44-4b05-911e-f5540f9f5fb2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: java 8, elasticsearch, and MVEL

2014-05-16 Thread Paul Sanwald
It's a little hard to tell from the MVEL and ES commit histories and the 
GitHub issue.

It looks like this isn't fixed, and isn't going to get fixed in MVEL? Am I 
misreading something?

--paul

On Monday, April 21, 2014 8:39:43 AM UTC-4, Alexander Reelsen wrote:
>
> Hey,
>
> this commits upgrades mvel, that seems to have fixed the java8 issues 
> (still requires more testing on our side though): 
> https://github.com/elasticsearch/elasticsearch/commit/21a36678883c159e50a03b76309d3da2a8e5d7b4
>
> IIRC this bug has also been fixed in the new MVEL version: 
> https://github.com/elasticsearch/elasticsearch/issues/5483
>
>
> --Alex
>
>
> On Tue, Apr 15, 2014 at 11:40 AM, Bernhard Berger 
> 
> > wrote:
>
>>  Is there an open issue so that I can watch the progress for this bug? I 
>> cannot find any issue for this on GitHub.
>>
>> Am 07.04.2014 01:12, schrieb Shay Banon:
>>  
>> We will report back with findings and progress.
>>
>>   
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/534CFEA8.9060100%40gmail.com<https://groups.google.com/d/msgid/elasticsearch/534CFEA8.9060100%40gmail.com?utm_medium=email&utm_source=footer>
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
-- 
*Important Notice:*  The information contained in or attached to this email 
message is confidential and proprietary information of RedOwl Analytics, 
Inc., and by opening this email or any attachment the recipient agrees to 
keep such information strictly confidential and not to use or disclose the 
information other than as expressly authorized by RedOwl Analytics, Inc. 
 If you are not the intended recipient, please be aware that any use, 
printing, copying, disclosure, dissemination, or the taking of any act in 
reliance on this communication or the information contained herein is 
strictly prohibited. If you think that you have received this email message 
in error, please delete it and notify the sender.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e4717341-fc95-4ca0-badf-50b38e6df5d2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Shard Initialization slow down

2014-05-13 Thread Paul
Thanks Jörg, we've heard of others pre-creating indices; we were seeing it 
as a workaround rather than a regular action, but what you say makes it 
seem like something we should work with.
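Pre-creating indices with bulk-friendly settings, per the advice quoted
below, might look roughly like this (the settings values follow the reply;
the index name and defaults are placeholders):

```python
import json

# Sketch: settings body for an index created ahead of a bulk load - no
# refresh during indexing and no replicas until the load finishes.
bulk_settings = {
    "settings": {
        "index": {
            "refresh_interval": "-1",
            "number_of_replicas": 0
        }
    }
}
# After the bulk load, refresh_interval and number_of_replicas would be
# restored through the index settings update API.
print(json.dumps(bulk_settings))
```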


On Tuesday, May 13, 2014 12:13:10 PM UTC+1, Jörg Prante wrote:
>
> You should create indexes before bulk indexing. First, bulk indexing works 
> much better if all indices and their mappings are already present, the 
> operations will run faster and without conflicts, and the cluster state 
> updates are less frequent which reduces some noise and hiccups. Second, 
> setting the indices refresh rate to -1 and replica level to 0 while in bulk 
> indexing mode helps a lot for performance.
>
> If you create 1000+ shards per node, you seem to exceed the limit of your 
> system. Do not expect admin operations like index creation work in O(1) 
> time, they are O(n/c)  with n = number of affected shards and c the 
> threadpool size for the operation (the total node number also counts but I 
> neglect it here). So yes, it is expected that index creation operations 
> take longer if they reach the limit of your nodes, but there can be plenty 
> of reasons for it (increasing shard count is just one of them). And it is 
> expected that you see the 30s cluster action timeout in theses cases, yes.
>
> There is no strictly predictable resource limit for a node; all this 
> depends heavily on factors outside of Elasticsearch (JVM, CPU, memory, 
> disk I/O, your indexing/search workload), so it is up to you to 
> calibrate your node capacity. After adding nodes, you will observe that ES 
> scales well and can handle more shards.
>
> Jörg
>
>
> On Tue, May 13, 2014 at 11:59 AM, Paul wrote:
>
>> We are seeing a slow down in shard initialization speed as the number of 
>> shards/indices grows in our cluster.
>>
>> With 0-100's of indices/shards existing in the cluster a new bulk 
>> creation of indices up the 100's at a time is fine, we see them pass 
>> through the states and get a green cluster in a reasonable amount of time.
>>
>> As the total cluster size grows to 1000+ indices (3000+ shards) we begin 
>> to notice that the first rounds of initialization take longer to process, 
>> it seems to speed up after the first few batches, but this slow down leads 
>> to "failed to process cluster event (create-index [index_1112], cause 
>> [auto(bulk api)]) within 30s" type messages in the Master logs - the 
>> indices are eventually created.
>>
>>
>> Has anyone else experienced this? (did you find the cause / way to fix?)
>>
>> Is this somewhat expected behaviour? - are we approaching something 
>> incorrectly? (there are 3 data nodes involved, with 3 shards per index)
>>  
>>
>
>

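Jörg's pre-creation advice can be sketched as a single index-creation body 
(the index name `myindex` and shard count below are hypothetical): disable 
refresh and replicas up front, run the bulk load, then restore both settings.

```json
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0,
    "refresh_interval": "-1"
  }
}
```

PUT this to `$(hostname -f):9200/myindex` before bulk indexing; afterwards, 
PUT `{"index": {"number_of_replicas": 1, "refresh_interval": "1s"}}` to 
`/myindex/_settings` to re-enable replication and refresh.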


Re: Shard Initialization slow down

2014-05-13 Thread Paul
This looks very interesting, thanks.


On Tuesday, May 13, 2014 11:38:27 AM UTC+1, Mark Harwood wrote:
>
> This API should give an indication on any backlog in processing the 
> cluster state: 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-pending.html
>
>
>
> On Tuesday, May 13, 2014 11:29:20 AM UTC+1, Paul wrote:
>>
>> Ok, do you know if there are clear indicators when limits are being 
>> reached?
>>
>> We don't see errors in the logs (apart from the 30s timeout) but if there 
>> are system or ES provided metrics that we can track to know when we need to 
>> scale it would be really useful.
>>
>>
>> Thanks,
>>
>> Paul.  
>>
>>
>>
>> On Tuesday, May 13, 2014 11:24:06 AM UTC+1, Mark Walkom wrote:
>>>
>>> Empty or not, there is still metadata that ES needs to maintain in the 
>>> cluster state. So the more indexes you have open the bigger that is and the 
>>> more resources required to track it.
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 13 May 2014 20:16, Paul  wrote:
>>>
>>>> In testing and replicating the issue, this slow down has been seen 
>>>> occurring with empty indices. 
>>>>
>>>> The running cluster is at present ~100 GB across 2,200 Indices with a 
>>>> total of 13,500 shards and ~430,000,000 documents.
>>>>
>>>> We have 7GB RAM and 5GB heap on the data nodes - haven't looked overly 
>>>> carefully but don't think the heap is maxing out on any of the nodes when 
>>>> this occurs.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Paul.
>>>>
>>>>
>>>> On Tuesday, May 13, 2014 11:02:32 AM UTC+1, Mark Walkom wrote:
>>>>>
>>>>> Sounds like the inevitable "add more nodes" situation.
>>>>>
>>>>> How much RAM on each node, how big is your data set?
>>>>>
>>>>> Regards,
>>>>> Mark Walkom
>>>>>
>>>>> Infrastructure Engineer
>>>>> Campaign Monitor
>>>>> email: ma...@campaignmonitor.com
>>>>> web: www.campaignmonitor.com
>>>>>  
>>>>>
>>>>> On 13 May 2014 19:59, Paul  wrote:
>>>>>
>>>>>> We are seeing a slow down in shard initialization speed as the number 
>>>>>> of shards/indices grows in our cluster.
>>>>>>
>>>>>> With 0-100's of indices/shards existing in the cluster a new bulk 
>>>>>> creation of indices up the 100's at a time is fine, we see them pass 
>>>>>> through the states and get a green cluster in a reasonable amount of 
>>>>>> time.
>>>>>>
>>>>>> As the total cluster size grows to 1000+ indices (3000+ shards) we 
>>>>>> begin to notice that the first rounds of initialization take longer to 
>>>>>> process, it seems to speed up after the first few batches, but this slow 
>>>>>> down leads to "failed to process cluster event (create-index 
>>>>>> [index_1112], cause [auto(bulk api)]) within 30s" type messages in the 
>>>>>> Master logs - the indices are eventually created.
>>>>>>
>>>>>>
>>>>>> Has anyone else experienced this? (did you find the cause / way to 
>>>>>> fix?)
>>>>>>
>>>>>> Is this somewhat expected behaviour? - are we approaching something 
>>>>>> incorrectly? (there are 3 data nodes involved, with 3 shards per index)
>>>>>>  
>>>>>>
>>>>>
>>>>
>>>
>>>

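The pending-tasks API Mark links to reports the cluster-state updates the 
master has queued but not yet processed; a response shaped like the following 
(values hypothetical) would show create-index events backing up:

```json
{
  "tasks": [
    {
      "insert_order": 101,
      "priority": "URGENT",
      "source": "create-index [index_1112], cause [auto(bulk api)]",
      "time_in_queue_millis": 86,
      "time_in_queue": "86ms"
    }
  ]
}
```

A steadily growing `tasks` list, or large `time_in_queue` values, suggests 
the master cannot keep up with index-creation events.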


Re: Shard Initialization slow down

2014-05-13 Thread Paul
Thanks Mark, we'll have a look at the available metrics.
 


On Tuesday, May 13, 2014 11:34:51 AM UTC+1, Mark Walkom wrote:
>
> You will want to obtain Marvel (
> http://www.elasticsearch.org/guide/en/marvel/current/) and then wait till 
> you have a history and start digging.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 13 May 2014 20:29, Paul wrote:
>
>> Ok, do you know if there are clear indicators when limits are being 
>> reached?
>>
>> We don't see errors in the logs (apart from the 30s timeout) but if there 
>> are system or ES provided metrics that we can track to know when we need to 
>> scale it would be really useful.
>>
>>
>> Thanks,
>>
>> Paul.  
>>
>>
>>
>> On Tuesday, May 13, 2014 11:24:06 AM UTC+1, Mark Walkom wrote:
>>>
>>> Empty or not, there is still metadata that ES needs to maintain in the 
>>> cluster state. So the more indexes you have open the bigger that is and the 
>>> more resources required to track it.
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 13 May 2014 20:16, Paul  wrote:
>>>
>>>> In testing and replicating the issue, this slow down has been seen 
>>>> occurring with empty indices. 
>>>>
>>>> The running cluster is at present ~100 GB across 2,200 Indices with a 
>>>> total of 13,500 shards and ~430,000,000 documents.
>>>>
>>>> We have 7GB RAM and 5GB heap on the data nodes - haven't looked overly 
>>>> carefully but don't think the heap is maxing out on any of the nodes when 
>>>> this occurs.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Paul.
>>>>
>>>>
>>>> On Tuesday, May 13, 2014 11:02:32 AM UTC+1, Mark Walkom wrote:
>>>>>
>>>>> Sounds like the inevitable "add more nodes" situation.
>>>>>
>>>>> How much RAM on each node, how big is your data set?
>>>>>
>>>>> Regards,
>>>>> Mark Walkom
>>>>>
>>>>> Infrastructure Engineer
>>>>> Campaign Monitor
>>>>> email: ma...@campaignmonitor.com
>>>>> web: www.campaignmonitor.com
>>>>>  
>>>>>
>>>>> On 13 May 2014 19:59, Paul  wrote:
>>>>>
>>>>>> We are seeing a slow down in shard initialization speed as the number 
>>>>>> of shards/indices grows in our cluster.
>>>>>>
>>>>>> With 0-100's of indices/shards existing in the cluster a new bulk 
>>>>>> creation of indices up the 100's at a time is fine, we see them pass 
>>>>>> through the states and get a green cluster in a reasonable amount of 
>>>>>> time.
>>>>>>
>>>>>> As the total cluster size grows to 1000+ indices (3000+ shards) we 
>>>>>> begin to notice that the first rounds of initialization take longer to 
>>>>>> process, it seems to speed up after the first few batches, but this slow 
>>>>>> down leads to "failed to process cluster event (create-index 
>>>>>> [index_1112], cause [auto(bulk api)]) within 30s" type messages in the 
>>>>>> Master logs - the indices are eventually created.
>>>>>>
>>>>>>
>>>>>> Has anyone else experienced this? (did you find the cause / way to 
>>>>>> fix?)
>>>>>>
>>>>>> Is this somewhat expected behaviour? - are we approaching something 
>>>>>> incorrectly? (there are 3 data nodes involved, with 3 shards per index)
>>>>>>  

Re: Shard Initialization slow down

2014-05-13 Thread Paul
Ok, do you know if there are clear indicators when limits are being reached?

We don't see errors in the logs (apart from the 30s timeout), but if there 
are system- or ES-provided metrics that we can track to know when we need to 
scale, that would be really useful.


Thanks,

Paul.  



On Tuesday, May 13, 2014 11:24:06 AM UTC+1, Mark Walkom wrote:
>
> Empty or not, there is still metadata that ES needs to maintain in the 
> cluster state. So the more indexes you have open the bigger that is and the 
> more resources required to track it.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 13 May 2014 20:16, Paul wrote:
>
>> In testing and replicating the issue, this slow down has been seen 
>> occurring with empty indices. 
>>
>> The running cluster is at present ~100 GB across 2,200 Indices with a 
>> total of 13,500 shards and ~430,000,000 documents.
>>
>> We have 7GB RAM and 5GB heap on the data nodes - haven't looked overly 
>> carefully but don't think the heap is maxing out on any of the nodes when 
>> this occurs.
>>
>>
>> Thanks,
>>
>> Paul.
>>
>>
>> On Tuesday, May 13, 2014 11:02:32 AM UTC+1, Mark Walkom wrote:
>>>
>>> Sounds like the inevitable "add more nodes" situation.
>>>
>>> How much RAM on each node, how big is your data set?
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>  
>>>
>>> On 13 May 2014 19:59, Paul  wrote:
>>>
>>>> We are seeing a slow down in shard initialization speed as the number 
>>>> of shards/indices grows in our cluster.
>>>>
>>>> With 0-100's of indices/shards existing in the cluster a new bulk 
>>>> creation of indices up the 100's at a time is fine, we see them pass 
>>>> through the states and get a green cluster in a reasonable amount of time.
>>>>
>>>> As the total cluster size grows to 1000+ indices (3000+ shards) we 
>>>> begin to notice that the first rounds of initialization take longer to 
>>>> process, it seems to speed up after the first few batches, but this slow 
>>>> down leads to "failed to process cluster event (create-index 
>>>> [index_1112], cause [auto(bulk api)]) within 30s" type messages in the 
>>>> Master logs - the indices are eventually created.
>>>>
>>>>
>>>> Has anyone else experienced this? (did you find the cause / way to fix?)
>>>>
>>>> Is this somewhat expected behaviour? - are we approaching something 
>>>> incorrectly? (there are 3 data nodes involved, with 3 shards per index)
>>>>  
>>>>
>>>
>>
>
>



Re: Shard Initialization slow down

2014-05-13 Thread Paul
In testing and replicating the issue, this slowdown has been observed even 
with empty indices.

The running cluster is at present ~100 GB across 2,200 Indices with a total 
of 13,500 shards and ~430,000,000 documents.

We have 7GB RAM and 5GB heap on the data nodes - haven't looked overly 
carefully but don't think the heap is maxing out on any of the nodes when 
this occurs.


Thanks,

Paul.


On Tuesday, May 13, 2014 11:02:32 AM UTC+1, Mark Walkom wrote:
>
> Sounds like the inevitable "add more nodes" situation.
>
> How much RAM on each node, how big is your data set?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
> On 13 May 2014 19:59, Paul wrote:
>
>> We are seeing a slow down in shard initialization speed as the number of 
>> shards/indices grows in our cluster.
>>
>> With 0-100's of indices/shards existing in the cluster a new bulk 
>> creation of indices up the 100's at a time is fine, we see them pass 
>> through the states and get a green cluster in a reasonable amount of time.
>>
>> As the total cluster size grows to 1000+ indices (3000+ shards) we begin 
>> to notice that the first rounds of initialization take longer to process, 
>> it seems to speed up after the first few batches, but this slow down leads 
>> to "failed to process cluster event (create-index [index_1112], cause 
>> [auto(bulk api)]) within 30s" type messages in the Master logs - the 
>> indices are eventually created.
>>
>>
>> Has anyone else experienced this? (did you find the cause / way to fix?)
>>
>> Is this somewhat expected behaviour? - are we approaching something 
>> incorrectly? (there are 3 data nodes involved, with 3 shards per index)
>>  
>>
>
>



Shard Initialization slow down

2014-05-13 Thread Paul
We are seeing a slowdown in shard initialization as the number of 
shards/indices in our cluster grows.

With anywhere from zero to a few hundred indices/shards in the cluster, a new 
bulk creation of indices, hundreds at a time, is fine: we see them pass 
through the states and get a green cluster in a reasonable amount of time.

As the total cluster size grows to 1000+ indices (3000+ shards) we begin to 
notice that the first rounds of initialization take longer to process. It 
seems to speed up after the first few batches, but this slowdown leads to 
"failed to process cluster event (create-index [index_1112], cause [auto(bulk 
api)]) within 30s" type messages in the master logs - the indices are 
eventually created.

Has anyone else experienced this? (Did you find the cause or a way to fix it?)

Is this somewhat expected behaviour? Are we approaching something 
incorrectly? (There are 3 data nodes involved, with 3 shards per index.)



Testing for an Empty String With the Following

2014-04-18 Thread Paul
Hi,

Thanks for everyone's patience while I learn the elasticsearch query DSL. 
 I'm trying to get used to its verbosity.


How would I do a query like this, again in SQL parlance:  select col1 from 
mysource where col2 = "" and col3 in ["", "one", "two"] and col4 = "foo"

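One way to express that SQL in the 1.x query DSL, assuming the fields are 
mapped `not_analyzed` so the empty string is indexed verbatim (a sketch, not 
tested against a live cluster):

```json
{
  "_source": ["col1"],
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term":  { "col2": "" } },
            { "terms": { "col3": ["", "one", "two"] } },
            { "term":  { "col4": "foo" } }
          ]
        }
      }
    }
  }
}
```

POST this to `/mysource/_search`. With an analyzed field the empty string 
produces no token, so a `term` filter on `""` would never match.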


Testing for an Empty String

2014-04-18 Thread Paul
Hi,

Thanks for everyone's patience while I learn the elasticsearch query DSL. 
 I'm trying to get used to its verbosity.


How would I do a query like this, again in SQL parlance:  select col1 from 
mysource where col2 = ""?

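A minimal sketch for the `col2 = ""` case, again assuming `col2` is mapped 
`not_analyzed` (an analyzed field indexes no token for the empty string, so 
it can never match):

```json
{
  "_source": ["col1"],
  "query": {
    "filtered": {
      "filter": {
        "term": { "col2": "" }
      }
    }
  }
}
```

An alternative is to index a boolean "is empty" flag at write time and 
filter on that, which avoids relying on empty-string terms entirely.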


Is ElasticSearch the Right Tool for This

2014-04-17 Thread Paul
Hi,

We're looking to move our infrastructure to ElasticSearch and I have some 
concerns.  We plan on using this more as a database and less than a search 
engine.  I know there are some companies out there that are doing this, but 
I have some queries that, with one SQL command I can get the results I 
need, whereas ElasticSearch I would need to do filters of queries, etc.


An example, using SQL parlance, how would I do the following statement:

select col1, col2 from mytable where col3 in ["", "some", "value"] and col4 
in ["another", "set", "", "values"] and col5 = "hello" and col6 not in 
["world"] order by col7.

This is an example of some data I would be querying, and I would be 
performing 1000's of queries at a time.



So my question: can Elasticsearch do this, and if so, how would I write the 
above query?

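Elasticsearch can express that statement; a hedged sketch in the 1.x query 
DSL (field names taken from the SQL, `not_analyzed` string mappings assumed 
so exact and empty-string matches behave like SQL equality):

```json
{
  "_source": ["col1", "col2"],
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "terms": { "col3": ["", "some", "value"] } },
            { "terms": { "col4": ["another", "set", "", "values"] } },
            { "term":  { "col5": "hello" } }
          ],
          "must_not": [
            { "term": { "col6": "world" } }
          ]
        }
      }
    }
  },
  "sort": [ { "col7": "asc" } ]
}
```

POST to `/mytable/_search`. For thousands of queries at a time, the 
multi-search API (`_msearch`) batches many such bodies into one request.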


Re: ES cluster unable to assign new shards

2014-04-17 Thread Andreas Paul
Today the ES cluster still works as expected.

I still don't know why it failed in the first place or what I did to fix it. 
Maybe a slow cluster restart helped: stopping all nodes and then starting 
only one node so it could become master, instead of restarting them all at 
once and letting them work out among themselves who should become master.

Maybe I had a split-brain problem by restarting too quickly, but then why 
would I have seen all nodes in the cluster information with only one master?

Anyway, it's working now...

On Wednesday, April 16, 2014 10:26:15 AM UTC+2, Andreas Paul wrote:
>
> Yesterday I set the replica count to 0 with
>
> curl -XPUT $(hostname -f):9200/_settings -d '{"index": { 
> "number_of_replicas": 0}}'
>
> and today the ES cluster assigned the new shards as it should.
>
> I have now set the replica count back to 1 and will see if that's the 
> problem tomorrow.
>
>
> On Tuesday, April 15, 2014 5:43:32 PM UTC+2, Andreas Paul wrote:
>>
>> Hi Mark,
>>
>> I forgot to write it again in this mail, but in the gist I pasted the 
>> full logs when the ES cluster created the new indices until I tried to 
>> restart the current active master.
>>
>> # head es_cluster.log
>> [2014-04-14 02:00:01,504][INFO ][cluster.metadata ] [es@log01] 
>> [logstash-2014.04.14] creating index, cause [auto(bulk api)], shards 
>> [2]/[1], mappings [_default_]
>> [2014-04-14 02:00:02,938][INFO ][cluster.metadata ] [es@log01] 
>> [puppetmaster-2014.04.14] creating index, cause [auto(bulk api)], shards 
>> [2]/[1], mappings []
>> [2014-04-14 10:46:12,318][INFO ][node ] [es@log01] 
>> stopping ...
>> [2014-04-14 10:46:12,446][WARN ][netty.channel.DefaultChannelPipeline] An 
>> exception was thrown by an exception handler.
>> java.util.concurrent.RejectedExecutionException: Worker has already been 
>> shutdown
>> at 
>> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
>> at 
>> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
>>
>>
>>
>>
>> Which means that there are no log messages at all on why the cluster 
>> couldn't assign the new shards.
>>
>>
>>
>> On Tuesday, April 15, 2014 5:07:12 PM UTC+2, Mark Walkom wrote:
>>>
>>> Check your ES logs, there may be something there.
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 15 April 2014 22:20, Andreas Paul  wrote:
>>>
>>>> Hello there,
>>>>
>>>> on Monday morning our ES cluster switched to red because it didn't 
>>>> assign the newly created indices to any ES node, see attached picture.
>>>>
>>>>
>>>>
>>>> I tried manually allocating these unassigned shards to a node, but it 
>>>> only returned the following error:
>>>>
>>>> # curl -XPOST $(hostname -f):9200/_cluster/reroute?pretty=true -d 
>>>> '{"commands": [{"allocate": {"index": "foobar", "shard": 0, "node": 
>>>> "es@log09", "allow_primary": true }}]}' 
>>>>
>>>> {
>>>>   "error" : 
>>>> "RemoteTransportException[[es@log05][inet[/12313.20.36.1337:9300]][cluster/reroute]];
>>>>  nested: IllegalFormatConversionException[d != java.lang.Double]; ",
>>>>   "status" : 400
>>>>
>>>> }
>>>>
>>>>
>>>> Also see https://gist.github.com/xorpaul/10644099
>>>>
>>>> I also tried
>>>>
>>>> curl -XPUT $(hostname -f):9200/_settings -d ' 
>>>> {"index.routing.allocation.disable_allocation": false}'
>>>>
>>>> and
>>>>
>>>> curl -XPUT $(hostname -f):9200/_settings -d ' 
>>>> {"index.routing.allocation.enable": "all"}'
>>>>
>>>> and removing one node from the cluster, which seemed to help, because 
>>>> it finally assigned the shards to a node.
>>>>
>>>> Unfortunately the same problem appeared again the next day, when the 
>>>> cluster tried to create new indices.
>>>>

Re: ES cluster unable to assign new shards

2014-04-16 Thread Andreas Paul
Yesterday I set the replica count to 0 with

curl -XPUT $(hostname -f):9200/_settings -d '{"index": { 
"number_of_replicas": 0}}'

and today the ES cluster assigned the new shards as it should.

I have now set the replica count back to 1 and will see tomorrow whether the 
problem returns.


On Tuesday, April 15, 2014 5:43:32 PM UTC+2, Andreas Paul wrote:
>
> Hi Mark,
>
> I forgot to write it again in this mail, but in the gist I pasted the full 
> logs when the ES cluster created the new indices until I tried to restart 
> the current active master.
>
> # head es_cluster.log
> [2014-04-14 02:00:01,504][INFO ][cluster.metadata ] [es@log01] 
> [logstash-2014.04.14] creating index, cause [auto(bulk api)], shards [2]/[1], 
> mappings [_default_]
> [2014-04-14 02:00:02,938][INFO ][cluster.metadata ] [es@log01] 
> [puppetmaster-2014.04.14] creating index, cause [auto(bulk api)], shards 
> [2]/[1], mappings []
> [2014-04-14 10:46:12,318][INFO ][node ] [es@log01] 
> stopping ...
> [2014-04-14 10:46:12,446][WARN ][netty.channel.DefaultChannelPipeline] An 
> exception was thrown by an exception handler.
> java.util.concurrent.RejectedExecutionException: Worker has already been 
> shutdown
> at 
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
> at 
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
>
>
>
>
> Which means that there are no log messages at all on why the cluster 
> couldn't assign the new shards.
>
>
>
> On Tuesday, April 15, 2014 5:07:12 PM UTC+2, Mark Walkom wrote:
>>
>> Check your ES logs, there may be something there.
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 15 April 2014 22:20, Andreas Paul  wrote:
>>
>>> Hello there,
>>>
>>> on Monday morning our ES cluster switched to red because it didn't 
>>> assign the newly created indices to any ES node, see attached picture.
>>>
>>>
>>>
>>> I tried manually allocating these unassigned shards to a node, but it 
>>> only returned the following error:
>>>
>>> # curl -XPOST $(hostname -f):9200/_cluster/reroute?pretty=true -d 
>>> '{"commands": [{"allocate": {"index": "foobar", "shard": 0, "node": 
>>> "es@log09", "allow_primary": true }}]}' 
>>>
>>> {
>>>   "error" : 
>>> "RemoteTransportException[[es@log05][inet[/12313.20.36.1337:9300]][cluster/reroute]];
>>>  nested: IllegalFormatConversionException[d != java.lang.Double]; ",
>>>   "status" : 400
>>>
>>> }
>>>
>>>
>>> Also see https://gist.github.com/xorpaul/10644099
>>>
>>> I also tried
>>>
>>> curl -XPUT $(hostname -f):9200/_settings -d ' 
>>> {"index.routing.allocation.disable_allocation": false}'
>>>
>>> and
>>>
>>> curl -XPUT $(hostname -f):9200/_settings -d ' 
>>> {"index.routing.allocation.enable": "all"}'
>>>
>>> and removing one node from the cluster, which seemed to help, because it 
>>> finally assigned the shards to a node.
>>>
>>> Unfortunately the same problem appeared again the next day, when the 
>>> cluster tried to create new indices.
>>>
>>> Elasticsearch 1.1.0 with OpenJDK Java7 on Debian Wheezy
>>>
>>>
>>> I would like to find out the reason why the cluster doesn't assign these 
>>> new shards to any node or find a way to issue a command to the cluster to 
>>> reassign/redistribute all unassigned shards to a node.
>>>
>>> Thanks in advance!
>>>
>>>
>>
>>



Re: ES cluster unable to assign new shards

2014-04-15 Thread Andreas Paul
Hi Mark,

I forgot to write it again in this mail, but in the gist I pasted the full 
logs when the ES cluster created the new indices until I tried to restart 
the current active master.

# head es_cluster.log
[2014-04-14 02:00:01,504][INFO ][cluster.metadata ] [es@log01] 
[logstash-2014.04.14] creating index, cause [auto(bulk api)], shards [2]/[1], 
mappings [_default_]
[2014-04-14 02:00:02,938][INFO ][cluster.metadata ] [es@log01] 
[puppetmaster-2014.04.14] creating index, cause [auto(bulk api)], shards 
[2]/[1], mappings []
[2014-04-14 10:46:12,318][INFO ][node ] [es@log01] stopping 
...
[2014-04-14 10:46:12,446][WARN ][netty.channel.DefaultChannelPipeline] An 
exception was thrown by an exception handler.
java.util.concurrent.RejectedExecutionException: Worker has already been 
shutdown
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)




Which means that there are no log messages at all on why the cluster 
couldn't assign the new shards.



On Tuesday, April 15, 2014 5:07:12 PM UTC+2, Mark Walkom wrote:
>
> Check your ES logs, there may be something there.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 15 April 2014 22:20, Andreas Paul wrote:
>
>> Hello there,
>>
> >> on Monday morning our ES cluster switched to red because it didn't assign 
> >> the newly created indices to any ES node; see attached picture.
>>
>>
>>
>> I tried manually allocating these unassigned shards to a node, but it 
>> only returned the following error:
>>
>> # curl -XPOST $(hostname -f):9200/_cluster/reroute?pretty=true -d 
>> '{"commands": [{"allocate": {"index": "foobar", "shard": 0, "node": 
>> "es@log09", "allow_primary": true }}]}' 
>>
>> {
>>   "error" : 
>> "RemoteTransportException[[es@log05][inet[/12313.20.36.1337:9300]][cluster/reroute]];
>>  nested: IllegalFormatConversionException[d != java.lang.Double]; ",
>>   "status" : 400
>>
>> }
>>
>>
>> Also see https://gist.github.com/xorpaul/10644099
>>
>> I also tried
>>
>> curl -XPUT $(hostname -f):9200/_settings -d ' 
>> {"index.routing.allocation.disable_allocation": false}'
>>
>> and
>>
>> curl -XPUT $(hostname -f):9200/_settings -d ' 
>> {"index.routing.allocation.enable": "all"}'
>>
>> and removing one node from the cluster, which seemed to help, because it 
>> finally assigned the shards to a node.
>>
>> Unfortunately the same problem appeared again the next day, when the 
>> cluster tried to create new indices.
>>
>> Elasticsearch 1.1.0 with OpenJDK Java7 on Debian Wheezy
>>
>>
>> I would like to find out the reason why the cluster doesn't assign these 
>> new shards to any node or find a way to issue a command to the cluster to 
>> reassign/redistribute all unassigned shards to a node.
>>
>> Thanks in advance!
>>
>
>



ES cluster unable to assign new shards

2014-04-15 Thread Andreas Paul
Hello there,

on Monday morning our ES cluster switched to red because it didn't assign 
the newly created indices to any ES node; see attached picture.



I tried manually allocating these unassigned shards to a node, but it only 
returned the following error:

# curl -XPOST $(hostname -f):9200/_cluster/reroute?pretty=true -d '{"commands": 
[{"allocate": {"index": "foobar", "shard": 0, "node": "es@log09", 
"allow_primary": true }}]}' 
{
  "error" : 
"RemoteTransportException[[es@log05][inet[/12313.20.36.1337:9300]][cluster/reroute]];
 nested: IllegalFormatConversionException[d != java.lang.Double]; ",
  "status" : 400
}


Also see https://gist.github.com/xorpaul/10644099

I also tried

curl -XPUT $(hostname -f):9200/_settings -d ' 
{"index.routing.allocation.disable_allocation": false}'

and

curl -XPUT $(hostname -f):9200/_settings -d ' 
{"index.routing.allocation.enable": "all"}'

and removing one node from the cluster, which seemed to help, because it 
finally assigned the shards to a node.

Unfortunately the same problem appeared again the next day, when the 
cluster tried to create new indices.

Elasticsearch 1.1.0 with OpenJDK Java7 on Debian Wheezy


I would like to find out the reason why the cluster doesn't assign these 
new shards to any node or find a way to issue a command to the cluster to 
reassign/redistribute all unassigned shards to a node.

Thanks in advance!
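For anyone hitting the same symptom: before rerouting, it can help to list exactly which shards are unassigned and to build the reroute body programmatically instead of hand-quoting JSON in the shell. Below is a minimal Python sketch under stated assumptions: the plain-text `_cat/shards` column layout and the `allocate` command shape follow the ES 1.x API, the sample output is invented for illustration, and `es@log09` is just the node name from this thread.

```python
import json

def unassigned_shards(cat_shards_text):
    """Parse plain-text `_cat/shards` output (index shard prirep state ...)
    and return (index, shard, prirep) for every UNASSIGNED shard."""
    found = []
    for line in cat_shards_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[3] == "UNASSIGNED":
            found.append((parts[0], int(parts[1]), parts[2]))
    return found

def allocate_cmd(index, shard, node, allow_primary=False):
    # One "allocate" command for POST /_cluster/reroute. allow_primary
    # forces an empty primary and can lose data, so it defaults to off.
    return {"allocate": {"index": index, "shard": shard,
                         "node": node, "allow_primary": allow_primary}}

# Invented sample of what `curl 'localhost:9200/_cat/shards'` might print.
sample = """\
logstash-2014.04.14 0 p STARTED 1234 5mb 10.0.0.1 es@log01
logstash-2014.04.14 1 p UNASSIGNED
puppetmaster-2014.04.14 0 r UNASSIGNED"""

pending = unassigned_shards(sample)
body = json.dumps({"commands": [allocate_cmd(i, s, "es@log09")
                                for i, s, _ in pending]})
print(body)  # POST this to $(hostname -f):9200/_cluster/reroute
```

Building the body this way sidesteps shell-quoting mistakes in the nested JSON and makes it easy to reroute every unassigned shard in one request rather than one curl call per shard.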


Re: java 8, elasticsearch, and MVEL

2014-04-07 Thread Paul Sanwald
Thanks, Shay. If there's anything I can do to help with the effort, please 
do let me know.

On Sunday, April 6, 2014 7:12:39 PM UTC-4, kimchy wrote:
>
> We are planning to address this on Elasticsearch itself. The tricky bit is 
> the fact that we want to have a highly optimized concurrent scripting 
> engine. You can install the Rhino one which should work for now, its pretty 
> fast, and it allows for the type of execution we are after.
>
> We will report back with findings and progress.
>
> On Apr 6, 2014, at 14:29, joerg...@gmail.com  wrote:
>
> No, you are not the only one. MVEL breaks under Java 8 here. I use Java 8 
> with ES without scripting right now. For doc boosting, I will need 
> scripting desperately.
>
> I also want to migrate away from MVEL. My favorite is Nashorn because it 
> is part of Java 8 JDK, but I'm wrestling with thread safety issues - and my 
> tests show low performance to my surprise. 
>
> So I have tried to implement some other script languages as a plugin with 
> focus on JSR 223 (dynjs, jav8, luaj) but I'm stuck in the middle of getting 
> them to run and sorting out which script language implementation gives the best 
> performance and smartest resource-usage behavior under ES.
>
> Jörg
>
>
> On Fri, Apr 4, 2014 at 9:11 PM, Paul Sanwald wrote:
>
>> it seems I'm the only one with this problem. perhaps I will migrate our 
>> scripts to javascript. I'll post back to the group with results.
>>
>>
>
>
>
>
>



Re: java 8, elasticsearch, and MVEL

2014-04-04 Thread Paul Sanwald
it seems I'm the only one with this problem. perhaps I will migrate our 
scripts to javascript. I'll post back to the group with results.



Re: Removing unused fields (more Lucene than ES but..)

2014-04-03 Thread Paul Smith
yeah, I probably should have thought when I read that that if it was easy,
it probably would have already been done! :)

Paul


On 4 April 2014 11:07, Robert Muir  wrote:

> Thank you Paul, I added some comments just so the technical challenges
> and risks are clear.
>
> Its unfortunately not so easy to fix...
>
> On Thu, Apr 3, 2014 at 7:49 PM, Paul Smith  wrote:
> > Thanks for the JIRA link Robert, I've added a comment to it just to share
> > the real world aspect of what happened to us for background.
> >
> >
> > On 1 April 2014 18:29, Robert Muir 
> wrote:
> >>
> >> On Tue, Apr 1, 2014 at 2:41 AM, Paul Smith 
> wrote:
> >> >
> >> > Thanks Robert for the reply, all of that sounds fairly hairy. I did try a
> >> > full optimize of the shard index using Luke, but the residual über-segment
> >> > still has the field definitions in it. Are you saying in (1) that creating
> >> > a new shard index through a custom call to IndexWriter.addIndexes(..)
> >> > would produce a _fully_ optimized index without the fields, and that this is
> >> > different from what an Optimize operation through ES would do? More a
> >> > technical question now on what the difference is between the Optimize call
> >> > and a manual create-new-index-from-multiple-readers. (I actually thought
> >> > that's what Optimize does in practical terms, but there's obviously more
> >> > or less going on under the hood in these different code paths.)
> >> >
> >> > We're going the reindex route for now, was just hoping there was some
> >> > special trick we could do a little easier than the above. :)
> >> >
> >>
> >> Optimize and normal merging don't "garbage collect" unused fields from
> >> fieldinfos:
> >>
> >> https://issues.apache.org/jira/browse/LUCENE-1761
> >>
> >> The addindexes trick is also a forced merge, but it decorates the
> >> readers-to-be-merged: lying
> >> and hiding the fields as if they don't exist.
> >>
> >
> >
>
>



Re: Removing unused fields (more Lucene than ES but..)

2014-04-03 Thread Paul Smith
Thanks for the JIRA link Robert, I've added a comment to it just to share
the real world aspect of what happened to us for background.


On 1 April 2014 18:29, Robert Muir  wrote:

> On Tue, Apr 1, 2014 at 2:41 AM, Paul Smith  wrote:
> >
> > Thanks Robert for the reply, all of that sounds fairly hairy. I did try a
> > full optimize of the shard index using Luke, but the residual über-segment
> > still has the field definitions in it. Are you saying in (1) that creating
> > a new shard index through a custom call to IndexWriter.addIndexes(..)
> > would produce a _fully_ optimized index without the fields, and that this is
> > different from what an Optimize operation through ES would do? More a
> > technical question now on what the difference is between the Optimize call
> > and a manual create-new-index-from-multiple-readers. (I actually thought
> > that's what Optimize does in practical terms, but there's obviously more
> > or less going on under the hood in these different code paths.)
> >
> > We're going the reindex route for now, was just hoping there was some
> > special trick we could do a little easier than the above. :)
> >
>
> Optimize and normal merging don't "garbage collect" unused fields from
> fieldinfos:
>
> https://issues.apache.org/jira/browse/LUCENE-1761
>
> The addindexes trick is also a forced merge, but it decorates the
> readers-to-be-merged: lying
> and hiding the fields as if they don't exist.
>
>



Re: Removing unused fields (more Lucene than ES but..)

2014-03-31 Thread Paul Smith
On 1 April 2014 15:23, Robert Muir  wrote:

> It is actually possible in lucene 4, but there is nothing really
> convenient setup to do this.
>
> You have two choices there:
> 1. trigger a massive merge (essentially an optimize), by wrapping all
> readers and calling IndexWriter.addIndexes(Reader...).
> 2. wrap readers in a custom merge policy and do it slowly over time.
>
> in both cases you'd use something like
>
> http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/index/FieldFilterAtomicReader.java
>
> for lucene 3, this would be more complicated, I don't think its
> impossible but there is no available code unfortunately in this case.
>
>
Thanks Robert for the reply, all of that sounds fairly hairy. I did try a
full optimize of the shard index using Luke, but the residual über-segment
still has the field definitions in it. Are you saying in (1) that creating
a new shard index through a custom call to IndexWriter.addIndexes(..)
would produce a _fully_ optimized index without the fields, and that this
is different from what an Optimize operation through ES would do? More a
technical question now on what the difference is between the Optimize call
and a manual create-new-index-from-multiple-readers. (I actually thought
that's what Optimize does in practical terms, but there's obviously more
or less going on under the hood in these different code paths.)

We're going the reindex route for now, was just hoping there was some
special trick we could do a little easier than the above. :)

thanks again for your time!

Paul



java 8, elasticsearch, and MVEL

2014-03-28 Thread Paul Sanwald
I've been testing ES with java 8, and everything is working fantastic, with 
the exception of MVEL, which is fairly broken. I've looked on the MVEL 
mailing lists, and on github issues, and there's not a lot of activity. I'm 
trying to decide if I should just migrate my MVEL scripts to a different 
language, which seems like the easiest path. Any thoughts? Have others 
moved ES installs to java 8 successfully?

--paul



Re: Confusing highlight result when creating many tokens

2014-03-27 Thread Jon-Paul Lussier
I can confirm this issue is reproducible in 1.0.1 release

On Friday, March 14, 2014 5:29:10 PM UTC-4, Jon-Paul Lussier wrote:
>
> Hey Elasticsearch, hopefully someone can at least explain whether this is 
> intentional and how it happens (I have had other fragment highlighting 
> issues not unlike this).
>
> The problem seems simple, I have a 64 character string that I generate 62 
> tokens for. Whenever I search for the entire string, I end up getting the 
> highlight applied to the 50th fragment instead of the one that actually 
> most nearly matches my search query.
>
> Also confusing is if I try a very similar search, trying to use an exact 
> match on the SHA1 or MD5 attributes -- highlighting works like I'd expect 
> it to.
>
>
> Please see the gist here: 
> https://gist.github.com/jonpaul/d4a9aa7f9c8741933cf5
>
>
> Currently I'm using 1.0.0-BETA2 so this *may* be a fixed bug, sorry if 
> that's the case, I couldn't find anything that matches my problem per se.
>
> Thanks very much in advance for help anyone can provide!
>
>
>



Re: Confusing highlight result when creating many tokens

2014-03-20 Thread Jon-Paul Lussier
Hi Elasticsearch, still waiting to see if this is a known issue, possibly 
that's resolved in a future release, or if this is something I did? I'd 
appreciate knowing, at least, if anyone can help. Thanks much.

On Friday, March 14, 2014 5:29:10 PM UTC-4, Jon-Paul Lussier wrote:
>
> Hey Elasticsearch, hopefully someone can at least explain whether this is 
> intentional and how it happens (I have had other fragment highlighting 
> issues not unlike this).
>
> The problem seems simple, I have a 64 character string that I generate 62 
> tokens for. Whenever I search for the entire string, I end up getting the 
> highlight applied to the 50th fragment instead of the one that actually 
> most nearly matches my search query.
>
> Also confusing is if I try a very similar search, trying to use an exact 
> match on the SHA1 or MD5 attributes -- highlighting works like I'd expect 
> it to.
>
>
> Please see the gist here: 
> https://gist.github.com/jonpaul/d4a9aa7f9c8741933cf5
>
>
> Currently I'm using 1.0.0-BETA2 so this *may* be a fixed bug, sorry if 
> that's the case, I couldn't find anything that matches my problem per se.
>
> Thanks very much in advance for help anyone can provide!
>
>
>



Confusing highlight result when creating many tokens

2014-03-14 Thread Jon-Paul Lussier
Hey Elasticsearch, hopefully someone can at least explain whether this is 
intentional and how it happens (I have had other fragment highlighting 
issues not unlike this).

The problem seems simple, I have a 64 character string that I generate 62 
tokens for. Whenever I search for the entire string, I end up getting the 
highlight applied to the 50th fragment instead of the one that actually 
most nearly matches my search query.

Also confusing is if I try a very similar search, trying to use an exact 
match on the SHA1 or MD5 attributes -- highlighting works like I'd expect 
it to.


Please see the gist 
here: https://gist.github.com/jonpaul/d4a9aa7f9c8741933cf5


Currently I'm using 1.0.0-BETA2 so this *may* be a fixed bug, sorry if 
that's the case, I couldn't find anything that matches my problem per se.

Thanks very much in advance for help anyone can provide!




Re: rebuilding replicas

2014-02-05 Thread Paul Smith
You can issue a Move allocation command for just the replicas. A single
move request with each of your replica shards shunted to a different node
should do it. The move is implemented as a copy from the primary (well, it
is in ES 0.19 anyway).

Note though that if you move things around such that a post-move
auto-rebalance dictates a shard replica go back to where it was, it
appears to then find the old copy and just reuse that data, which cancels
out what you just tried to do. :)

Paul
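To sketch what such a request might look like: a single reroute call carrying one "move" command per replica shard. This is an illustration only; the command shape follows the cluster-reroute API as documented for ES 1.x, and the index name "events" and node names "node_a"/"node_b"/"node_c" below are made up.

```python
import json

def move_cmd(index, shard, from_node, to_node):
    # One "move" command for POST /_cluster/reroute.
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}

# Shunt both replicas of a hypothetical "events" index off node_a at once;
# each lands on a different node so the rebalancer has less reason to
# immediately move them back.
payload = json.dumps({"commands": [move_cmd("events", 0, "node_a", "node_b"),
                                   move_cmd("events", 1, "node_a", "node_c")]})
print(payload)  # curl -XPOST localhost:9200/_cluster/reroute -d "$payload"
```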

On Thursday, 6 February 2014, asanderson 
wrote:

> Is there a way via API to rebuild a replica other than just restarting the
> replica's node? If not, then I will submit a feature request.
>
> Somehow some of our replicas have different doc totals than their
> primaries, and I don't see a way to fix it via API.
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to 
> elasticsearch+unsubscr...@googlegroups.com
> .
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/9c64d53c-9fe0-412a-8056-b2d17c3ea59f%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>



Re: Primary vs. replica shard inconsistencies?

2014-01-30 Thread Paul Smith
if you do use it, don't forget we build for ES 0.19, so change the pom.xml
to your ES version otherwise it won't connect... :)


On 31 January 2014 12:29, Paul Smith  wrote:

> if it helps at all, i've pushed the flappy item detector tool (*cough*)
> here:
>
> https://github.com/Aconex/es-flappyitem-detector
>
> We have a simple 3-node cluster, 5 shards, 1 replica, so I'm sure there's
> code in there that is built around those assumptions, but should be easily
> modified to suit your purpose perhaps.
>
> cheers,
>
> Paul
>
>
> On 31 January 2014 11:59, Paul Smith  wrote:
>
>> the flappy detection tool I have connects to the cluster using the
>> standard java autodiscovery mechanism, and, works out which shards are
>> involved, and then creates explicit TransportClient connection to each
>> host, so it would need access to 9300 (the SMILE-based protocol port). Would
>> that help? (Is 9300 accessible from a host that can run Java?)
>>
>>
>> On 31 January 2014 11:45,  wrote:
>>
>>> We have 4 query heads total (esq1.r6, esq2.r6, esq3.r7, esq4.r7).
>>> Interestingly query heads in the same rack give the same results. We don't
>>> do deletes at all on these indices so that shouldn't be an issue.
>>> Unfortunately at the moment I can't do preference=_local while getting the
>>> _id(s) directly because we don't allow access on 9200 on our worker nodes.
>>> I might be able to write some code to figure this out though. Either way,
>>> here are my ID results from the different heads.
>>>
>>> esq2.r6 gets 28 total results
>>> esq3.r7 gets 9 total results
>>>
>>> $curl -XGET "
>>> http://esq2.r6:9200/events/_search?q=sessionId:1390953880&size=100"; |
>>> jq '.hits.hits[]._id' | sort
>>>   % Total% Received % Xferd  Average Speed   TimeTime Time
>>>  Current
>>>  Dload  Upload   Total   SpentLeft
>>>  Speed
>>> 100 19337  100 193370 0  1039k  0 --:--:-- --:--:-- --:--:--
>>> 1049k
>>> "0LcI_px4SZy5ZQkI_V7Qyw"
>>> "1sAGREtMSfK8OIxZErm8RQ"
>>> "6IV2v4TFTr-Gl1eC6hrj0Q"
>>> "6nwMexTHQBmFxfykOgKqWA"
>>> "7hFYs6y-QG6wGYEkoBKmdg"
>>> "9MTM10SeQ2yqWIb08oPnFA"
>>> "aELtGN6DQpmdRlQbr8i0uA"
>>> "AUHUg6k0QZOf_oGjsjSsGA"
>>> "Bo_u1eYGSF2LeU78kbcFZg"
>>> "EWs1K8YsR9-IBSAWK6ld7A"
>>> "Fx4l6_axSGCxpyFm7C7BSQ"
>>> "gpCrAZrNTNezWPfensER3g"
>>> "HAFmGcWuQAylxGjmnZZkSQ"
>>> "HB4Kwz3RSWWH5NHvyH4JMg"
>>> "H-eP-33FREOtq7v0uBPWbQ"
>>> "_IH6W4DoTRmdms0FJNlg4g"
>>> "iK_3TbzcSj2-MbMXip_XFg"
>>> "J4bjPFIcQ1ewrQqjN2qz6Q"
>>> "kfonMDBuR--UIhkyM2cWrg"
>>> "Kr6-9-3uR9Wp2923n-O2NA"
>>> "Nw_9rjwvQ62u-HsuWIm53A"
>>> "QRmY8R2MQemuePb0EkYxWA"
>>> "usloSJzQRzCpOQ8bxKi2vA"
>>> "w9NGEWg-QiivMpjyurYKrA"
>>> "wKy-YzB-TK2lnK86Sx2RBA"
>>> "y2ZmJ-_GRAmi3eHy1y8jzw"
>>> "ZmFj7w4hR5Cvy-owCLmZ1Q"
>>> "ZmlndPBLT-ivuOxm_A7yDA"
>>>
>>> $curl -XGET "
>>> http://esq3.r7:9200/events/_search?q=sessionId:1390953880&size=100"; |
>>> jq '.hits.hits[]._id' | sort
>>>   % Total% Received % Xferd  Average Speed   TimeTime Time
>>>  Current
>>>  Dload  Upload   Total   SpentLeft
>>>  Speed
>>> 100  6808  100  68080 0  70082  0 --:--:-- --:--:-- --:--:--
>>> 70185
>>> "1sAGREtMSfK8OIxZErm8RQ"
>>> "7hFYs6y-QG6wGYEkoBKmdg"
>>> "aELtGN6DQpmdRlQbr8i0uA"
>>> "Fx4l6_axSGCxpyFm7C7BSQ"
>>> "HAFmGcWuQAylxGjmnZZkSQ"
>>> "H-eP-33FREOtq7v0uBPWbQ"
>>> "QRmY8R2MQemuePb0EkYxWA"
>>> "wKy-YzB-TK2lnK86Sx2RBA"
>>> "y2ZmJ-_GRAmi3eHy1y8jzw"
>>>
>>> And here is es3.r7 with preference=_primary_first:
>>>
>>> $curl -XGET "
>>> http://esq3.r7/events/_search?q=sessionId:1390953880&size=100&preference=_primary_first";
>>> | jq '.hits.hits[]._id' | sort
>>>   % Total% Received % Xferd  Average Speed   TimeTime Time
>>>  Current
>>

Re: Primary vs. replica shard inconsistencies?

2014-01-30 Thread Paul Smith
if it helps at all, i've pushed the flappy item detector tool (*cough*)
here:

https://github.com/Aconex/es-flappyitem-detector

We have a simple 3-node cluster, 5 shards, 1 replica, so I'm sure there's
code in there that is built around those assumptions, but should be easily
modified to suit your purpose perhaps.

cheers,

Paul


On 31 January 2014 11:59, Paul Smith  wrote:

> the flappy detection tool I have connects to the cluster using the
> standard java autodiscovery mechanism, and, works out which shards are
> involved, and then creates explicit TransportClient connection to each
> host, so would need access to 9300 (the SMILE based protocol port).  Would
> that help? (is 9300 accessible from a host that can run java ?
>
>
> On 31 January 2014 11:45,  wrote:
>
>> We have 4 query heads total (esq1.r6, esq2.r6, esq3.r7, esq4.r7).
>> Interestingly query heads in the same rack give the same results. We don't
>> do deletes at all on these indices so that shouldn't be an issue.
>> Unfortunately at the moment I can't do preference=_local while getting the
>> _id(s) directly because we don't allow access on 9200 on our worker nodes.
>> I might be able to write some code to figure this out though. Either way,
>> here are my ID results from the different heads.
>>
>> esq2.r6 gets 28 total results
>> esq3.r7 gets 9 total results
>>
>> $curl -XGET "
>> http://esq2.r6:9200/events/_search?q=sessionId:1390953880&size=100"; | jq
>> '.hits.hits[]._id' | sort
>>   % Total% Received % Xferd  Average Speed   TimeTime Time
>>  Current
>>  Dload  Upload   Total   SpentLeft
>>  Speed
>> 100 19337  100 193370 0  1039k  0 --:--:-- --:--:-- --:--:--
>> 1049k
>> "0LcI_px4SZy5ZQkI_V7Qyw"
>> "1sAGREtMSfK8OIxZErm8RQ"
>> "6IV2v4TFTr-Gl1eC6hrj0Q"
>> "6nwMexTHQBmFxfykOgKqWA"
>> "7hFYs6y-QG6wGYEkoBKmdg"
>> "9MTM10SeQ2yqWIb08oPnFA"
>> "aELtGN6DQpmdRlQbr8i0uA"
>> "AUHUg6k0QZOf_oGjsjSsGA"
>> "Bo_u1eYGSF2LeU78kbcFZg"
>> "EWs1K8YsR9-IBSAWK6ld7A"
>> "Fx4l6_axSGCxpyFm7C7BSQ"
>> "gpCrAZrNTNezWPfensER3g"
>> "HAFmGcWuQAylxGjmnZZkSQ"
>> "HB4Kwz3RSWWH5NHvyH4JMg"
>> "H-eP-33FREOtq7v0uBPWbQ"
>> "_IH6W4DoTRmdms0FJNlg4g"
>> "iK_3TbzcSj2-MbMXip_XFg"
>> "J4bjPFIcQ1ewrQqjN2qz6Q"
>> "kfonMDBuR--UIhkyM2cWrg"
>> "Kr6-9-3uR9Wp2923n-O2NA"
>> "Nw_9rjwvQ62u-HsuWIm53A"
>> "QRmY8R2MQemuePb0EkYxWA"
>> "usloSJzQRzCpOQ8bxKi2vA"
>> "w9NGEWg-QiivMpjyurYKrA"
>> "wKy-YzB-TK2lnK86Sx2RBA"
>> "y2ZmJ-_GRAmi3eHy1y8jzw"
>> "ZmFj7w4hR5Cvy-owCLmZ1Q"
>> "ZmlndPBLT-ivuOxm_A7yDA"
>>
>> $curl -XGET "
>> http://esq3.r7:9200/events/_search?q=sessionId:1390953880&size=100"; | jq
>> '.hits.hits[]._id' | sort
>>   % Total% Received % Xferd  Average Speed   TimeTime Time
>>  Current
>>  Dload  Upload   Total   SpentLeft
>>  Speed
>> 100  6808  100  68080 0  70082  0 --:--:-- --:--:-- --:--:--
>> 70185
>> "1sAGREtMSfK8OIxZErm8RQ"
>> "7hFYs6y-QG6wGYEkoBKmdg"
>> "aELtGN6DQpmdRlQbr8i0uA"
>> "Fx4l6_axSGCxpyFm7C7BSQ"
>> "HAFmGcWuQAylxGjmnZZkSQ"
>> "H-eP-33FREOtq7v0uBPWbQ"
>> "QRmY8R2MQemuePb0EkYxWA"
>> "wKy-YzB-TK2lnK86Sx2RBA"
>> "y2ZmJ-_GRAmi3eHy1y8jzw"
>>
>> And here is es3.r7 with preference=_primary_first:
>>
>> $curl -XGET "
>> http://esq3.r7/events/_search?q=sessionId:1390953880&size=100&preference=_primary_first";
>> | jq '.hits.hits[]._id' | sort
>>   % Total% Received % Xferd  Average Speed   TimeTime Time
>>  Current
>>  Dload  Upload   Total   SpentLeft
>>  Speed
>> 100 19335  100 193350 0   871k  0 --:--:-- --:--:-- --:--:--
>>  899k
>> "0LcI_px4SZy5ZQkI_V7Qyw"
>> "1sAGREtMSfK8OIxZErm8RQ"
>> "6IV2v4TFTr-Gl1eC6hrj0Q"
>> "6nwMexTHQBmFxfykOgKqWA"
>> "7hFYs6y-QG6wGYEkoBKmdg"
>> "9MTM10SeQ2yqWIb08oPnFA"
>> "aELtGN6DQpmdRlQbr8i0uA"
>> "AUHUg6k0QZOf_oGjsjSsGA"
>> "Bo_u1eYGSF2LeU78kbcFZg"

Re: Primary vs. replica shard inconsistencies?

2014-01-30 Thread Paul Smith
014 4:00:49 PM UTC-8, tallpsmith wrote:
>
>> Try to narrow down a few specific IDs of results that
>> appear/disappear based on the primary/replica shard, and confirm through an
>> explicit GET of each ID with preference=_local on the primary shard &
>> replica for that result.  To work out which shard # a specific ID belongs
>> to, you can run this query:
>>
>> curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
>> {
>>   "fields" : [],
>>   "query" : {
>>     "ids" : {
>>       "values" : [
>>         "123456789"
>>       ]
>>     }
>>   },
>>   "explain" : 1
>> }
>> '
>>
>> In the "values" attribute, place the ID of the item you're after.
>>  Within the result response you'll see the shard ID; use that to identify
>> which host holds the primary and which the replica.  You can then run the
>> GET query with preference=_local on each of those hosts and see whether the
>> primary or the replica returns the result.  You will also need to know whether
>> the item that is 'flappy' (appearing/disappearing depending on the shard
>> being searched) is supposed to be there or not, perhaps by checking the
>> data store that is the source of the index (is it a DB?).
>>
>> We have a very infrequent case where the replica shard does not properly
>> receive a delete, at least with 0.19.10.  The delete successfully applies
>> to the Primary, but the Replica still holds the value and returns it within
>> search results.  We have loads of insert/update/delete activity and the
>> number of flappy items is very small, but it is definitely a thing.
>>
>> see this previous thread:  http://elasticsearch-users.
>> 115913.n3.nabble.com/Deleted-items-appears-when-searching-
>> a-replica-shard-td4029075.html
>>
>> If it is the replica shard that's incorrect (my bet), the way to fix it
>> is to relocate the replica shard to another host.  The relocation will take
>> the copy of the primary (correct copy) and recreate a new replica shard,
>> effectively neutralizing the inconsistency.
>>
>> We have written a tool, Scrutineer (https://github.com/aconex/scrutineer)
>> which can help detect inconsistencies in your cluster.  I also have a tool
>> not yet published to github that can help check these Primary/Replica
>> inconsistencies if that would help (you pass a list of IDs to it and it'll
>> check whether they're flappy between the primary & replica or not).  It can
>> also help automate the rebuilding of just the replica shards by shunting
>> them around (rather than a full rolling restart of ALL the shards, just the
>> shard replicas you want)
>>
>> cheers,
>>
>> Paul Smith
>>
>>
>>
>>
>>
>> On 31 January 2014 09:44, Binh Ly  wrote:
>>
>>> Xavier, can you post an example of 1 full query and then also show how
>>> the results of this one query is inconsistent? Just trying to understand
>>> what is inconsistent. Thanks.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/038735ba-cef6-4634-9d46-7ff39dffc4d2%
>>> 40googlegroups.com.
>>>
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/a99ab249-ddf4-4c38-97d7-3bfe8ec41b5f%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHfYWB50LGse5BghNJjo8a-RVkYtRiyVpS-p5d%3DH%3DZGWxk7PAg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Primary vs. replica shard inconsistencies?

2014-01-30 Thread Paul Smith
Try to narrow down a few specific IDs of results that appear/disappear
based on the primary/replica shard, and confirm through an explicit GET of
each ID with preference=_local on the primary shard & replica for that
result.  To work out which shard # a specific ID belongs to, you can run
this query:

curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
  "fields" : [],
  "query" : {
    "ids" : {
      "values" : [
        "123456789"
      ]
    }
  },
  "explain" : 1
}
'

In the "values" attribute, place the ID of the item you're after.
 Within the result response you'll see the shard ID; use that to identify
which host holds the primary and which the replica.  You can then run the
GET query with preference=_local on each of those hosts and see whether the
primary or the replica returns the result.  You will also need to know whether
the item that is 'flappy' (appearing/disappearing depending on the shard
being searched) is supposed to be there or not, perhaps by checking the
data store that is the source of the index (is it a DB?).
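
The per-node check described above can be scripted. A sketch only: the hosts, index ("events"), type ("event"), and document ID below are placeholders, not values from this thread:

```
# Fetch the same document directly from the node holding the primary and the
# node holding the replica; preference=_local keeps each GET on the local copy.
$ curl -XGET 'http://primary-host:9200/events/event/123456789?preference=_local&pretty'
$ curl -XGET 'http://replica-host:9200/events/event/123456789?preference=_local&pretty'
# A copy that returns "found" : false while the other returns the document
# is the inconsistent one.
```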

We have a very infrequent case where the replica shard does not properly
receive a delete, at least with 0.19.10.  The delete successfully applies
to the Primary, but the Replica still holds the value and returns it within
search results.  We have loads of insert/update/delete activity and the
number of flappy items is very small, but it is definitely a thing.

see this previous thread:
http://elasticsearch-users.115913.n3.nabble.com/Deleted-items-appears-when-searching-a-replica-shard-td4029075.html

If it is the replica shard that's incorrect (my bet), the way to fix it is
to relocate the replica shard to another host.  The relocation will take
the copy of the primary (correct copy) and recreate a new replica shard,
effectively neutralizing the inconsistency.
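
One way to trigger that rebuild without moving shards by hand is the cluster reroute API's cancel command, which drops the replica and lets the master re-allocate it from the primary. This is a sketch, not from the thread; the index name, shard number, and node name are placeholders:

```
$ curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '{
  "commands" : [ {
    "cancel" : {
      "index" : "events",
      "shard" : 3,
      "node" : "node-holding-the-bad-replica",
      "allow_primary" : false
    }
  } ]
}'
```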

We have written a tool, Scrutineer (https://github.com/aconex/scrutineer)
which can help detect inconsistencies in your cluster.  I also have a tool
not yet published to github that can help check these Primary/Replica
inconsistencies if that would help (you pass a list of IDs to it and it'll
check whether they're flappy between the primary & replica or not).  It can
also help automate the rebuilding of just the replica shards by shunting
them around (rather than a full rolling restart of ALL the shards, just the
shard replicas you want)

cheers,

Paul Smith





On 31 January 2014 09:44, Binh Ly  wrote:

> Xavier, can you post an example of 1 full query and then also show how the
> results of this one query is inconsistent? Just trying to understand what
> is inconsistent. Thanks.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/038735ba-cef6-4634-9d46-7ff39dffc4d2%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHfYWB51LsR5XH6G5VX3KcGdPU8mVUc-eEiROPS1wjwQGkaobg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Trouble with has_parent query containing scripted function_score

2014-01-27 Thread Paul Bellora
If only I'd waited a few minutes. Clinton explains the difference between 
my issue and Martin's gist:

The difference is that he didn't specify the type in the URL, so it 
> searched the mappings for all types for a field called weight. Because 
> we've specified the type in the URL, it only searched myChild, not myParent.


On Monday, January 27, 2014 3:51:23 PM UTC-5, Paul Bellora wrote:
>
> Just to close the loop on this topic, Clinton Gormley posted an answer to 
> the SO question with a workaround: use doc['*myParent.*weight'].value. He 
> also opened an 
> issue<https://github.com/elasticsearch/elasticsearch/issues/4914>for this 
> behavior.
>
> Although as I commented on the SO answer, Martin Groningen's 
> gist<https://gist.github.com/martijnvg/8639841>does work fine and I'm not 
> sure what the difference is between the two 
> cases.
>
> Thank you Martin and Clinton for your help.
>
> On Sunday, January 26, 2014 4:47:48 PM UTC-5, Martijn v Groningen wrote:
>>
>> Hi Paul,
>>
>> The mapping and query that you're sharing make sense and should work. I 
>> verified that with a small recreation:
>> https://gist.github.com/martijnvg/8639841
>>
>> What I think you're running into is that a document of type 'myParent' 
>> doesn't have the field `weight`. Can you check if that is the case?
>> If so you might want to add a `null_value` for the weight field, so that 
>> all `myParent` docs have a value for the weight field.
>>
>> Martijn
>>
>>
>> On 24 January 2014 21:38, Paul Bellora  wrote:
>>
>>> Update: posted to Stack Overflow with bounty: 
>>> http://stackoverflow.com/questions/21289149/trouble-with-has-parent-query-containing-scripted-function-score
>>>
>>>
>>> On Thursday, January 16, 2014 11:11:27 AM UTC-5, Paul Bellora wrote:
>>>>
>>>> I have two document types, in a parent-child relationship:
>>>>
>>>> "myParent" : {
>>>>   "properties" : {
>>>> "weight" : {
>>>>   "type" : "double"
>>>> }
>>>>   }
>>>> }
>>>>
>>>>
>>>> "myChild" : {
>>>>   "_parent" : {
>>>> "type" : "myParent"
>>>>   },
>>>>   "_routing" : {
>>>> "required" : true
>>>>   }
>>>> }
>>>>
>>>>
>>>> The weight field is to be used for custom scoring/sorting. This query 
>>>> directly against the parent documents works as intended:
>>>>
>>>> {
>>>>   "query" : {
>>>> "function_score" : {
>>>>   "script_score" : {
>>>> "script" : "_score * doc['weight'].value"
>>>>   } 
>>>> } 
>>>>   
>>>>   }
>>>> }
>>>>
>>>>
>>>> However, when trying to do similar scoring for the child documents with 
>>>> a has_parent query, I get an error:
>>>>
>>>> {
>>>>   "query" : {
>>>> "has_parent" : {
>>>>   "query" : {
>>>> "function_score" : {   
>>>>  
>>>>   "script_score" : {
>>>> "script" : "_score * doc['weight'].value"
>>>>   }
>>>> }
>>>>   },
>>>>   "parent_type" : "myParent",
>>>>   "score_type" : "score"
>>>> }
>>>>   }
>>>> }
>>>>
>>>>
>>>> The error is:
>>>>
>>>> QueryPhaseExecutionException[[myIndex][3]: 
>>>> query[filtered(ParentQuery[myParent](filtered(function 
>>>> score (ConstantScore(*:*),function=script[_score * 
>>>> doc['weight'].value], params [null]))->cache(_type:
>>>> myParent)))->cache(_type:myChild)],from[0],size[10]: Query Failed 
>>>> [failed to execute context rewrite]]; nested: 
>>>> ElasticSearchIllegalArgumentException[No 
>>>> field found for [weight] in mapping with types [myChild]];
>>>>

Re: Trouble with has_parent query containing scripted function_score

2014-01-27 Thread Paul Bellora
Just to close the loop on this topic, Clinton Gormley posted an answer to 
the SO question with a workaround: use doc['myParent.weight'].value. He 
also opened an 
issue <https://github.com/elasticsearch/elasticsearch/issues/4914> for this 
behavior.

Although as I commented on the SO answer, Martijn van Groningen's 
gist <https://gist.github.com/martijnvg/8639841> does work fine, and I'm not 
sure what the difference is between the two cases.
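
For completeness, the original failing query from this thread with Clinton's workaround applied would look like the following; the only change is the parent-qualified field name in the script:

```
{
  "query" : {
    "has_parent" : {
      "query" : {
        "function_score" : {
          "script_score" : {
            "script" : "_score * doc['myParent.weight'].value"
          }
        }
      },
      "parent_type" : "myParent",
      "score_type" : "score"
    }
  }
}
```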

Thank you Martin and Clinton for your help.

On Sunday, January 26, 2014 4:47:48 PM UTC-5, Martijn v Groningen wrote:
>
> Hi Paul,
>
> The mapping and query that you're sharing make sense and should work. I 
> verified that with a small recreation:
> https://gist.github.com/martijnvg/8639841
>
> What I think you're running into is that a document of type 'myParent' 
> doesn't have the field `weight`. Can you check if that is the case?
> If so you might want to add a `null_value` for the weight field, so that 
> all `myParent` docs have a value for the weight field.
>
> Martijn
>
>
> On 24 January 2014 21:38, Paul Bellora >wrote:
>
>> Update: posted to Stack Overflow with bounty: 
>> http://stackoverflow.com/questions/21289149/trouble-with-has-parent-query-containing-scripted-function-score
>>
>>
>> On Thursday, January 16, 2014 11:11:27 AM UTC-5, Paul Bellora wrote:
>>>
>>> I have two document types, in a parent-child relationship:
>>>
>>> "myParent" : {
>>>   "properties" : {
>>> "weight" : {
>>>   "type" : "double"
>>> }
>>>   }
>>> }
>>>
>>>
>>> "myChild" : {
>>>   "_parent" : {
>>> "type" : "myParent"
>>>   },
>>>   "_routing" : {
>>> "required" : true
>>>   }
>>> }
>>>
>>>
>>> The weight field is to be used for custom scoring/sorting. This query 
>>> directly against the parent documents works as intended:
>>>
>>> {
>>>   "query" : {
>>> "function_score" : {
>>>   "script_score" : {
>>> "script" : "_score * doc['weight'].value"
>>>   } 
>>> }   
>>> 
>>>   }
>>> }
>>>
>>>
>>> However, when trying to do similar scoring for the child documents with 
>>> a has_parent query, I get an error:
>>>
>>> {
>>>   "query" : {
>>> "has_parent" : {
>>>   "query" : {
>>> "function_score" : {   
>>>  
>>>   "script_score" : {
>>> "script" : "_score * doc['weight'].value"
>>>   }
>>> }
>>>   },
>>>   "parent_type" : "myParent",
>>>   "score_type" : "score"
>>> }
>>>   }
>>> }
>>>
>>>
>>> The error is:
>>>
>>> QueryPhaseExecutionException[[myIndex][3]: 
>>> query[filtered(ParentQuery[myParent](filtered(function 
>>> score (ConstantScore(*:*),function=script[_score * 
>>> doc['weight'].value], params [null]))->cache(_type:
>>> myParent)))->cache(_type:myChild)],from[0],size[10]: Query Failed 
>>> [failed to execute context rewrite]]; nested: 
>>> ElasticSearchIllegalArgumentException[No 
>>> field found for [weight] in mapping with types [myChild]];
>>>
>>> It seems like instead of taking the result of the scoring function and 
>>> applying it to the child, ES is taking the scoring *function* and 
>>> applying it to the child, hence the error.
>>>
>>> If I don't use score for score_type, the error doesn't occur, although 
>>> the result scores are then all 1.0, as documented.
>>>
>>> What am I missing here? How can I query these child documents with 
>>> custom scoring based on a parent field?
>>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/a3ec1600-157e-45e9-b006-37cddc2b422f%40googlegroups.com
>> .
>>
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>
>
> -- 
> Met vriendelijke groet,
>
> Martijn van Groningen 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/662aaafb-79fa-4438-857d-43a46aef7c5a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: match query vs query string

2014-01-26 Thread avinash paul
Thank you Johan, I will try it.

-paul


On Fri, Jan 24, 2014 at 1:49 PM, Johan Rask  wrote:

> Hi,
>
> You have to supply the name of the field for both words.
>
> name:Massachusetts name:Insti. Otherwise it will search for Insti in your
> default search field.
>
> ../university/_search?q=name:Massachusetts%20name:Insti&fields=name
>
> The match query does this for you.
>
> Regards /Johan
>
>
> Den fredagen den 24:e januari 2014 kl. 07:02:53 UTC+1 skrev paul:
>
>> Hi ,
>>
>> I am testing the below two queries but it gives different results
>>
>> *Query 1:*
>> .../university/_search?q=name:Massachusetts%20Insti&fields=name
>>
>> *Query 2:*
>> {
>>   "size": 10,
>>   "fields": [
>>     "name"
>>   ],
>>   "query": {
>>     "match": {
>>       "name": {
>>         "query": "Massachusetts Insti"
>>       }
>>     }
>>   }
>> }
>>
>> I believe the space between the words is causing the problem. How do I
>> represent the query string of *Query 2* in *Query 1*? I tried "+" and
>> "%20" for representing spaces.
>>
>> Regards
>> Avinash
>>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/Lx083-eHwoM/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1388fcd8-b188-44d4-acd5-925175b46849%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAO066G1jz6UFpZO1OvkGQtMx8tp7ij-FB4F18wkXciCoZ8OG1g%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Trouble with has_parent query containing scripted function_score

2014-01-24 Thread Paul Bellora
Update: posted to Stack Overflow with bounty: 
http://stackoverflow.com/questions/21289149/trouble-with-has-parent-query-containing-scripted-function-score

On Thursday, January 16, 2014 11:11:27 AM UTC-5, Paul Bellora wrote:
>
> I have two document types, in a parent-child relationship:
>
> "myParent" : {
>   "properties" : {
> "weight" : {
>   "type" : "double"
> }
>   }
> }
>
>
> "myChild" : {
>   "_parent" : {
> "type" : "myParent"
>   },
>   "_routing" : {
> "required" : true
>   }
> }
>
>
> The weight field is to be used for custom scoring/sorting. This query 
> directly against the parent documents works as intended:
>
> {
>   "query" : {
> "function_score" : {
>   "script_score" : {
> "script" : "_score * doc['weight'].value"
>   } 
> } 
>   
>   }
> }
>
>
> However, when trying to do similar scoring for the child documents with a 
> has_parent query, I get an error:
>
> {
>   "query" : {
> "has_parent" : {
>   "query" : {
> "function_score" : { 
>
>   "script_score" : {
> "script" : "_score * doc['weight'].value"
>   }
> }
>   },
>   "parent_type" : "myParent",
>   "score_type" : "score"
> }
>   }
> }
>
>
> The error is:
>
> QueryPhaseExecutionException[[myIndex][3]: 
> query[filtered(ParentQuery[myParent](filtered(function score 
> (ConstantScore(*:*),function=script[_score * doc['weight'].value], params 
> [null]))->cache(_type:myParent)))->cache(_type:myChild)],from[0],size[10]: 
> Query Failed [failed to execute context rewrite]]; nested: 
> ElasticSearchIllegalArgumentException[No field found for [weight] in 
> mapping with types [myChild]];
>
> It seems like instead of taking the result of the scoring function and 
> applying it to the child, ES is taking the scoring *function* and 
> applying it to the child, hence the error.
>
> If I don't use score for score_type, the error doesn't occur, although 
> the result scores are then all 1.0, as documented.
>
> What am I missing here? How can I query these child documents with custom 
> scoring based on a parent field?
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a3ec1600-157e-45e9-b006-37cddc2b422f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


match query vs query string

2014-01-23 Thread paul
Hi ,

I am testing the below two queries but it gives different results

*Query 1:*
.../university/_search?q=name:Massachusetts%20Insti&fields=name

*Query 2:*
{
  "size": 10,
  "fields": [
    "name"
  ],
  "query": {
    "match": {
      "name": {
        "query": "Massachusetts Insti"
      }
    }
  }
}

I believe the space between the words is causing the problem. How do I 
represent the query string of *Query 2* in *Query 1*? I tried "+" and "%20" 
for representing spaces.

Regards
Avinash
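
As Johan's reply in this thread explains, the Lucene query-string syntax applies a field prefix only to the term it is attached to, so an equivalent of Query 2 has to repeat the field name:

```
# Each whitespace-separated term carries its own field prefix; without the
# second "name:", Insti is searched in the default field instead.
.../university/_search?q=name:Massachusetts%20name:Insti&fields=name
```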

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/df824353-2a8a-4bb9-82b8-ac377f78fdc8%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


EdgeNgram Boosting more than Synonym

2014-01-23 Thread paul
Hi ,

I have the following entry in my synonym properties

YU,Yeshiva University,Yale University

my college name field has the following mapping

"autocomplete_search":{
   "type":"custom",
   "tokenizer":"whitespace",
   "filter":[
  "lowercase",
  "syns_filter"
   ]
},
"autocomplete_index":{
   "type":"custom",
   "tokenizer":"whitespace",
   "filter":[
  "lowercase",
  "syns_filter",
  "my_edgeNgram"
   ]
}

When I query

{
"fields": [
   "name"
],
"query": {
   "match": {
  "name": {
  "query": "YU"
}
  }
}
}

my results are as below:

"name": "Yuba College"
"name": "Yukon Beauty College"
 "name": "Yale University"
"name": "Cambridge Junior College - Yuba City"

Though "Yale University" matched completely via synonyms, the other two 
results that only matched "YU" still rank higher. Why?

-paul

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/cf3530d4-1ac8-4ba7-9ac7-e93d4ab2db0e%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: keyword tokenizer

2014-01-23 Thread avinash paul
Binh, when I removed the syns_filter it was still the same, but when I changed
the "tokenizer":"keyword" to "whitespace" it took "university"
into account. Maybe it's a tokenizer problem: when there is a space, the
keyword tokenizer omits the word after the space.

-paul
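
The difference is easy to see with the analyze API alone: the keyword tokenizer emits the whole input as a single token, while whitespace splits it, so the ngram filter never sees "university" as a token of its own. A quick check (host is a placeholder):

```
$ curl 'http://localhost:9200/_analyze?tokenizer=keyword&text=yale%20university&pretty'
# one token: "yale university"
$ curl 'http://localhost:9200/_analyze?tokenizer=whitespace&text=yale%20university&pretty'
# two tokens: "yale" and "university"
```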


On Wed, Jan 22, 2014 at 11:00 PM, Binh Ly  wrote:

> Paul, Is it possible that your "syns_filter" is affecting your ngram
> filter? What happens when you remove the syns_filter?
>
>
> On Wednesday, January 22, 2014 6:17:12 AM UTC-5, paul wrote:
>>
>> My mapping looks as below
>>
>>  "autocomplete_index":{
>>"type":"custom",
>>"tokenizer":"keyword",
>>"filter":[
>>   "lowercase",
>>   "syns_filter",
>>   "my_edgeNgram"
>>]
>> }
>>
>> Now when I analyze the configuration using the analyze API, the word after
>> the space gets omitted, i.e. "university" is omitted
>>
>> ../universityindextest2/_analyze?
>> analyzer=autocomplete_index&text=yale%20university&pretty
>>
>> output
>> --
>>
>> {
>>   "tokens" : [
>>     { "token" : "ya",   "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 1 },
>>     { "token" : "yal",  "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 2 },
>>     { "token" : "yale", "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 3 },
>>     { "token" : "yu",   "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 4 }
>>   ]
>> }
>>
>>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/inRyvJJDPpo/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/423e6c0f-0aa2-4f48-a357-a313905fb8c0%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAO066G0Y%2BAoVt%2BN6q1bxr8KFN2A686U2Cp%3DyyEoHT_s41_vbzg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Query filtered by _version range?

2014-01-22 Thread Paul Smith
This has been asked before (I googled!), but I thought I'd seen some
recent-or-maybe-not-that recent changes in ES about the special _version
field and being able to do 'something' extra with it other than it's use as
the Optimistic Locking pattern.

Certainly in ES 0.19 one can't query it, but is that still the case in more
recent ES versions at all?  Just wanted to confirm if my memory of seeing
something in this area is incorrect.

My use case is External Versioning using timestamps as the version, so
being able to query items changed since a certain time would be useful.  I
could index a separate "queryableVersion" field with the same value, but
maybe there's a better way.

cheers,

Paul

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHfYWB5iagSPcp4xem-zZo4DJkHG3V07vkS%2BwsxBTPq2kfTCNQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


keyword tokenizer

2014-01-22 Thread paul
My mapping looks as below

 "autocomplete_index":{
   "type":"custom",
   "tokenizer":"keyword",
   "filter":[
  "lowercase",
  "syns_filter",
  "my_edgeNgram"
   ]
}

Now when I analyze the configuration using the analyze API, the word after 
the space gets omitted, i.e. "university" is omitted

../universityindextest2/_analyze?analyzer=autocomplete_index&text=yale%20university&pretty

output
--

{
  "tokens" : [
    { "token" : "ya",   "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 1 },
    { "token" : "yal",  "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 2 },
    { "token" : "yale", "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 3 },
    { "token" : "yu",   "start_offset" : 0, "end_offset" : 15, "type" : "word", "position" : 4 }
  ]
}

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d6bd7caa-b160-42ac-948c-6aab6884a51d%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Synonym configuration

2014-01-21 Thread avinash paul
I got the answer by searching Google groups.


On Tue, Jan 21, 2014 at 10:55 AM, paul  wrote:

> Hi ,
>
> I have read in a lot of places that there are two approaches when working
> with synonyms:
>
>- expanding them at indexing time,
>- expanding them at query time.
>
> Expanding synonyms at query time is not recommended since it raises issues
> with :
>
>- scoring, since synonyms have different document frequencies,
>- multi-token synonyms, since the query parser splits on white spaces.
>
> So to configure synonym expansion at index time in Elasticsearch, what is
> the configuration?
> Right now my configuration is as below; I am using the synonym filter in both
> the index analyzer and the query analyzer, which means I am expanding at both
> index time and query time.
>
> "name":{
>"type":"string",
>"index_analyzer" : "autocomplete_index",
> "search_analyzer" : "autocomplete_search"
> },
>
> {
>    "settings":{
>       "analysis":{
>          "analyzer":{
>             "synonym":{
>                "tokenizer":"whitespace",
>                "filter":[
>                   "lowercase",
>                   "syns_filter"
>                ]
>             },
>             "autocomplete_search":{
>                "type":"custom",
>                "tokenizer":"whitespace",
>                "filter":[
>                   "lowercase",
>                   "syns_filter",
>                   "stop"
>                ]
>             },
>             "autocomplete_index":{
>                "type":"custom",
>                "tokenizer":"whitespace",
>                "filter":[
>                   "lowercase",
>                   "syns_filter",
>                   "stop",
>                   "my_edgeNgram"
>                ]
>             }
>          },
>          "filter":{
>             "syns_filter":{
>                "synonyms_path":"synonyms/synonym_collegename.txt",
>                "type":"synonym",
>                "ignore_case":true,
>                "expand":false
>             }
>          }
>       }
>    }
> }
> 
>
> -paul
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/DTf-1Q0oaSw/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1c21cd11-eb92-47b5-b695-61b33bd256fa%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAO066G3%2BM1%3DZxijZ%3DtKDZp-QOJCpZnXz8VHhuDJaOcPP65NPKA%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Synonym configuration

2014-01-20 Thread paul
Hi ,

I have read in a lot of places that there are two approaches when working 
with synonyms:

   - expanding them at indexing time,
   - expanding them at query time.

Expanding synonyms at query time is not recommended since it raises issues 
with :

   - scoring, since synonyms have different document frequencies,
   - multi-token synonyms, since the query parser splits on white spaces.

So to configure synonym expansion at index time in Elasticsearch, what is 
the configuration?
Right now my configuration is as below; I am using the synonym filter in both 
the index analyzer and the query analyzer, which means I am expanding at both 
index time and query time.

"name":{
   "type":"string",
   "index_analyzer" : "autocomplete_index",
   "search_analyzer" : "autocomplete_search"
},

{
   "settings":{
      "analysis":{
         "analyzer":{
            "synonym":{
               "tokenizer":"whitespace",
               "filter":[
                  "lowercase",
                  "syns_filter"
               ]
            },
            "autocomplete_search":{
               "type":"custom",
               "tokenizer":"whitespace",
               "filter":[
                  "lowercase",
                  "syns_filter",
                  "stop"
               ]
            },
            "autocomplete_index":{
               "type":"custom",
               "tokenizer":"whitespace",
               "filter":[
                  "lowercase",
                  "syns_filter",
                  "stop",
                  "my_edgeNgram"
               ]
            }
         },
         "filter":{
            "syns_filter":{
               "synonyms_path":"synonyms/synonym_collegename.txt",
               "type":"synonym",
               "ignore_case":true,
               "expand":false
            }
         }
      }
   }
}


-paul

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1c21cd11-eb92-47b5-b695-61b33bd256fa%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Synonym Filter

2014-01-20 Thread paul
Hi,

My synonym file contains the following entry:

MIT,Massachusetts Institute of Technology

My setting is as below:

   "settings":{
  "analysis":{
 "analyzer":{
"synonym":{
   "tokenizer":"my_pipe_analyzer",
   "filter":[
  "lowercase",
  "syns_filter"
   ]
},
"my_pipe_analyzer":{
   "tokenizer":"my_pipe_analyzer"
},
"autocomplete_search":{
   "type":"custom",
   "tokenizer":"my_pipe_analyzer",
   "filter":[
  "lowercase",
  "syns_filter",
  "stop"
   ]
}
 },
 "tokenizer":{
"my_pipe_analyzer":{
   "type":"pattern",
   "pattern":"\\|"
}
 },
 "filter":{
"syns_filter":{
   "synonyms_path":"synonyms/synonym_collegename.txt",
   "type":"synonym",
   "ignore_case":true
}
 }
  }
   }

I have created a pipe-separated tokenizer so that the synonyms are not 
split on spaces, but they still get split on spaces when I verify with the 
analyze API. Below is my output from the analyze API:

{
   "tokens":[
  {
 "token":"mit",
 "start_offset":0,
 "end_offset":3,
 "type":"SYNONYM",
 "position":1
  },
  {
 "token":"massachusetts",
 "start_offset":0,
 "end_offset":3,
 "type":"SYNONYM",
 "position":1
  },
  {
 "token":"institute",
 "start_offset":0,
 "end_offset":3,
 "type":"SYNONYM",
 "position":2
  },
  {
 "token":"technology",
 "start_offset":0,
 "end_offset":3,
 "type":"SYNONYM",
 "position":4
  }
   ]
}
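One detail that may explain this (an assumption based on the synonym token filter's documented options; worth verifying against your version's docs): the analyzer's tokenizer is not used to parse the synonyms file itself. The synonym filter accepts its own `tokenizer` setting for parsing the rules, and it defaults to `whitespace`, which is why multi-word synonyms get split. A sketch of the filter with a `keyword` rule tokenizer, which keeps each side of a rule as a single token:

```json
"filter":{
   "syns_filter":{
      "type":"synonym",
      "synonyms_path":"synonyms/synonym_collegename.txt",
      "ignore_case":true,
      "tokenizer":"keyword"
   }
}
```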

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7516d1a7-72d0-4b3f-b426-deb80b8d6450%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Change "search_analyzer" settings without reindexing

2014-01-19 Thread paul
Hi,

Is there a way to change the search analyzer settings without re-indexing 
all the data? I am experimenting with various configurations, and every 
time I change a setting I re-index.

setting:
--
"autocomplete_search":{
   "type":"custom",
   "tokenizer":"my_pipe_analyzer",
   "filter":[
  "lowercase",
  "syns_filter",
  "stop"
   ]
},
"autocomplete_index":{
   "type":"custom",
   "tokenizer":"standard",
   "filter":[
  "standard",
  "lowercase",
  "syns_filter",
  "stop",
  "my_edgeNgram"
   ]
}
mapping:

"name":{
   "type":"string",
   "index_analyzer" : "autocomplete_index", 
   "search_analyzer" : "autocomplete_search"
},
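For what it's worth, a sketch worth verifying against your Elasticsearch version: because the search analyzer is applied only at query time, you can usually avoid re-indexing by updating the analysis settings on a closed index and then reopening it (`my_index` is a placeholder here):

```
POST /my_index/_close

PUT /my_index/_settings
{
  "analysis": {
    "analyzer": {
      "autocomplete_search": {
        "type": "custom",
        "tokenizer": "my_pipe_analyzer",
        "filter": ["lowercase", "syns_filter", "stop"]
      }
    }
  }
}

POST /my_index/_open
```

The `autocomplete_index` analyzer is different: changing it only affects documents indexed afterwards, so an index-analyzer change still requires re-indexing existing data.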

-paul

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/97baca09-4d0f-4f70-b654-68510885e9e4%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Trouble with has_parent query containing scripted function_score

2014-01-16 Thread Paul Bellora
I have two document types, in a parent-child relationship:

"myParent" : {
  "properties" : {
"weight" : {
  "type" : "double"
}
  }
}


"myChild" : {
  "_parent" : {
"type" : "myParent"
  },
  "_routing" : {
"required" : true
  }
}


The weight field is to be used for custom scoring/sorting. This query 
directly against the parent documents works as intended:

{
  "query" : {
"function_score" : {
  "script_score" : {
"script" : "_score * doc['weight'].value"
  } 
}   
  }
}


However, when trying to do similar scoring for the child documents with a 
has_parent query, I get an error:

{
  "query" : {
"has_parent" : {
  "query" : {
"function_score" : {   
 
  "script_score" : {
"script" : "_score * doc['weight'].value"
  }
}
  },
  "parent_type" : "myParent",
  "score_type" : "score"
}
  }
}


The error is:

QueryPhaseExecutionException[[myIndex][3]: 
query[filtered(ParentQuery[myParent](filtered(function score 
(ConstantScore(*:*),function=script[_score * doc['weight'].value], params 
[null]))->cache(_type:myParent)))->cache(_type:myChild)],from[0],size[10]: 
Query Failed [failed to execute context rewrite]]; nested: 
ElasticSearchIllegalArgumentException[No field found for [weight] in 
mapping with types [myChild]];

It seems like instead of taking the result of the scoring function and 
applying it to the child, ES is taking the scoring *function* and applying 
it to the child, hence the error.

If I don't use score for score_type, the error doesn't occur, although the 
results scores are then all 1.0, as documented.

What am I missing here? How can I query these child documents with custom 
scoring based on a parent field?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/714f8a04-a88a-4a47-8baf-998992353f1f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Function_score

2014-01-13 Thread avinash paul
Thank you, William, will try upgrading.

-paul


On Tue, Jan 14, 2014 at 4:23 AM, William Goldstein <
williamrgoldst...@gmail.com> wrote:

> Hey I realize this is an old thread, but just in case anyone else has this
> problem, it's almost definitely a versioning issue... updating to 0.90.10
> solved it for me.
>
> I think I got an older version by using `brew install`, but
> download/installation directly from the website is super easy:
> http://www.elasticsearch.org/download/
>
>
>
> On Friday, November 22, 2013 4:37:40 AM UTC-5, paul wrote:
>>
>> Hi Alex, I tried as suggested but it seems I am still missing something:
>> when I try the query below I still get the No query registered for
>> [function_score] exception.
>>
>>
>> {
>>   "query": {
>> "function_score": {
>>   "functions": [
>> {
>>   "DECAY_FUNCTION": {
>> "Tution": {
>>   "reference": "1",
>>   "scale": "5000"
>> }
>>   }
>> }
>>   ],
>>   "query": {
>> "match": {
>>   "programs": "computer science"
>> }
>>   },
>>   "score_mode": "multiply"
>> }
>>   }
>> }
>>
>> On Friday, 22 November 2013 14:04:49 UTC+5:30, Alexander Reelsen wrote:
>>>
>>> Hey,
>>>
>>> you need to use one of the score functions mentioned in
>>> http://www.elasticsearch.org/guide/en/elasticsearch/
>>> reference/current/query-dsl-function-score-query.html#_score_functions<http://www.google.com/url?q=http%3A%2F%2Fwww.elasticsearch.org%2Fguide%2Fen%2Felasticsearch%2Freference%2Fcurrent%2Fquery-dsl-function-score-query.html%23_score_functions&sa=D&sntz=1&usg=AFQjCNFBAPkmuOAra99m7KO_A6OltCBZoQ>
>>>
>>> So either script_score, boost factor, random or one of the decay
>>> functions.. the error message is also telling you not to forget the query
>>> field inside of the function score.
>>>
>>>
>>> --Alex
>>>
>>>
>>> On Fri, Nov 22, 2013 at 9:31 AM, paul  wrote:
>>>
>>>> What should be defined inside the function? Any example queries? I could
>>>> not find one on the Elasticsearch page. This throws an exception: No query
>>>> registered for [function_score]
>>>> {
>>>>"query":{
>>>>   "function_score":{
>>>>  "query":{
>>>> "match":{
>>>>"programs":"computer science"
>>>> }
>>>>  },
>>>>  "functions":[
>>>> {
>>>>"filter":{
>>>>   "range":{
>>>>  "tution":{
>>>> "from":1,
>>>> "to":2
>>>>  }
>>>>   }
>>>>},
>>>>"FUNCTION":{
>>>>  "boost": "3"
>>>>}
>>>> }
>>>>  ],
>>>>  "score_mode":"avg",
>>>>  "boost_mode":"avg"
>>>>   }
>>>>}
>>>> }
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to elasticsearc...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
>>>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/ZGjv6eCBKXs/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/69eb1edd-ab2a-47d6-92dd-cb20e6b70116%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/groups/opt_out.
>
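A note for later readers: `DECAY_FUNCTION` in the reference docs is a placeholder, not a literal key. A concrete query names one of `gauss`, `linear`, or `exp` (a sketch using the fields from this thread; note that older releases used `reference` where newer ones use `origin`):

```json
{
  "query": {
    "function_score": {
      "query": {
        "match": { "programs": "computer science" }
      },
      "functions": [
        {
          "gauss": {
            "Tution": {
              "origin": "1",
              "scale": "5000"
            }
          }
        }
      ],
      "score_mode": "multiply"
    }
  }
}
```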

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAO066G0iRsSxqQPtfJ2j4sJHM7uVS2yb9bK-dHgv5dQDWNLxig%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Getting specific Fields

2014-01-03 Thread avinash paul
Thank you for the quick response.


On Fri, Jan 3, 2014 at 1:00 AM, Ivan Brusic  wrote:

> Not yet supported:
> https://github.com/elasticsearch/elasticsearch/issues/3022
>
> Cheers,
>
> Ivan
>
>
> On Thu, Jan 2, 2014 at 4:27 AM, paul  wrote:
>
>> My DATA
>> ---
>> {
>>"rankingList":[
>>   {
>>  "value":9,
>>  "key":"Academic"
>>   },
>>   {
>>  "value":6,
>>  "key":"Flexibility"
>>   }
>>]
>> }
>>
>> {
>>"rankingList":[
>>   {
>>  "value":12,
>>  "key":"Academic"
>>   },
>>   {
>>  "value":6,
>>  "key":"Flexibility"
>>   }
>>]
>> }
>>
>> My Mapping
>> ---
>> {
>>"mappings":{
>>   "TestNested":{
>>  "properties":{
>> "rankingList":{
>>"type":"nested"
>> }
>>  }
>>   }
>>}
>> }
>>
>> My QUERY
>> -
>> {
>>   "query": {
>> "nested": {
>>   "path": "rankingList",
>>   "query": {
>> "bool": {
>>   "must": [
>> {
>>   "match": {
>> "rankingList.key": {
>>   "query": "Academic"
>> }
>>   }
>> },
>> {
>>   "range": {
>> "rankingList.value": {
>>   "gt": 5
>> }
>>   }
>> }
>>   ]
>> }
>>   }
>> }
>>   }
>> }
>>
>> I want to get only the key/value entry related to "Academic" within the
>> array. Is that possible? Right now the query works fine but returns all the
>> array elements.
>>
>> - Paul
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/6f5c2cd1-a92e-4c8c-8bd3-ca8193033080%40googlegroups.com
>> .
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/t6ebGDRVR3g/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQDqTQy6iVe_%3DgSQowUE-Gh5Ug%2Bn2b_Jn2CsDeRN3GwGKA%40mail.gmail.com
> .
>
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAO066G38R6iHcp7VHO-9T%3DtV0QtZoCxjPmYeP1-Ps2PLUsaojQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Getting specific Fields

2014-01-02 Thread paul
My DATA
---
{
   "rankingList":[
  {
 "value":9,
 "key":"Academic"
  },
  {
 "value":6,
 "key":"Flexibility"
  }
   ]
}

{
   "rankingList":[
  {
 "value":12,
 "key":"Academic"
  },
  {
 "value":6,
 "key":"Flexibility"
  }
   ]
}

My Mapping
---
{
   "mappings":{
  "TestNested":{
 "properties":{
"rankingList":{
   "type":"nested"
}
 }
  }
   }
}

My QUERY
-
{
  "query": {
"nested": {
  "path": "rankingList",
  "query": {
"bool": {
  "must": [
{
  "match": {
"rankingList.key": {
  "query": "Academic"
}
  }
},
{
  "range": {
    "rankingList.value": {
  "gt": 5
}
  }
}
  ]
} 
  }
}
  }
}

I want to get only the key/value entry related to "Academic" within the 
array. Is that possible? Right now the query works fine but returns all the 
array elements.

- Paul
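A note for later readers: returning only the matching nested elements was not supported at the time (see the linked issue in the reply), but later Elasticsearch releases (1.5 and up) added `inner_hits` for exactly this. A sketch:

```json
{
  "query": {
    "nested": {
      "path": "rankingList",
      "query": {
        "bool": {
          "must": [
            { "match": { "rankingList.key": "Academic" } },
            { "range": { "rankingList.value": { "gt": 5 } } }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}
```

Each hit then carries an `inner_hits` section containing only the nested objects that matched.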

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6f5c2cd1-a92e-4c8c-8bd3-ca8193033080%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: OR query

2014-01-01 Thread avinash paul
Thank you Ivan will definitely try it out.

-paul


On Tue, Dec 31, 2013 at 10:51 PM, Ivan Brusic  wrote:

> You are better of using a proper boolean filter for better performance.
> Queries cannot be cached and query string query analyzes the terms. Here is
> an example of your filter with a nested bool (should) filter:
>
> "filter": {
>   "and": {
> "filters": [
>   {
> "bool": {
>   "must": [
> {
>   "bool": {
> "should": [
>   {
> "term": {
>   "state": "MA"
> }
>   },
>   {
> "term": {
>   "state": "NY"
> }
>   }
> ]
>   }
> },
> {
>   "range": {
> "costOutofstateTution": {
>   "gte": 0,
>   "lte": 3
> }
>   }
> }
>   ]
> }
>   }
> ]
>   }
> }
>
> Cheers,
>
> Ivan
>
>
> On Mon, Dec 30, 2013 at 10:03 PM, paul  wrote:
>
>> I got the query working by using
>>
>> {
>>   "query_string": {
>>  "default_field": "state",
>>  "query": "MA NY"
>>   }
>>   }
>>
>> - Paul
>>
>> On Tuesday, 31 December 2013 11:07:06 UTC+5:30, paul wrote:
>>>
>>> My query is as below ,  which gives me all the colleges with state code
>>> "MA" i want all the colleges that are in "MA" or "NY" how to add OR filter
>>>
>>> {
>>>   "query": {
>>> "filtered": {
>>>   "query": {
>>> "nested": {
>>>   "path": "programs",
>>>   "query": {
>>> "bool": {
>>>   "must": [
>>> {
>>>   "match": {
>>> "programs.progName": "Computer and Information
>>> Sciences"
>>>   }
>>> },
>>> {
>>>   "range": {
>>> "programs.Bachelor": {
>>>   "gt": 0
>>> }
>>>   }
>>> }
>>>   ]
>>> }
>>>   }
>>> }
>>>   },
>>>   "filter": {
>>> "and": {
>>>   "filters": [
>>> {
>>>   "bool": {
>>> "must": [
>>>   {
>>> "term": {
>>>   "state": "MA"
>>> }
>>>   },
>>>   {
>>> "range": {
>>>   "costOutofstateTution": {
>>> "gte": 0,
>>> "lte": 3
>>>   }
>>> }
>>>   }
>>> ]
>>>   }
>>> }
>>>   ]
>>> }
>>>   }
>>> }
>>>   }
>>> }
>>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/d23102f3-3180-4cdc-9d51-8ca960c7bcd0%40googlegroups.com
>> .
>>
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/rd6Lh_U0lzI/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQC5FF-F%3DLJzpsVUvcq1n%2B%2B_9DFcKgRFJ0r%3Dv3SS7jX_tQ%40mail.gmail.com
> .
>
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAO066G0Ng7dhg3U8L%3Dc49%2BDkM_xWP5feXNYN%3Dfa6Nx55oqSn%3Dw%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Feedback on docs

2013-12-31 Thread Paul Houle
 Hi guys,

  I've recently gotten into Elasticsearch because the old search 
engine at my site

http://ookaboo.com/

  is horribly slow and I need something better.  I've done a lot of 
work with Lucene and Solr in the past and particular I've been involved 
with projects that make very deep changes to those systems because we 
wanted to use them to drive a NER system or do statistical IR with advanced 
topic modeling.  This project is nothing like that,  it's just a very 
simple search engine that has to be easy to set up,  easy to run,  and easy 
to scale.

  Overall the quality of documentation is great and the amount of 
attention that is being paid to the "getting started" process is excellent, 
 particularly when compared with Solr,  but I have been looking at the docs 
for the java client API and there are some things I could use clarified...

http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/index.html

The big one is that there are some cross-cutting patterns in the API I 
don't totally understand.  For instance,

* what is the difference between index() and prepareIndex()?
* what is up with the execute(),  actionGet() and get() methods of various 
sorts?
* are javadocs available for IndexRequest() and similar objects?

To put this all in context,  so far I've had a great experience.  Being 
able to just unpack elastic search on my Windows laptop or an AWS instance 
running Linux and start working is a real breath of fresh air!

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a28c9e02-6d1c-472e-83da-16e1cc19b848%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: OR query

2013-12-30 Thread paul
I got the query working by using:

{
  "query_string": {
 "default_field": "state",
 "query": "MA NY"
  }
  }

- Paul
On Tuesday, 31 December 2013 11:07:06 UTC+5:30, paul wrote:
>
> My query is below; it gives me all the colleges with state code "MA". I
> want all the colleges that are in "MA" or "NY". How do I add an OR filter?
>
> {
>   "query": {
> "filtered": {
>   "query": {
> "nested": {
>   "path": "programs",
>   "query": {
> "bool": {
>   "must": [
> {
>   "match": {
> "programs.progName": "Computer and Information 
> Sciences"
>   }
> },
> {
>   "range": {
> "programs.Bachelor": {
>   "gt": 0
> }
>   }
> }
>   ]
> }
>   }
> }
>   },
>   "filter": {
> "and": {
>   "filters": [
> {
>   "bool": {
> "must": [
>   {
> "term": {
>   "state": "MA"
> }
>   },
>   {
> "range": {
>   "costOutofstateTution": {
> "gte": 0,
> "lte": 3
>   }
> }
>   }
> ]
>   }
> }
>   ]
> }
>   }
> }
>   }
> }
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d23102f3-3180-4cdc-9d51-8ca960c7bcd0%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


OR query

2013-12-30 Thread paul
My query is below; it gives me all the colleges with state code "MA". I 
want all the colleges that are in "MA" or "NY". How do I add an OR filter?

{
  "query": {
"filtered": {
  "query": {
"nested": {
  "path": "programs",
  "query": {
"bool": {
  "must": [
{
  "match": {
"programs.progName": "Computer and Information Sciences"
  }
},
{
  "range": {
"programs.Bachelor": {
  "gt": 0
}
  }
}
  ]
}
  }
}
  },
  "filter": {
"and": {
  "filters": [
{
  "bool": {
"must": [
  {
"term": {
  "state": "MA"
}
  },
  {
"range": {
  "costOutofstateTution": {
"gte": 0,
"lte": 3
  }
}
  }
]
  }
}
  ]
}
  }
}
  }
}
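A note for later readers: besides the `query_string` workaround and the `bool`/`should` filter shown in the replies, a `terms` filter expresses OR over a single field compactly (a sketch; whether the values need lowercasing depends on how `state` is analyzed):

```json
{
  "terms": { "state": ["MA", "NY"] }
}
```

This can replace the single `term` filter on `state` inside the `and` filter above.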

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9614e30f-60e8-44cc-b614-6e5c18f2bc22%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: spring-elasticsearch

2013-12-21 Thread avinash paul
Thank you will try it out
On 20 Dec 2013 17:31, "David Pilato"  wrote:

> I think 0.3.0 will work.
>
> --
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet <https://twitter.com/dadoonet> | 
> @elasticsearchfr<https://twitter.com/elasticsearchfr>
>
>
> Le 20 décembre 2013 at 12:34:21, paul 
> (avinashpau...@gmail.com)
> a écrit:
>
> Hi ,
>
> Right now i am using the below versions of spring elastic-search. If i
> want to upgrade to latest elasticsearch 0.90.8 what version of "spring" and
> "spring-elasticsearch" should i use
>
>  
> fr.pilato.spring
>  spring-elasticsearch
> 0.2.0
> 
> 
> org.elasticsearch
> elasticsearch
> 0.90.0
> 
>  
> org.springframework
> spring-core
> 3.0.7.RELEASE
> 
>
> Regards
> paul
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/a3f28e5f-4e8f-4750-9f2c-ae1c52c05707%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/gFlaU221sEU/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/etPan.52b431a4.47398c89.111%40MacBook-Air-de-David.local
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAO066G1LUxd2Ede%3DDeBqgzg83dtaryWwA-40n3D6Nsw3ZgYLDA%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Analyzer for a field in a nested document?

2013-12-20 Thread paul
got it :)

On Wednesday, 18 December 2013 17:45:45 UTC+5:30, paul wrote:
>
> Hi Andrew,
>
> can you please post your mapping how did u provide analyzer to nested 
> field , i am facing the same problem.
>
> Thanks 
> Paul
>
> On Saturday, 16 June 2012 11:23:01 UTC+5:30, Andrew Cholakian wrote:
>>
>> Ah, figured it out, you can put further properties below the nested type.
>>
>> On Friday, June 15, 2012 3:56:36 PM UTC-7, Andrew Cholakian wrote:
>>>
>>> I have an array of nested documents which I need to specify an analyzer 
>>> for. If anyone knows how to specify an analyzer for a nested document I'd 
>>> greatly appreciate it.
>>>
>>> Our documents look like
>>>
>>> item: {
>>>   labels: {
>>> label: {
>>>   kind: "a str",
>>>   color: "another str"
>>> }
>>>   }
>>> }
>>>
>>> If I wanted to use the keyword analyzer on 'kind' Where would I put that 
>>> in my index mapping?
>>>
>>> Help would be much appreciated, I've tried this many ways but can't seem 
>>> to get it right.
>>>
>>
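For anyone landing here, the fix Andrew describes (nesting `properties` under the nested type) looks roughly like this sketch, using the field names from the quoted example:

```json
{
  "mappings": {
    "item": {
      "properties": {
        "labels": {
          "type": "nested",
          "properties": {
            "kind": {
              "type": "string",
              "analyzer": "keyword"
            }
          }
        }
      }
    }
  }
}
```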

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8f2ff7e0-f657-459d-b600-b9fead5b1a4d%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


spring-elasticsearch

2013-12-20 Thread paul
Hi ,

Right now I am using the below versions of spring-elasticsearch. If I want 
to upgrade to the latest Elasticsearch, 0.90.8, which versions of "spring" 
and "spring-elasticsearch" should I use?


<dependency>
    <groupId>fr.pilato.spring</groupId>
    <artifactId>spring-elasticsearch</artifactId>
    <version>0.2.0</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>0.90.0</version>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-core</artifactId>
    <version>3.0.7.RELEASE</version>
</dependency>

Regards
paul

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a3f28e5f-4e8f-4750-9f2c-ae1c52c05707%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: "double barreled" surname search fails

2013-12-20 Thread paul
By default Strings are analyzed using "standard analyzer" 
http://localhost:9200/sports/_analyze?analyzer=standard&text=jones-walker&pretty

{ "tokens" : [
  { "token" : "jones", "start_offset" : 0, "end_offset" : 5, "type" : "<ALPHANUM>", "position" : 1 },
  { "token" : "walker", "start_offset" : 6, "end_offset" : 12, "type" : "<ALPHANUM>", "position" : 2 }
] }

So the terms stored after analysis are as above. Read up on the standard 
analyzer for details.
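Since the standard analyzer splits on the hyphen, a `term` query for the whole string "jones-walker" can never match. One common fix (a sketch) is to index the surname unanalyzed, in which case the `term` query must match the stored value exactly, including case:

```json
"surname": {
   "type": "string",
   "index": "not_analyzed"
}
```

Alternatively, keep the analyzed field and use a `match` query instead of `term`, so the query text goes through the same analysis as the indexed data.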


On Friday, 20 December 2013 16:28:35 UTC+5:30, Mark Perry wrote:
>
> All the data is going through the bulk uploader. In C# I serialize the 
> object that represents the individual and add it to a string that will be 
> passed to a PUT request. I don't think I have a specific mapping - can you 
> recommend something I need to read up on?
>
> sb.Append("{ \"create\" : {\"_index\" : \"knowsleycitizen\", \"_type\" : 
> \"attendance\", \"_id\" : \"" + id.ToString() + "\"} }\n" + 
> Newtonsoft.Json.JsonConvert.SerializeObject(b) + "\n");
>
> On Friday, 20 December 2013 09:36:07 UTC, paul wrote:
>>
>> It depends on your mapping , can you show the mapping for your surname.
>>
>> Reagrds
>> Avinash
>>
>> On Friday, 20 December 2013 14:45:45 UTC+5:30, Mark Perry wrote:
>>>
>>> Can anyone tell me why this search works:
>>>
>>>
>>> {"query":{"bool":{"must":[{"term":{"chosenforename":"brian"}},{"term":{"surname":"jones"}},{"term":{"dob":"2002-02-12T00:00:00"}},{"range":{"eventstart":{"from":"2013-09-01T00:00:00","to":"2013-11-14T00:00:00"}}}]}}}
>>>
>>> but this one doesn't:
>>>
>>>
>>> {"query":{"bool":{"must":[{"term":{"chosenforename":"brian"}},{"term":{"surname":"jones-walker"}},{"term":{"dob":"2002-02-12T00:00:00"}},{"range":{"eventstart":{"from":"2013-09-01T00:00:00","to":"2013-11-14T00:00:00"}}}]}}}
>>>
>>> i.e. second search has a "double barreled" surname in it?
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d972987f-9e6c-46fa-8dc6-301d2e8a583f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: "double barreled" surname search fails

2013-12-20 Thread paul


On Friday, 20 December 2013 14:45:45 UTC+5:30, Mark Perry wrote:
>
> Can anyone tell me why this search works:
>
>
> {"query":{"bool":{"must":[{"term":{"chosenforename":"brian"}},{"term":{"surname":"jones"}},{"term":{"dob":"2002-02-12T00:00:00"}},{"range":{"eventstart":{"from":"2013-09-01T00:00:00","to":"2013-11-14T00:00:00"}}}]}}}
>
> but this one doesn't:
>
>
> {"query":{"bool":{"must":[{"term":{"chosenforename":"brian"}},{"term":{"surname":"jones-walker"}},{"term":{"dob":"2002-02-12T00:00:00"}},{"range":{"eventstart":{"from":"2013-09-01T00:00:00","to":"2013-11-14T00:00:00"}}}]}}}
>
> i.e. second search has a "double barreled" surname in it?
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/70d89699-861b-4495-8431-d3dbd4690e2f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Auto complete Group Query

2013-12-20 Thread paul



When I search for "elastic" on Amazon, it shows autocomplete suggestions 
grouped like "in Books". Can this be achieved in Elasticsearch?

I know how to do autocomplete on one field.
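One way to approximate grouped suggestions (a sketch, and only one of several possible designs; `title.autocomplete` and `category` are hypothetical fields): run the autocomplete query and facet on a category field, then render one suggestion row per category:

```json
{
  "query": {
    "match": { "title.autocomplete": "elastic" }
  },
  "facets": {
    "categories": {
      "terms": { "field": "category" }
    }
  }
}
```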

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d8fc3373-a34e-4bdc-b399-a47152c984a6%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: "double barreled" surname search fails

2013-12-20 Thread paul
It depends on your mapping , can you show the mapping for your surname.

Reagrds
Avinash

On Friday, 20 December 2013 14:45:45 UTC+5:30, Mark Perry wrote:
>
> Can anyone tell me why this search works:
>
>
> {"query":{"bool":{"must":[{"term":{"chosenforename":"brian"}},{"term":{"surname":"jones"}},{"term":{"dob":"2002-02-12T00:00:00"}},{"range":{"eventstart":{"from":"2013-09-01T00:00:00","to":"2013-11-14T00:00:00"}}}]}}}
>
> but this one doesn't:
>
>
> {"query":{"bool":{"must":[{"term":{"chosenforename":"brian"}},{"term":{"surname":"jones-walker"}},{"term":{"dob":"2002-02-12T00:00:00"}},{"range":{"eventstart":{"from":"2013-09-01T00:00:00","to":"2013-11-14T00:00:00"}}}]}}}
>
> i.e. second search has a "double barreled" surname in it?
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3f42fff1-faca-4aa1-be51-3704571b3112%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Facet Query

2013-12-20 Thread paul
Ned to club facets , i.e i am faceting for sports on multiple fields 
"menVaritySports" and "womenVaritySports"

result is :

Basketball - 3
Golf - 2
Baseball - 2

It should be Basketball - 2; i.e., "Basketball" should be counted once per 
document even if it appears under both men and women. Is it possible to do 
it that way?

Data 1

{"sports":{
   "menVaritySports":[
  "Baseball",
  "Basketball"
   ],
   "womenVaritySports":[
  "Basketball",
  "Golf"
   ]
}
}
Data 2

{"sports":{
   "menVaritySports":[
  "Baseball",
  "Basketball"
   ],
   "womenVaritySports":[
  "Golf"
   ]
}
}

QUERY:

{
   "query":{
  "match_all":{

  }
   },
   "facets":{
  "sports":{
 "terms":{
"fields":[
   "sports.menVaritySports.facet",
   "sports.womenVaritySports.facet"
]
 }
  }
   }
}

MAPPING:
---
{
   "mappings":{
  "university":{
 "properties":{
"sports":{
   "properties":{
  "menVaritySports":{
 "type":"multi_field",
 "fields":{
"menVaritySports":{
   "type":"string"
},
"facet":{
   "type":"string",
   "index":"not_analyzed"
}
 }
  },
  "womenVaritySports":{
 "type":"multi_field",
 "fields":{
"womenVaritySports":{
   "type":"string"
},
"facet":{
   "type":"string",
   "index":"not_analyzed"
}
 }
  }
   }
}
 }
  }
   }
}
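
One approach to get a per-document count (a sketch; the "allVaritySports" field name and its mapping are made up for illustration): at index time, add a field holding the deduplicated union of the two lists, and facet on that single field, so each document contributes a given sport at most once:

```json
{"sports":{
   "menVaritySports":["Baseball","Basketball"],
   "womenVaritySports":["Basketball","Golf"],
   "allVaritySports":["Baseball","Basketball","Golf"]
}}
```

with a facet over just that field:

```json
"facets":{
   "sports":{
      "terms":{ "field":"sports.allVaritySports.facet" }
   }
}
```

The union would be built by the indexing application, and "allVaritySports" would need the same multi_field / not_analyzed "facet" sub-field as the existing two fields.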

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/25916485-4576-47e0-aee0-735cd33ddefb%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Filter on outer object as well as Nested object

2013-12-19 Thread paul
My data looks like the below. I need to filter on both "progName" and "name".

{
   "name":"avinash",
   "assistance":"yes",
   "amount":1,
   "programs":[
  {
 "progName":"Agriculture",
 "BachelorsProgs":53,
 "subs":[
{
   "progName":"Agricultural Business",
   "BachelorsProgs":53
}
 ]
  },
  {
 "progName":"XYZ",
 "BachelorsProgs":12,
 "subs":[
{
   "progName":"Agricultural Business",
   "BachelorsProgs":53
}
 ]
  }
   ]
}

My Mapping 

{
   "mappings":{
  "university":{
 "properties":{
"programs":{
   "type":"nested",
   "properties":{
  "progName":{
 "type":"multi_field",
 "fields":{
"facet":{
   "type":"string",
   "index":"not_analyzed"
},
"progName":{
   "type":"string"
}
 }
  }
   }
}
 }
  }
   }
}

When I query with the query below, I get an exception:

{
   "query":{
  "filtered":{
 "query":{
"match_all":{

}
 },
 "filter":{
 "bool": {
 "must": [
{"term": {
   "name": "avinash"
}}
 ]
 },
"nested":{
   "path":"programs",
   "filter":{
  "bool":{
 "must":[
{
   "term":{
  "programs.progName":"bible"
   }
},
{
   "range":{
  "programs.BachelorsProgs":{
 "gt":0
  }
   }
}
 ]
  }
   }
}
 }
  }
   }
}

How do we filter on a nested object as well as the outer object in a single query?
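
For reference, the usual shape (a sketch, not run against this index) is to make the nested filter one clause inside the outer bool's must array, rather than a sibling of the bool object as in the query above:

```json
{
   "query":{
      "filtered":{
         "query":{ "match_all":{} },
         "filter":{
            "bool":{
               "must":[
                  { "term":{ "name":"avinash" } },
                  {
                     "nested":{
                        "path":"programs",
                        "filter":{
                           "bool":{
                              "must":[
                                 { "term":{ "programs.progName.facet":"Agriculture" } },
                                 { "range":{ "programs.BachelorsProgs":{ "gt":0 } } }
                              ]
                           }
                        }
                     }
                  }
               ]
            }
         }
      }
   }
}
```

Here the not_analyzed "facet" sub-field is used for the exact term match; the analyzed "programs.progName" sub-field would instead need a lowercased term such as "agriculture".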

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6229cf27-bc30-4a32-9a87-6988bdd8a5bd%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Analyzer for a field in a nested document?

2013-12-18 Thread paul
Hi Andrew,

Can you please post your mapping showing how you provided an analyzer to a 
nested field? I am facing the same problem.

Thanks 
Paul

On Saturday, 16 June 2012 11:23:01 UTC+5:30, Andrew Cholakian wrote:
>
> Ah, figured it out, you can put further properties below the nested type.
>
> On Friday, June 15, 2012 3:56:36 PM UTC-7, Andrew Cholakian wrote:
>>
>> I have an array of nested documents which I need to specify an analyzer 
>> for. If anyone knows how to specify an analyzer for a nested document I'd 
>> greatly appreciate it.
>>
>> Our documents look like
>>
>> item: {
>>   labels: {
>> label: {
>>   kind: "a str",
>>   color: "another str"
>> }
>>   }
>> }
>>
>> If I wanted to use the keyword analyzer on 'kind' Where would I put that 
>> in my index mapping?
>>
>> Help would be much appreciated, I've tried this many ways but can't seem 
>> to get it right.
>>
>
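
For anyone finding this later, a mapping along those lines (a sketch based on Andrew's example document; the "item" type name and the keyword analyzer choice are assumptions) puts the analyzer on the property inside the nested type:

```json
{
   "mappings":{
      "item":{
         "properties":{
            "labels":{
               "type":"nested",
               "properties":{
                  "label":{
                     "properties":{
                        "kind":{ "type":"string", "analyzer":"keyword" },
                        "color":{ "type":"string" }
                     }
                  }
               }
            }
         }
      }
   }
}
```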

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2558de88-5856-4b52-aea3-57a5115a9e1f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Highlighter problem

2013-12-17 Thread avinash paul
Thank you Adrien, I will read about the completion suggester and see whether
it suits my requirement.

Regards
Paul


On Tue, Dec 17, 2013 at 12:54 PM, Adrien Grand <
adrien.gr...@elasticsearch.com> wrote:

> I think the answer is in the content of the synonyms file. For
> example if there is an entry in this file that looks like "Binghamton,
> Binghamton University", in the end the analyzer is going to produce
> something like "b", "bi", ..., "bing", ..., "u", "un", ..., "univ", ... for
> a token whose term is "Binghamton". So if you search for "univ", it is
> actually going to highlight the "bing" of "Binghamton".
>
> I don't think there is a simple solution to your problem. Since you seem
> to be using this index for auto-completion purposes, maybe a better option
> would be to not use synonyms in the analyzer but to add a separate document
> for every synonym.
>
> On a side note, since you are doing auto-completion, maybe you could have
> a look at the completion suggester[1]. Although it doesn't support
> highlighting, I would expect it to be an order of magnitude faster than
> index-based autocompletion so this might be worth checking out.
>
> [1]
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
>
> On Tue, Dec 17, 2013 at 6:03 AM, paul  wrote:
>
>> Sure Adrien, below are my definitions:
>>
>>  "filter":{
>> "syns_filter":{
>>"synonyms_path":"synonyms/synonym_collegename.txt",
>>    "type":"synonym",
>>"ignore_case":true
>> },
>> "my_edgeNgram":{
>>"type":"edgeNGram",
>>"min_gram":3,
>>"max_gram":10
>> }
>>  }
>>   }
>>
>> Regards
>> Paul
>>
>> On Monday, 16 December 2013 15:30:47 UTC+5:30, Adrien Grand wrote:
>>
>>> Hi Paul,
>>>
>>> Can you also paste your definitions of "syns_filter" and "my_edgeNgram"?
>>>
>>>
>>> On Mon, Dec 16, 2013 at 7:28 AM, paul  wrote:
>>>
>>>> I am trying out the highlighter feature of elastic-search. The text marked
>>>> in yellow is expected, but why did it match the text marked in green?
>>>>
>>>> elastic-search = 0.90.0
>>>> java = 1.7
>>>>
>>>> The analyzer on that field is "autocomplete"; below is the configuration:
>>>>
>>>> "autocomplete":{
>>>>"type":"custom",
>>>>"tokenizer":"standard",
>>>>"filter":[
>>>>   "standard",
>>>>   "lowercase",
>>>>   "syns_filter",
>>>>   "my_edgeNgram"
>>>>]
>>>>  },
>>>>
>>>> My query:
>>>> {
>>>>   "fields": [
>>>> "name"
>>>>   ],
>>>>   "query": {
>>>> "match": {
>>>>   "name": "univ"
>>>> }
>>>>   },
>>>>   "highlight": {
>>>> "pre_tags": [
>>>>   ""
>>>> ],
>>>> "post_tags": [
>>>>   ""
>>>> ],
>>>> "fields": {
>>>>   "name": {}
>>>> }
>>>>   }
>>>> }
>>>>
>>>>
>>>> Results:
>>>>
>>>> {
>>>>fields:{
>>>>   name:SUNY Binghamton University
>>>>}   highlight:{
>>>>   name:[
>>>>  SUNY Binghamton University
>>>>   ]
>>>>}
>>>> }
>>>>
>>>> {
>>>>fields:{
>>>>   name:Arizona State University
>>>>}   highlight:{
>>>>   name:[
>>>>  Arizona State University
>>>>   ]
>>>>}
>>>> }
>>>>
>>>> {
>>>>fields:{
>>>>   name:Ohio State U

Re: Highlighter problem

2013-12-16 Thread paul
Sure Adrien, below are my definitions:

 "filter":{
"syns_filter":{
   "synonyms_path":"synonyms/synonym_collegename.txt",
   "type":"synonym",
   "ignore_case":true
},
"my_edgeNgram":{
   "type":"edgeNGram",
   "min_gram":3,
   "max_gram":10
    }
 }
  }
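
When chasing highlight offsets through a chain like this, the analyze API shows exactly which tokens, positions, and character offsets the chain emits (a sketch; the index name is a placeholder):

```json
GET /collegeindex/_analyze?analyzer=autocomplete&text=Binghamton
```

With the synonym filter followed by edgeNGram, a short token such as "univ" can be emitted carrying the start/end offsets of "Binghamton", which is exactly the substring the highlighter then marks.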

Regards
Paul

On Monday, 16 December 2013 15:30:47 UTC+5:30, Adrien Grand wrote:
>
> Hi Paul,
>
> Can you also paste your definitions of "syns_filter" and "my_edgeNgram"?
>
>
> On Mon, Dec 16, 2013 at 7:28 AM, paul wrote:
>
>> I am trying out the highlighter feature of elastic-search. The text marked in 
>> yellow is expected, but why did it match the text marked in green?
>>
>> elastic-search = 0.90.0
>> java = 1.7
>>
>> The analyzer on that field is "autocomplete"; below is the configuration:
>>
>> "autocomplete":{
>>"type":"custom",
>>"tokenizer":"standard",
>>"filter":[
>>   "standard",
>>   "lowercase",
>>   "syns_filter",
>>   "my_edgeNgram"
>>]
>>  },
>>
>> My query:
>> {
>>   "fields": [
>> "name"
>>   ],
>>   "query": {
>> "match": {
>>   "name": "univ"
>> }
>>   },
>>   "highlight": {
>> "pre_tags": [
>>   ""
>> ],
>> "post_tags": [
>>   ""
>> ],
>> "fields": {
>>   "name": {}
>> }
>>   }
>> }
>>
>>
>> Results:
>>
>> {
>>fields:{
>>   name:SUNY Binghamton University
>>}   highlight:{
>>   name:[
>>  SUNY Binghamton University
>>   ]
>>}
>> }
>>
>> {
>>fields:{
>>   name:Arizona State University
>>}   highlight:{
>>   name:[
>>  Arizona State University
>>   ]
>>}
>> }
>>
>> {
>>fields:{
>>   name:Ohio State University
>>}   highlight:{
>>   name:[
>>  Ohio State University
>>   ]
>>}
>> }
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/5d4a683f-a208-4a17-bce1-0248f43dbcd6%40googlegroups.com
>> .
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>
>
> -- 
> Adrien Grand
>  

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7ab57905-1533-4c4a-81e3-b370d0dced7e%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.