Understanding regexp query better to avoid query failures and OOMs

2014-10-28 Thread Vaidik Kapoor
Hi Guys,

I have been trying to get my head around how Regexp Query works in
Elasticsearch. To my knowledge, it uses Lucene's Regex Engine, which is
limited. Running a regexp query on a particular field can be expensive
depending on the number of unique terms in the index for that field. So if
a field has the value "brown sugar cake" and the default standard tokenizer
is in use, then any expression in a regexp query on that field will run
against the terms brown, sugar, and cake, not against the entire string.
For this reason, regex in Elasticsearch (and Lucene) becomes expensive. Am
I correct?
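Conceptually, the way a regexp query visits the term dictionary can be sketched like this (plain Python with the `re` module, not Lucene's actual automaton-based engine; the term sets are made up for illustration):

```python
import re

def regexp_query(term_dictionary, pattern):
    """Sketch of how a regexp query is evaluated: the pattern is matched
    against every unique term in the field's term dictionary, never
    against the original, untokenized field value. Lucene implicitly
    anchors the pattern, so it must match a whole term."""
    compiled = re.compile(pattern)
    return sorted(t for t in term_dictionary if compiled.fullmatch(t))

# With the standard analyzer, "brown sugar cake" is indexed as three terms.
analyzed = {"brown", "sugar", "cake"}
print(regexp_query(analyzed, "br.wn"))      # ['brown']
print(regexp_query(analyzed, "brown s.*"))  # [] -- no single term contains a space

# With a not_analyzed field there is one term per distinct value.
not_analyzed = {"brown sugar cake"}
print(regexp_query(not_analyzed, "brown s.*"))  # ['brown sugar cake']
```

The cost therefore scales with the number of unique terms the pattern has to visit, which is why an analyzed field with many distinct tokens is the worst case.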

Assuming that I am, I have a further question. If the performance of regex
really depends on the number of unique terms in a field, then reducing the
number of unique tokens should significantly boost performance. So running
regexp queries on not_analyzed fields should help. But that's not really
the case, and regexp is still extremely slow. In my case, the field is
called url and it holds URLs with their query parameters. The field is
not_analyzed. In most cases, a simple regex is fast enough, but if the
regex gets slightly complicated, I never get a response from the server. I
also noticed, on a local ES server, that memory usage keeps increasing and
eventually I get an OOM exception.

Another thing that is beyond my understanding is the set of variables on
which the performance of a regexp query depends. To test this, I created a
new index with just one document. The document looks something like this:

{
  "url": "https://abc.com/launchingsoon?product=imgburn&",
  "ts": 123456679,
  "os": "Linux",
  ...
}

Remember there is just 1 document in the index. I ran the following regex
query:

GET /INDEX/_search
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {
              "regexp": {
                "url": ".*(cacaoweb|youtube-to-mp3-converter|google-chrome|itunes|adwcleaner|msn-messenger-skype|skype|adobe-flash-player-ie|firefox|jpeg-to-pdf|avira-antivir-personal---free-antivirus|irfanview|mp3-converter|realplayer|adobe-reader|youtube-download--convert|internet-explorer-8|windows-live-mail|windows-live-movie-maker-2011|ccleaner|zune-software|vanbascos-karaoke-player|amule|karaoke|imgburn|google-earth|internet-explorer-9|mp3jam|media-downloader|avg-anti-virus-free-edition|k-lite-codec-pack-full|vwo|windows-media-player|opera|kmplayer|sopcast|drweb-cureit|vwo).*"
              }
            }
          ]
        }
      }
    }
  }
}

This query ran, but it took about 400 ms on my local machine. Then I ran
the following query, which uses an equivalent but very unoptimized regular
expression:

GET /INDEX/_search
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {
              "regexp": {
                "url.not_analyzed": ".*cacaoweb.*|.*youtube-to-mp3-converter.*|.*google-chrome.*|.*itunes.*|.*adwcleaner.*|.*msn-messenger-skype.*|.*skype.*|.*adobe-flash-player-ie.*|.*firefox.*|.*jpeg-to-pdf.*|.*avira-antivir-personal---free-antivirus.*|.*irfanview.*|.*mp3-converter.*|.*realplayer.*|.*adobe-reader.*|.*youtube-download--convert.*|.*internet-explorer-8.*|.*windows-live-mail.*|.*windows-live-movie-maker-2011.*|.*ccleaner.*|.*zune-software.*|.*vanbascos-karaoke-player.*|.*amule.*|.*karaoke.*|.*imgburn.*|.*google-earth.*|.*internet-explorer-9.*|.*mp3jam.*|.*media-downloader.*|.*avg-anti-virus-free-edition.*|.*k-lite-codec-pack-full.*|.*photoscape.*|.*windows-media-player.*|.*opera.*|.*kmplayer.*|.*sopcast.*|.*drweb-cureit.*"
              }
            }
          ]
        }
      }
    }
  }
}

This query took a lot of time. The logs showed the GC kicking in every 3-5
seconds, and eventually the query failed with an OOM exception. I have been
trying to understand why this query causes an OOM. After the OOM, the ES
node just becomes unresponsive until the GC is actually able to free some
memory. This is the exact exception I get in the logs:
http://pastebin.mozilla.org/6975835.

In the above case, I understand the regex is not optimized for
Elasticsearch's (or rather Lucene's) regex engine. But why does an
unoptimized regex require so much memory? I don't quite understand that.
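One mitigation that often helps (hedged: this is a general property of automaton-based regex engines such as Lucene's, not something the documentation guarantees): write one shared, anchored alternation of the form `.*(kw1|kw2|...).*`, which your first, fast query already uses, instead of a separate unanchored `.*kw.*` branch per keyword. A small helper to build that form (sketch; it assumes the keywords contain no regex metacharacters, which is true of the slugs above):

```python
def factored_contains_pattern(keywords):
    """Build a single "contains any of these keywords" pattern of the form
    .*(kw1|kw2|...).*  instead of  .*kw1.*|.*kw2.*|...
    Both forms match the same strings, but the factored form shares one
    leading and one trailing .* across all branches, which keeps the
    determinized automaton much smaller. Assumes the keywords contain no
    regex metacharacters."""
    return ".*(" + "|".join(keywords) + ").*"

print(factored_contains_pattern(["cacaoweb", "imgburn", "skype"]))
# .*(cacaoweb|imgburn|skype).*
```

Generating the pattern programmatically from the keyword list also avoids hand-editing a multi-hundred-character expression.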

I don't know what's causing this and I really need to understand how Regexp
Queries work in Elasticsearch and how they work in Lucene.

Vaidik Kapoor
vaidikkapoor.info

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CACWtv5nqQr-RqLeSp4t1KBaojByff8_nnpi38V-zhSod

Search Time Query Expansion

2014-10-28 Thread Yingkai Gao
I want to implement classic query expansion (for example, the Rocchio
method) for Elasticsearch.

Basically what I need to do is: retrieve the term vectors of the top N
relevant documents for an original query, extract a list of terms based on
some criteria, and finally get enriched results with an expanded query.
Surely I can do that by searching twice, but is there a better way to do it
*inside* Elasticsearch? Do you have any suggestions about where and how I
could implement it?
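As far as I know there is no built-in query-expansion round trip in Elasticsearch 1.x, so two searches (plus term-vector fetches) is the usual shape. The client-side term-selection step in between can be a small pure function; here is a sketch of a simplified Rocchio-style selection, where the term-frequency dicts stand in for whatever you extract from the term-vector API responses:

```python
from collections import Counter

def rocchio_expansion_terms(original_terms, doc_term_freqs, top_k=5, beta=0.75):
    """Pick expansion terms: score each candidate by the beta-weighted mean
    of its frequency across the top-N feedback documents, skipping terms
    already in the original query (a simplified Rocchio step)."""
    centroid = Counter()
    for tf in doc_term_freqs:
        centroid.update(tf)
    n = max(len(doc_term_freqs), 1)
    scored = {t: beta * f / n for t, f in centroid.items() if t not in original_terms}
    return [t for t, _ in sorted(scored.items(), key=lambda kv: -kv[1])[:top_k]]

feedback = [
    {"hotel": 3, "sheraton": 2, "pool": 1},  # term freqs of top hit 1
    {"hotel": 2, "sheraton": 1, "spa": 2},   # term freqs of top hit 2
]
print(rocchio_expansion_terms({"sheraton"}, feedback, top_k=2))  # ['hotel', 'spa']
```

The returned terms would then be OR-ed (typically with lowered boosts) into the second, expanded query.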

Thanks!



Re: POTENTIAL BUG? LogStash drops first few events when monitoring multiple files.

2014-10-28 Thread jiangdengc

I've encountered the same issue. How did you solve it?



Kibana 4 Template/RISON URL's

2014-10-28 Thread Greg Zapp
Hello,

I've been demoing Kibana 4 and will probably be offering some more feedback
over the next few days, but I wanted to touch on template replacement
(RISON URLs). I had a think about this and it seems a bit insufficient for
how I currently use templates. I often want to link to a template from
multiple sources and pass in a few key parameters, such as cluster number
and name. After that I can tweak, modify and update the dashboard with the
changes visible on the same link. I see a couple of options:
1.) Redistribute the links (yuck!)
2.) Redirect a stable endpoint to the updated URL, injecting the parameters
into the RISON URL

Option one is a bit gross, and option two has no integration into Kibana 4
itself currently :[  What is everyone else thinking?


-Greg



Re: Upgrade to ES 1.4 memory issues / tuning max_merged_segment

2014-10-28 Thread joergpra...@gmail.com
There are several areas of memory Elasticsearch is using when receiving
large bulks over HTTP:

- Netty buffers (HTTP chunking etc.)

- bulk source (the lines are split into portions for each primary shard)

- memory for analyzing/tokenizing the fields in the source

- translog buffer (ES write ahead logging)

- indexing buffer (Lucene NRT etc.)

The longer the bulk runs, the more competition there is for the 2g heap.

If you run sustained bulk requests for some time (say 15-20 minutes), ES
picks up the created segments on disk and merges them into larger ones to
maintain performance.

Reducing max_merged_segment from the default of 5g to 1g has two effects:
it allows a merge step to complete faster, because the volume of the merged
segment is limited, and it takes some of the pressure off the heap as
segments grow larger and larger. The downside is that merge steps are
executed more frequently.

You are correct: bulk requests around 1-10MB should work fine on most
servers.

Bulk requests of 100MB and larger strongly affect the run time and memory
consumption of the other ES processing steps needed to index the data, and
should be reduced in order to find a "sweet spot". The exact point of
optimal balance between bulk request input and indexing power also depends
on other factors, like I/O throughput and CPU (plus ES settings such as
store throttling).
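To stay near that 1-10MB range, the bulk body can be split client-side before POSTing. A sketch (it assumes the two-line action/source shape of the update requests quoted below; the size threshold is illustrative):

```python
def bulk_chunks(lines, max_bytes=5 * 1024 * 1024):
    """Group NDJSON bulk lines into bodies of at most ~max_bytes each,
    keeping every action line together with its source/script line
    (assumes the two-line-per-operation shape used in this thread)."""
    chunk, size = [], 0
    for i in range(0, len(lines), 2):
        pair = lines[i:i + 2]
        pair_size = sum(len(line) for line in pair)
        if chunk and size + pair_size > max_bytes:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.extend(pair)
        size += pair_size
    if chunk:
        yield "".join(chunk)

# Each chunk is then POSTed to /_bulk separately instead of one 130MB body.
```

Splitting on operation boundaries matters: cutting an action line away from its source line would produce malformed bulk requests.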

Jörg



On Tue, Oct 28, 2014 at 4:41 PM,  wrote:

> Hi all,
>
> I have been testing an upgrade to elasticsearch 1.4 beta1.
>
> We use the Bulk API along with scripts to perform upserts into
> elasticsearch.  These perform well under ES 1.2 without any tuning.
>
> However, in ES 1.4 beta1, running these upsert scripts often lead to:
>   java.lang.OutOfMemoryError: Java heap space
>
>
> We use the bulk API:
>
>   curl -iL -silent --show-error -XPOST 'localhost:9200/_bulk'
> --data-binary @./
>
>
> where the file contains about 130 Mb ( 10,000 to 250,000 lines ) of data.
> It is filled with update / script commands:
>
>
> {"update":{"_index":"2762_2014_41","_type":"event","_id":"97bc142e15c7136ebe866890e03dfad9"}}
> {"doc":
>
> {"type":"event","date_time":"2014-10-17T19:00:00Z","day":20141017,"impression_cost":0.005,"format":"xyz","impression":1,"referer":"xyz","browser":"xyz","os":"android
> 4.4.4","device":"nexus
> 4","channel":"mobile","x_name":"xyz","id":"97bc142e15c7136ebe866890e03dfad9"
>   },"doc_as_upsert":true
> }
>
>
>
> {"update":{"_index":"2762_2014_41","_type":"event","_id":"97bc142e15c7136ebe866890e03dfad9"}}
> {
>   "script":"if( ctx._source.containsKey(\"impression\") ){
> ctx._source.impression += 2; } else { ctx._source.impression = 2; };"
> }
>
>
>
> There were some issues with permgen taking up memory in this ticket,
> which have been addressed since the beta1 release, so we re-built from
> the 1.4 branch:
> https://github.com/elasticsearch/elasticsearch/issues/7658
>
>
> And I found this discussion about an OOM error that suggested including
> the max_merged_segment in elasticsearch.yml.
>
> https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/max_merged_segment/elasticsearch/ETjvBVUvCJs/ZccfzUIFAKoJ
>
>   index.merge.policy.max_merged_segment: 1gb
>
>
> Setting max_merged_segment, launching on my development machine with a
> 2gb heap (ES_HEAP_SIZE=2g ./bin/elasticsearch), and bringing the file
> size per bulk request down to about 25Mb stabilized the system.
> However, it would still heap dump when larger files like 130Mb were
> allowed.
>
>
> I don't fully understand how this fixed the memory issues.  Would anyone
> be able to provide some insight into why we would run into memory issues
> with the upgrade?
> I'd like to better understand how the memory is managed here so that I can
> support this in production.  Are there recommended sizes for bulk
> requests?  And how those related to the max_merged_segment size?
>
>
> Thanks,
> Dave
>
>
>



Re: Error Restoring Snapshot

2014-10-28 Thread Mike Tolman
Never mind; it turns out that I wasn't copying the full directory structure
correctly when copying the repository files from Server1 to Server2.
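For reference, a sketch of the copy step done recursively so the layout survives (the paths are hypothetical; the real source and destination are whatever "location" each fs repository was registered with):

```shell
SRC=/tmp/repo_server1   # hypothetical Server1 repository location
DST=/tmp/repo_server2   # hypothetical Server2 repository location

# Fake a minimal repository layout just for illustration.
mkdir -p "$SRC/indices/x/0"
echo "segment data" > "$SRC/indices/x/0/__0"
echo "snapshot metadata" > "$SRC/snapshot-snap1"

# Copy recursively, preserving the full directory structure; copying only
# the top-level files loses indices/* and the restore fails to parse.
mkdir -p "$DST"
cp -a "$SRC/." "$DST/"
```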

On Tuesday, October 28, 2014 11:29:11 AM UTC-6, Mike Tolman wrote:
>
> Hi,
>
> I've been trying to restore an index snapshot and am getting this error in 
> the response:
>
> ElasticsearchParseException[unexpected token  [FIELD_NAME]]
>
> I'm sure I'm just doing something stupid, but I can't figure out what. 
> Does anyone have any idea what I might be doing wrong? 
>
> Here is my basic workflow:
>
> (using ES 1.3.2 -- Server1 and Server2 are separate ES clusters)
>
> 1. Create fs snapshot repository on Server1
> 2. Create snapshot of index 'x' on Server1
> 3. Create fs snapshot repository on Server2
> 4. Copy files from Server1 repository to Server2 repository
> 5. Close index on Server2
> 6. Restore snapshot on Server2
>
> Step 6 always fails for me with the "unexpected token" error.
>
> Thanks in advance,
> Mike
>



Re: How is it calculated _score

2014-10-28 Thread Ivan Brusic
As predicted, your IDF values differ. Shard 1 has fewer documents
containing that term than shard 4, which makes the shard 1 documents more
relevant. You can change the search type as described in my other reply.

{
   "value": 6.8087983,
   "description": "idf(docFreq=86, maxDocs=28990)"
},


{
   "value": 6.7636743,
   "description": "idf(docFreq=90, maxDocs=28985)"
},
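Both explain values can be reproduced with Lucene's classic (TF-IDF) idf formula, idf = 1 + ln(maxDocs / (docFreq + 1)):

```python
import math

def lucene_idf(doc_freq, max_docs):
    # Lucene's classic similarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

print(round(lucene_idf(86, 28990), 4))  # 6.8088 -- shard with fewer matching docs, higher score
print(round(lucene_idf(90, 28985), 4))  # 6.7637 -- shard with more matching docs, lower score
```

Since tf and fieldNorm are both 1 in the explain output, the idf value alone accounts for the score difference between the shards.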

-- 
Ivan

On Tue, Oct 28, 2014 at 4:44 PM, Manuel Sciuto  wrote:

> Request
> GET /business/actividades,alojamiento,comida,transporte_&_servicios/_search
> {
>   "explain": true,
>   "query": {
> "filtered": {
>   "query": {
> "match": {
>   "name": "Sheraton"
> }
>   }
> }
>   }
> }
>
> Response
>
> {
>"took": 106,
>"timed_out": false,
>"_shards": {
>   "total": 5,
>   "successful": 5,
>   "failed": 0
>},
>"hits": {
>   "total": 506,
>   "max_score": 6.8087983,
>   "hits": [
>  {
> "_shard": 1,
> "_node": "6MdK3cvjRQyqaaiUkxjIZQ",
> "_index": "business",
> "_type": "alojamiento",
> "_id": "273825",
> "_score": 6.8087983,
> "_source": {
>"name": "Sheraton",
>"reviews": 2
> },
> "_explanation": {
>"value": 6.8087983,
>"description": "weight(name:sheraton in 5198)
> [PerFieldSimilarity], result of:",
>"details": [
>   {
>  "value": 6.8087983,
>  "description": "fieldWeight in 5198, product of:",
>  "details": [
> {
>"value": 1,
>"description": "tf(freq=1.0), with freq of:",
>"details": [
>   {
>  "value": 1,
>  "description": "termFreq=1.0"
>   }
>]
> },
> {
>"value": 6.8087983,
>"description": "idf(docFreq=86, maxDocs=28990)"
> },
> {
>"value": 1,
>"description": "fieldNorm(doc=5198)"
> }
>  ]
>   }
>]
> }
>  },
>  {
> "_shard": 1,
> "_node": "6MdK3cvjRQyqaaiUkxjIZQ",
> "_index": "business",
> "_type": "alojamiento",
> "_id": "252355",
> "_score": 6.8087983,
> "_source": {
>"name": "Sheraton",
>"reviews": 1
> },
> "_explanation": {
>"value": 6.8087983,
>"description": "weight(name:sheraton in 19220)
> [PerFieldSimilarity], result of:",
>"details": [
>   {
>  "value": 6.8087983,
>  "description": "fieldWeight in 19220, product of:",
>  "details": [
> {
>"value": 1,
>"description": "tf(freq=1.0), with freq of:",
>"details": [
>   {
>  "value": 1,
>  "description": "termFreq=1.0"
>   }
>]
> },
> {
>"value": 6.8087983,
>"description": "idf(docFreq=86, maxDocs=28990)"
> },
> {
>"value": 1,
>"description": "fieldNorm(doc=19220)"
> }
>  ]
>   }
>]
> }
>  },
>  {
> "_shard": 1,
> "_node": "6MdK3cvjRQyqaaiUkxjIZQ",
> "_index": "business",
> "_type": "alojamiento",
> "_id": "132774",
> "_score": 6.8087983,
> "_source": {
>"name": "Sheraton",
>"reviews": 1
> },
> "_explanation": {
>"value": 6.8087983,
>"description": "weight(name:sheraton in 21640)
> [PerFieldSimilarity], result of:",
>"details": [
>   {
>  "value": 6.8087983,
>  "description": "fieldWeight in 21640, product of:",
>  "details": [
> {
>"value": 1,
>"description": "tf(freq=1.0), with freq of:",
>"details": [
>   

Re: How is it calculated _score

2014-10-28 Thread Manuel Sciuto
Request 
GET /business/actividades,alojamiento,comida,transporte_&_servicios/_search
{
  "explain": true,
  "query": {
"filtered": {
  "query": {
"match": {
  "name": "Sheraton"
}
  }
}
  }
}

Response

{
   "took": 106,
   "timed_out": false,
   "_shards": {
  "total": 5,
  "successful": 5,
  "failed": 0
   },
   "hits": {
  "total": 506,
  "max_score": 6.8087983,
  "hits": [
 {
"_shard": 1,
"_node": "6MdK3cvjRQyqaaiUkxjIZQ",
"_index": "business",
"_type": "alojamiento",
"_id": "273825",
"_score": 6.8087983,
"_source": {
   "name": "Sheraton",
   "reviews": 2
},
"_explanation": {
   "value": 6.8087983,
   "description": "weight(name:sheraton in 5198) 
[PerFieldSimilarity], result of:",
   "details": [
  {
 "value": 6.8087983,
 "description": "fieldWeight in 5198, product of:",
 "details": [
{
   "value": 1,
   "description": "tf(freq=1.0), with freq of:",
   "details": [
  {
 "value": 1,
 "description": "termFreq=1.0"
  }
   ]
},
{
   "value": 6.8087983,
   "description": "idf(docFreq=86, maxDocs=28990)"
},
{
   "value": 1,
   "description": "fieldNorm(doc=5198)"
}
 ]
  }
   ]
}
 },
 {
"_shard": 1,
"_node": "6MdK3cvjRQyqaaiUkxjIZQ",
"_index": "business",
"_type": "alojamiento",
"_id": "252355",
"_score": 6.8087983,
"_source": {
   "name": "Sheraton",
   "reviews": 1
},
"_explanation": {
   "value": 6.8087983,
   "description": "weight(name:sheraton in 19220) 
[PerFieldSimilarity], result of:",
   "details": [
  {
 "value": 6.8087983,
 "description": "fieldWeight in 19220, product of:",
 "details": [
{
   "value": 1,
   "description": "tf(freq=1.0), with freq of:",
   "details": [
  {
 "value": 1,
 "description": "termFreq=1.0"
  }
   ]
},
{
   "value": 6.8087983,
   "description": "idf(docFreq=86, maxDocs=28990)"
},
{
   "value": 1,
   "description": "fieldNorm(doc=19220)"
}
 ]
  }
   ]
}
 },
 {
"_shard": 1,
"_node": "6MdK3cvjRQyqaaiUkxjIZQ",
"_index": "business",
"_type": "alojamiento",
"_id": "132774",
"_score": 6.8087983,
"_source": {
   "name": "Sheraton",
   "reviews": 1
},
"_explanation": {
   "value": 6.8087983,
   "description": "weight(name:sheraton in 21640) 
[PerFieldSimilarity], result of:",
   "details": [
  {
 "value": 6.8087983,
 "description": "fieldWeight in 21640, product of:",
 "details": [
{
   "value": 1,
   "description": "tf(freq=1.0), with freq of:",
   "details": [
  {
 "value": 1,
 "description": "termFreq=1.0"
  }
   ]
},
{
   "value": 6.8087983,
   "description": "idf(docFreq=86, maxDocs=28990)"
},
{
   "value": 1,
   "description": "fieldNorm(doc=21640)"
}
 ]
  }
   ]
}
 },
 {
"_shard": 1,
"_node": "

Re: How is it calculated _score

2014-10-28 Thread Ivan Brusic
The default scoring algorithm is based on TF-IDF.

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/practical-scoring-function.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scoring-theory.html

You can enable explain to see how documents are scored:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-explain.html

Without knowing more about your system, I suspect it is the IDF that is
causing the mismatch. The IDF is calculated per shard, so when your
documents come from different shards, the scores can differ. Try using a
distributed search type (dfs_query_then_fetch) to see if the issue
persists:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_search_options.html#search-type
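For example, the query from this thread could be re-run with the distributed search type like this (request sketch, query simplified from the one above):

```
GET /business/actividades,alojamiento,comida,transporte_&_servicios/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "match": { "name": "Sheraton" }
  }
}
```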

Cheers,

Ivan

On Tue, Oct 28, 2014 at 3:38 PM, Manuel Sciuto  wrote:

> Hello
>
> How is score calculated?
>
> GET /business/actividades,alojamiento,comida,transporte_&_servicios/_search
> {
>   "query": {
> "filtered": {
>   "query": {
> "match": {
>   "name": "Sheraton"
> }
>   }
> }
>   }
> }
>
> Response
>
> {
>"took": 4,
>"timed_out": false,
>"_shards": {
>   "total": 5,
>   "successful": 5,
>   "failed": 0
>},
>"hits": {
>   "total": 506,
>   "max_score": 6.8087983,
>   "hits": [
>  {
> "_index": "business",
> "_type": "alojamiento",
> "_id": "273825",
> "_score": 6.8087983,
> "_source": {
>"name": "Sheraton",
>"reviews": 2
> }
>  },
>  {
> "_index": "business",
> "_type": "alojamiento",
> "_id": "252355",
> "_score": 6.8087983,
> "_source": {
>"name": "Sheraton",
>"reviews": 1
> }
>  },
>  {
> "_index": "business",
> "_type": "alojamiento",
> "_id": "132774",
> "_score": 6.8087983,
> "_source": {
>"name": "Sheraton",
>"reviews": 1
> }
>  },
>  {
> "_index": "business",
> "_type": "alojamiento",
> "_id": "225509",
> "_score": 6.8087983,
> "_source": {
>"name": "Sheraton",
>"reviews": 2
> }
>  },
>  {
> "_index": "business",
> "_type": "alojamiento",
> "_id": "232124",
> "_score": 6.8087983,
> "_source": {
>"name": "Sheraton",
>"reviews": 1
> }
>  },
>  {
> "_index": "business",
> "_type": "alojamiento",
> "_id": "219172",
>* "_score": 6.8087983,*
> "_source": {
>"name": "Sheraton",
>"reviews": 0
> }
>  },
>  {
> "_index": "business",
> "_type": "alojamiento",
> "_id": "224180",
>   *  "_score": 6.7636743,*
> "_source": {
>"name": "Sheraton",
>"reviews": 3
> }
>  },
>  {
> "_index": "business",
> "_type": "alojamiento",
> "_id": "268979",
> "_score": 6.7636743,
> "_source": {
>"name": "Sheraton",
>"reviews": 12
> }
>  },
>  {
> "_index": "business",
> "_type": "alojamiento",
> "_id": "228353",
> "_score": 6.7636743,
> "_source": {
>"name": "Sheraton",
>"reviews": 2
> }
>  },
>  {
> "_index": "business",
> "_type": "alojamiento",
> "_id": "112508",
> "_score": 6.7636743,
> "_source": {
>"name": "Sheraton",
>"reviews": 9
> }
>  }
>   ]
>}
> }
>
> Why is the score different in some cases if the name is the same?
>
> Thanks!!
>
>


How is it calculated _score

2014-10-28 Thread Manuel Sciuto
Hello

How is score calculated? 

GET /business/actividades,alojamiento,comida,transporte_&_servicios/_search
{
  "query": {
"filtered": {
  "query": {
"match": {
  "name": "Sheraton"
}
  }
}
  }
}

Response

{
   "took": 4,
   "timed_out": false,
   "_shards": {
  "total": 5,
  "successful": 5,
  "failed": 0
   },
   "hits": {
  "total": 506,
  "max_score": 6.8087983,
  "hits": [
 {
"_index": "business",
"_type": "alojamiento",
"_id": "273825",
"_score": 6.8087983,
"_source": {
   "name": "Sheraton",
   "reviews": 2
}
 },
 {
"_index": "business",
"_type": "alojamiento",
"_id": "252355",
"_score": 6.8087983,
"_source": {
   "name": "Sheraton",
   "reviews": 1
}
 },
 {
"_index": "business",
"_type": "alojamiento",
"_id": "132774",
"_score": 6.8087983,
"_source": {
   "name": "Sheraton",
   "reviews": 1
}
 },
 {
"_index": "business",
"_type": "alojamiento",
"_id": "225509",
"_score": 6.8087983,
"_source": {
   "name": "Sheraton",
   "reviews": 2
}
 },
 {
"_index": "business",
"_type": "alojamiento",
"_id": "232124",
"_score": 6.8087983,
"_source": {
   "name": "Sheraton",
   "reviews": 1
}
 },
 {
"_index": "business",
"_type": "alojamiento",
"_id": "219172",
   * "_score": 6.8087983,*
"_source": {
   "name": "Sheraton",
   "reviews": 0
}
 },
 {
"_index": "business",
"_type": "alojamiento",
"_id": "224180",
  *  "_score": 6.7636743,*
"_source": {
   "name": "Sheraton",
   "reviews": 3
}
 },
 {
"_index": "business",
"_type": "alojamiento",
"_id": "268979",
"_score": 6.7636743,
"_source": {
   "name": "Sheraton",
   "reviews": 12
}
 },
 {
"_index": "business",
"_type": "alojamiento",
"_id": "228353",
"_score": 6.7636743,
"_source": {
   "name": "Sheraton",
   "reviews": 2
}
 },
 {
"_index": "business",
"_type": "alojamiento",
"_id": "112508",
"_score": 6.7636743,
"_source": {
   "name": "Sheraton",
   "reviews": 9
}
 }
  ]
   }
}

Why is the score different in some cases if the name is the same?

Thanks!!




Re: hardware recommendation for dedicated client node

2014-10-28 Thread Terence Tung
Can anyone suggest hardware for a dedicated client node? Thanks.


On Friday, October 24, 2014 6:32:26 PM UTC-7, Terence Tung wrote:
>
> hi there,
>
> I wonder what the hardware recommendation is for a dedicated client
> node. I know the master is a very lightweight node that doesn't require
> good hardware, but how about the client? Since the client node does the
> actual gather processing, I assume it might require more memory, like a
> data node. Am I right? Any recommendation would be greatly appreciated.
>
> thanks,
> TT
>



Returning the main document and the tag that matched in a sub document

2014-10-28 Thread Mike Maddox
Any help would be appreciated. I have a parent document with an array of
sub-documents: tags associated with an objectid that identifies another
element. I am able to search on the tags and get a response that returns
the parent document, which is exactly what I want. However, since the tags
map to an objectid, I'd like to know which keywords matched the tag so I
can get the objectid. I could compare the tags on the client to figure out
which matched, but with a stemming analyzer that wouldn't work, so I'd like
to find a better way if possible. For example, if I search for "friend and
families" I get back the document with id 249184, but I also want to find
out that the match was on a tag belonging to objectid 7. Any suggestions on
whether I'm going in the right direction to get the results I need, or is
there another way to structure this?


{
  "_index": "myindex",
  "_type": "mytype",
  "_id": "249184",
  "_version": 1,
  "_score": 1,
  "_source": {
"id": 249184,
"info":"I love elasticsearch",
"mytags": [
  {
"objectid": 7,
"tags": [
  "friend and families",
  "brother"
]
  },
  {
"objectid": 3,
"tags": [
  "sister"
]
  }
]
  }
}


The index is defined as follows (has been simplified for this example):

{
"mydata": {
"properties": {
"id": {
"type": "integer",
"index": "not_analyzed"
},
"info": {
"type": "string",
"analyzer": "standard"
},
"mytags": {
"type" : "object",
"properties": {
                "objectid": { "type": "integer", "index": "not_analyzed" },
                "tags": { "type": "string", "analyzer": "standard" }
}
}
}
}
}

Thanks much




Re: elasticsearch rpm creates file /usr/lib/sysctl.d/elasticsearch.conf, but doesn't apply it, so vm.max_map_count remains the default until reboot

2014-10-28 Thread Suny
Wait, I'd say it doesn't require a reboot on RHEL6; it's a tunable
parameter because it sits in /proc/sys/vm/max_map_count. In fact, it is
set at service start in the init script:
sysctl -q -w vm.max_map_count=$MAX_MAP_COUNT
So I'd say things are fine once the service has started. That
/usr/lib/sysctl.d/elasticsearch.conf doesn't mean a thing on RHEL6; the
MAX_MAP_COUNT in /etc/sysconfig/elasticsearch is what counts. I apologize
for the confusion. It was on a customer's server, and I wasn't aware that
Elasticsearch wasn't running when he queried the system parameters. For my
part, I learned something again.



Error Restoring Snapshot

2014-10-28 Thread Mike Tolman
Hi,

I've been trying to restore an index snapshot and am getting this error in 
the response:

ElasticsearchParseException[unexpected token  [FIELD_NAME]]

I'm sure I'm just doing something stupid, but I can't figure out what. Does 
anyone have any idea what I might be doing wrong? 

Here is my basic workflow:

(using ES 1.3.2 -- Server1 and Server2 are separate ES clusters)

1. Create fs snapshot repository on Server1
2. Create snapshot of index 'x' on Server1
3. Create fs snapshot repository on Server2
4. Copy files from Server1 repository to Server2 repository
5. Close index on Server2
6. Restore snapshot on Server2

Step 6 always fails for me with the "unexpected token" error.

Thanks in advance,
Mike



term complexity and filter caching

2014-10-28 Thread smonasco
Hi peeps,

Leaf level filter cache is nice, but caching complex filters is better if 
you can get some hit ratio.

We have some items (query strings that may become bool filters later) with
over 50 terms (sometimes hundreds) that get reused often. We essentially
aggregate financial research for hundreds of sites, each of which has paid
only for certain data, and this is the primary case for these complex and
highly reused queries.

However, we are looking at adding in some leaf level filter caching on 
other items.  Say an industry or a report type.  Across fields with small 
permutation and high re-use.

Elasticsearch employs an LRU policy on filter cache.

However, the complex queries with high term complexity and lots of "AND"s 
and "OR"s save far more time, CPU and memory per use than leaf-level 
caching on something like "reporttype:research", so I'd like to add some 
weight to the more complex queries when determining what gets evicted.
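
The weighting idea can be sketched as an eviction policy that prefers dropping cheap-to-rebuild entries first. This is an illustration of the proposal, not Elasticsearch's actual filter-cache code; `weight` might be, say, the term count of the cached filter:

```python
from collections import OrderedDict

class WeightedLRUCache:
    """LRU-style cache that evicts low-weight (cheap-to-recompute) entries first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()   # key -> (weight, value); order tracks recency

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key][1]

    def put(self, key, value, weight=1):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (weight, value)
        while len(self._data) > self.capacity:
            # Evict the lowest-weight entry; among ties, the least recently used
            # (min() returns the first such key in recency order).
            victim = min(self._data, key=lambda k: self._data[k][0])
            del self._data[victim]
```

So a 200-term bool filter with weight 200 would outlive many "reporttype:research"-style entries even if the latter were touched more recently.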

Thoughts?

--Shannon Monasco



Re: Kibana: Deploy without Ruby / Warbler / Gems / Java etc

2014-10-28 Thread Nick Zadrozny
As an Elasticsearch hosting provider, I share Ben’s questions here.

We currently host Kibana 3 for our customers alongside their hosted ES 
clusters. Our customers find this to be useful, since we’re able to offer a 
zero-deploy, zero-configuration experience that already handles all the CORS 
and authentication and such.

We’d like to continue providing that, and my current plan is to do a bit of 
work with Kibana 4 to extract the static assets, but it would be nice to get 
some official word from ES, Inc on the expected future trajectory for Kibana’s 
deployment story. A bit of consideration for static deploys would be far 
preferable to maintaining a de facto fork.

(Thanks for all the hard work, etc; I’m looking forward to getting Kibana 4 out 
the door on Bonsai!)

On Mon, Oct 27, 2014 at 11:54 PM, Ben Walding 
wrote:

> In Kibana 3, it was possible to deploy Kibana as a set of static files.
> In Kibana 4, the default mode is to deploy using a full stack of Ruby code 
> on top of Warbler and JRuby etc.
> It seems from my experimentation that this is not really required beyond
>- automatically populating the /config endpoint with some configuration 
>settings
>- creating an Elasticsearch proxy (presumably to avoid dealing with CORS 
>and XSS limitations)
>- enumerating plugins
> (for reference I got Kibana 4 working in a static deployment by fixing a 
> few pathing issues and adding CORS support to ES)
> While having a preconfigured stack that launches from a single command is 
> awesome, it also bloats the requirements for more sophisticated 
> deployments.
> What I'd like to know is whether the general development direction is that 
> more dynamic configuration will be done in the Ruby code, creating an 
> ever-increasing dependency on the Ruby stack in the future.
> e.g. Generating a plugin manifest could easily be done using a bash / js 
> script - it does not need a full Ruby stack.
> Thanks,
> Ben



Re: elasticsearch rpm creates file /usr/lib/sysctl.d/elasticsearch.conf, but doesn't apply it, so vm.max_map_count remains the default until reboot

2014-10-28 Thread Suny
Unfortunately, the rpm puts its file into the wrong directory. In RHEL, 
sysctl reads /etc/sysctl.d, not /usr/lib/sysctl.d/. 



elasticsearch timeout exceptions

2014-10-28 Thread aldm
Hi,

I'm using elasticsearch for storing some documents. The index has about 7000 
docs and 5 shards. The timeout on all requests is set to 5 secs and I'm 
getting ElasticsearchTimeoutException really often. We do lots of indexing 
in the system through several processes, and this probably creates overhead 
on the cluster.

I'm using this in a jruby rails app which uses the Java API client for 
accessing elasticsearch. The approach is to use a client node and to do 
operations on that node.

So does anyone have experience with elasticsearch requests timing out? 
What could be the main reason for this and how could I handle it? 
Increasing the timeout is not an option. 

Thanks in advance



Re: elasticsearch rpm creates file /usr/lib/sysctl.d/elasticsearch.conf, but doesn't apply it, so vm.max_map_count remains the default until reboot

2014-10-28 Thread Suny
Thanks Jörg, your answer is really helpful. I didn't know of the connection 
between mlockall and max_map_count. I agree that the system parameters are 
the sysadmins' job. Still, some documentation on this setting from 
elasticsearch.org would help, like in which cases it's important, as you 
explained. BTW, I'm using your post 
http://jprante.github.io/2012/11/28/Elasticsearch-Java-Virtual-Machine-settings-explained.html
a lot. 



Upgrade to ES 1.4 memory issues / tuning max_merged_segment

2014-10-28 Thread dzaebst
Hi all,

I have been testing an upgrade to elasticsearch 1.4 beta1.

We use the Bulk API along with scripts to perform upserts into 
elasticsearch.  These perform well under ES 1.2 without any tuning.

However, in ES 1.4 beta1, running these upsert scripts often leads to: 
  java.lang.OutOfMemoryError: Java heap space


We use the bulk API:

  curl -iL --silent --show-error -XPOST 'localhost:9200/_bulk' --data-binary 
@./


where the file contains about 130 MB (10,000 to 250,000 lines) of data. 
It is filled with update / script commands:

{"update":{"_index":"2762_2014_41","_type":"event","_id":"97bc142e15c7136ebe866890e03dfad9"}}
{"doc":{"type":"event","date_time":"2014-10-17T19:00:00Z","day":20141017,"impression_cost":0.005,"format":"xyz","impression":1,"referer":"xyz","browser":"xyz","os":"android 4.4.4","device":"nexus 4","channel":"mobile","x_name":"xyz","id":"97bc142e15c7136ebe866890e03dfad9"},"doc_as_upsert":true}

{"update":{"_index":"2762_2014_41","_type":"event","_id":"97bc142e15c7136ebe866890e03dfad9"}}
{"script":"if( ctx._source.containsKey(\"impression\") ){ ctx._source.impression += 2; } else { ctx._source.impression = 2; };"}



There were some issues with permgen taking up memory in this ticket that 
have been addressed since the beta1 release, so we re-built from the 
1.4 branch:
https://github.com/elasticsearch/elasticsearch/issues/7658


And I found this discussion about an OOM error that suggested setting 
max_merged_segment in elasticsearch.yml:
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/max_merged_segment/elasticsearch/ETjvBVUvCJs/ZccfzUIFAKoJ

  index.merge.policy.max_merged_segment: 1gb


Setting max_merged_segment, launching on my development machine with a 2 GB 
heap (ES_HEAP_SIZE=2g ./bin/elasticsearch), and bringing the file size per 
bulk request down to about 25 MB stabilized the system.
However, it would still heap dump when larger files like 130 MB were allowed.
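
In case it helps anyone hitting the same OOM, keeping each bulk request under a size cap can be automated by splitting the file on action/payload line pairs. A sketch, assuming every action line is followed by exactly one payload line, as in the update example above:

```python
def chunk_bulk_lines(lines, max_bytes=25 * 1024 * 1024):
    """Split bulk-API NDJSON lines into chunks of at most max_bytes.

    Lines are grouped in (action, payload) pairs so a chunk never splits
    an update command in half.
    """
    chunks, current, size = [], [], 0
    for i in range(0, len(lines), 2):
        pair = lines[i:i + 2]
        # +1 per line for the trailing "\n" the bulk API requires
        pair_size = sum(len(l.encode("utf-8")) + 1 for l in pair)
        if current and size + pair_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.extend(pair)
        size += pair_size
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be POSTed to `_bulk` as its own request.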


I don't fully understand how this fixed the memory issues.  Would anyone be 
able to provide some insight into why we would run into memory issues with 
the upgrade?
I'd like to better understand how the memory is managed here so that I can 
support this in production.  Are there recommended sizes for bulk requests? 
And how do those relate to the max_merged_segment size?


Thanks,
Dave

 



Re: elasticsearch rpm creates file /usr/lib/sysctl.d/elasticsearch.conf, but doesn't apply it, so vm.max_map_count remains the default until reboot

2014-10-28 Thread joergpra...@gmail.com
This is a kernel parameter and requires a reboot.

It is bad habit to enforce kernel parameter change from an application.
This is due to the administrator.

Note that increasing vm.max_map_count is only required for applications
that use many small mmap calls (with mlockall, Elasticsearch does not). A
number which is too high will potentially increase memory consumption on
the server and reduce performance.

Jörg

On Tue, Oct 28, 2014 at 3:53 PM, Suny  wrote:

> Hi. We checked our elasticsearch and OS settings, and found that the
> system parameter vm.max_map_count was too low. It was still the linux
> default, 65530, but should be increased to 262144. This value is set in
> /usr/lib/sysctl.d/elasticsearch.conf (and also in
> /etc/sysconfig/elasticsearch, MAX_MAP_COUNT=262144). But we have to call
> "sysctl -p /usr/lib/sysctl.d/elasticsearch.conf" for this to take effect,
> or reboot the server. I don't know if anything speaks against doing stuff
> like this in the rpm's install scripts. If it's OK, please add it. This was
> the behaviour on RHEL6 and CentOS6 with the
> rpms elasticsearch.noarch 1.2.1-1 and probably also 1.0.1.
>
>



elasticsearch rpm creates file /usr/lib/sysctl.d/elasticsearch.conf, but doesn't apply it, so vm.max_map_count remains the default until reboot

2014-10-28 Thread Suny
Hi. We checked our elasticsearch and OS settings, and found that the system 
parameter vm.max_map_count was too low. It was still the linux 
default, 65530, but should be increased to 262144. This value is set in 
/usr/lib/sysctl.d/elasticsearch.conf (and also in 
/etc/sysconfig/elasticsearch, MAX_MAP_COUNT=262144). But we have to call 
"sysctl -p /usr/lib/sysctl.d/elasticsearch.conf" for this to take effect, 
or reboot the server. I don't know if anything speaks against doing stuff 
like this in the rpm's install scripts. If it's OK, please add it. This was 
the behaviour on RHEL6 and CentOS6 with the 
rpms elasticsearch.noarch 1.2.1-1 and probably also 1.0.1. 



Re: Connecting to ES via a http proxy in perl client

2014-10-28 Thread Kevin Van Workum


On Tuesday, October 28, 2014 5:25:29 AM UTC-4, Clinton Gormley wrote:
>
> Hi Kevin
>
> On Friday, 24 October 2014 18:24:00 UTC+2, Kevin Van Workum wrote:
>>
>> I'm trying to connect to my ES via a proxy using a client written in 
>> perl. What's the best way to do this?
>>
>> Here's what I have, and it works, but I suspect there's a more straight 
>> forward approach:
>>
>> $e = Search::Elasticsearch->new(
>>cxn => 'LWP',
>>nodes => 'node1:9200' );
>>
>> $ENV{HTTP_proxy} = "http://proxy:3128";
>> $e->transport->cxn_pool->next_cxn->handle->env_proxy;
>>
>>
> You should be able to do this using the default Cxn backend (HTTP::Tiny). 
>  I haven't tried proxies but, according to the HTTP::Tiny docs, proxies are 
> supported: https://metacpan.org/pod/HTTP::Tiny#PROXY-SUPPORT
>
> This should work:
>
>  
>
>> $ENV{http_proxy} = "http://proxy:3128";
>> $e = Search::Elasticsearch->new( nodes => 'node1:9200' );
>>  
>>
>
Yep, that works. Thanks. One can also do this:

$e = Search::Elasticsearch->new ( nodes => 'node1:9200', handle_args => { 
proxy => 'http://proxy:3128' } );

 



Return Levenshtein distance in fuzzy query

2014-10-28 Thread Octavian
Hi,

I want to return the stored phrase that is closest to a query phrase, in 
terms of Levenshtein distance. Both phrases can contain several words (up 
to 100).

My query looks like:
{
  "query": {
    "bool": {
      "must": [
        {
          "fuzzy_like_this": {
            "fields": ["stand_for"],
            "like_text": queryPhrase,
            "ignore_tf": true,
            "fuzziness": 0.7
          }
        }
      ]
    }
  },
  "min_score": 2.0
};

and the mapping looks like:
{
  "stand_for": {
    "type": "string"
  }
}


The problem is that the returned score is a relevance function of the terms 
that match the fuzziness parameter, and I need a score based only on the 
Levenshtein distance. Is there any way to do this?

Thank you,



Re: High cpu load but low memory usage.

2014-10-28 Thread joergpra...@gmail.com
Did your search queries change recently?

You have some options

- optimize indices to reduce segments, therefore faster search

- optimize queries, use filter/constant score instead of query/score

- use caching for filtered queries if you have queries that repeat

Jörg



On Tue, Oct 28, 2014 at 11:02 AM, Anh Huy Do  wrote:

> Hi Jorg,
>
> This is hot thread info when CPU get high :
>
> ::: [Search-195][W2LL0dnBSGu_5k7fAHt0uA][inet[/195:9300]]{master=true}
>
>
>
>   28.1% (140.4ms out of 500ms) cpu usage by thread
> 'elasticsearch[Search-195][search][T#22]'
>
> 10/10 snapshots sharing following 10 elements
>
>   sun.misc.Unsafe.park(Native Method)
>
>   java.util.concurrent.locks.LockSupport.park(Unknown Source)
>
>
>   
> org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
>
>
>   
> org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
>
>
>   
> org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
>
>
>   
> org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
>
>   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
>
>   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>
>   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>
>   java.lang.Thread.run(Unknown Source)
>
>
>
>   24.7% (123.4ms out of 500ms) cpu usage by thread
> 'elasticsearch[Search-195][search][T#9]'
>
> 10/10 snapshots sharing following 10 elements
>
>   sun.misc.Unsafe.park(Native Method)
>
>   java.util.concurrent.locks.LockSupport.park(Unknown Source)
>
>
>   
> org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
>
>
>   
> org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
>
>
>   
> org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
>
>
>   
> org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
>
>   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
>
>   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>
>   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>
>   java.lang.Thread.run(Unknown Source)
>
>
>
>   24.0% (119.8ms out of 500ms) cpu usage by thread
> 'elasticsearch[Search-195][search][T#19]'
>
> 7/10 snapshots sharing following 10 elements
>
>   sun.misc.Unsafe.park(Native Method)
>
>   java.util.concurrent.locks.LockSupport.park(Unknown Source)
>
>
>   
> org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
>
>
>   
> org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
>
>
>   
> org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
>
>
>   
> org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
>
>   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
>
>   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>
>   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>
>   java.lang.Thread.run(Unknown Source)
>
> 3/10 snapshots sharing following 23 elements
>
>
>   
> org.apache.lucene.search.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:60)
>
>
>   
> org.elasticsearch.index.search.child.ConstantScorer.nextDoc(ConstantScorer.java:48)
>
>
>   
> org.elasticsearch.common.lucene.docset.DocIdSets.toCacheable(DocIdSets.java:94)
>
>
>   
> org.elasticsearch.index.search.child.CustomQueryWrappingFilter.getDocIdSet(CustomQueryWrappingFilter.java:73)
>
>
>   
> org.elasticsearch.common.lucene.search.AndFilter.getDocIdSet(AndFilter.java:54)
>
>
>   
> org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(ApplyAcceptedDocsFilter.java:45)
>
>
>   org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:128)
>
>
>   
> org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:533)
>
>
>   org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:133)
>
>   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
>
>
>   
> org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:173)
>
>   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:581)
>
>   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:533)
>
>   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:510)
>
>   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.j

Re: Nested Documents - how to search for nested elements, not whole documents

2014-10-28 Thread Ivan Brusic
This behavior is currently not possible. The Elasticsearch team is working
on a solution.

-- 
Ivan
On Oct 28, 2014 5:24 AM,  wrote:

> Hi,
>
> I have hierarchical data: orders consist of several order items.
>
> How can I search for a certain orderItem without getting returned the
> other items of the same order?
> I tried to use nested documents, but I always receive the whole order and
> not the requested order item.
>
> Example:
>
> POST /salesorder7
> {
>   "mappings": {
>     "complete": {
>       "properties": {
>         "orderDetails": {
>           "type": "nested",
>           "properties": {
>             "lineItemNumber": { "type": "string" },
>             "productInformation": {
>               "properties": {
>                 "id": { "type": "string" },
>                 "name": { "type": "string" }
>               }
>             },
>             "totalLineItemPrice": {
>               "properties": {
>                 "amount": { "type": "float" },
>                 "currency": { "type": "string" }
>               }
>             },
>             "orderHeader": {
>               "properties": {
>                 "orderNumber": { "type": "string" }
>               }
>             }
>           }
>         }
>       }
>     }
>   }
> }
>
> PUT /salesorder7/complete/1
> {
>   "orderHeader": { "orderNumber": "1" },
>   "orderDetails": [
>     {
>       "lineItemNumber": "11",
>       "productInformation": { "name": "product1", "id": "p1" },
>       "totalLineItemPrice": { "amount": "105.04", "currency": "EUR" }
>     },
>     {
>       "totalLineItemPrice": { "amount": "9.99", "currency": "EUR" },
>       "lineItemNumber": "12",
>       "productInformation": { "name": "product2", "id": "p2" }
>     }
>   ]
> }
>
> POST /salesorder7/complete/_search
> {
>   "fields": [
>     "orderHeader.orderNumber",
>     "orderDetails.unitPrice.amount",
>     "orderDetails.productInformation.name"
>   ],
>   "query": {
>     "nested": {
>       "path": "orderDetails",
>       "score_mode": "avg",
>       "query": {
>         "bool": {
>           "must": [
>             { "match": { "orderDetails.productInformation.name": "product1" } }
>           ]
>         }
>       }
>     }
>   }
> }
>
> This returns
>
> {
>   "took": 7,
>   "timed_out": false,
>   "_shards": { "total": 5, "successful": 5, "failed": 0 },
>   "hits": {
>     "total": 1,
>     "max_score": 1.4054651,
>     "hits": [
>       {
>         "_index": "salesorder7",
>         "_type": "complete",
>         "_id": "1",
>         "_score": 1.4054651,
>         "fields": {
>           "orderDetails.productInformation.name": [ "product1", "product2" ],
>           "orderHeader.orderNumber": [ "1" ]
>         }
>       }
>     ]
>   }
> }
>
> I want it to return product1 only, and not product1 and product2.
>
> How can I achieve this?
>
> Best regards
> Henrik
>
> P.S. Sorry for not using curl; under Windows it seems to accept only
> single-line, unspaced documents, or documents in files.
>
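
Until something like this is supported server-side, a common workaround is to run the nested query as usual and then prune the nested array client-side. A minimal sketch, using field names from the example above (the exact-equality test is an assumption — for analyzed fields you may want a looser comparison):

```python
def matching_nested(doc, path, field_keys, needle):
    """Keep only the nested objects under `path` whose value at `field_keys`
    equals `needle` -- a client-side stand-in for 'inner hits'."""
    def dig(obj, keys):
        for k in keys:
            obj = obj.get(k, {})
        return obj
    return [item for item in doc.get(path, [])
            if dig(item, field_keys) == needle]

# Example: the _source of the hit returned by the nested query above.
order = {
    "orderHeader": {"orderNumber": "1"},
    "orderDetails": [
        {"lineItemNumber": "11",
         "productInformation": {"name": "product1", "id": "p1"}},
        {"lineItemNumber": "12",
         "productInformation": {"name": "product2", "id": "p2"}},
    ],
}
hits = matching_nested(order, "orderDetails",
                       ["productInformation", "name"], "product1")
```

This keeps only the product1 order item while discarding its siblings.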

Re: Data loss after network failure

2014-10-28 Thread joergpra...@gmail.com
With two nodes, you are exposed to split brain issues, as long as you do
not set minimum_master_nodes = 2.

It is recommended to use an odd number of nodes and at least 3 nodes with
minimum_master_nodes = 2 to avoid split brain.
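
In elasticsearch.yml that would look like this for a three-node cluster, where the quorum is (master-eligible nodes / 2) + 1:

```yaml
# elasticsearch.yml -- sketch for three master-eligible nodes
# quorum = (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```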

Jörg

On Tue, Oct 28, 2014 at 12:24 PM, Frank Evers 
wrote:

> Hi all,
>
> We noticed unwanted synchronisation behaviour using two nodes with
> elasticsearch-1.3.4. The nodes got disconnected from each other after a
> network failure and both started running their own cluster. One node got
> new data pushed to it, the other didn't. After restarting the node that got
> the most recent data, it joined up with the other one. This resulted in the
> loss of all the new data from the master node since the network failure.
>
> Is this expected behaviour? If so, how can we prevent this scenario from
> occurring again?
>
> Thanks,
>
> Frank
>



elasticsearch get top words of each document by custom function

2014-10-28 Thread valerij . vasilcenko
 

I have a list of common words in the English language.

Each word has a rating.

I need to get the top 10 words of each document, sorted by the highest 
result of wordCount * wordRatio.

I think this should be done at indexing time. 
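
The scoring itself can be sketched client-side before (or instead of) pushing it into indexing; `ratios` stands in for the word-rating list, and skipping unknown words is an assumption:

```python
from collections import Counter

def top_words(text, ratios, n=10):
    """Return the top n words of a document scored by count * ratio.

    `ratios` maps word -> rating; words not in the list are ignored.
    Tokenization here is a naive whitespace split.
    """
    counts = Counter(w for w in text.lower().split() if w in ratios)
    scored = {w: c * ratios[w] for w, c in counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:n]
```

The resulting list could then be stored as a separate field on the document at index time, so queries never recompute it.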



Re: I'm getting exceptions while searching using cirrussearch from Mediawiki

2014-10-28 Thread Nikolas Everett
You aren't on IRC so I'll just explain here.  There isn't a mailing list
for Cirrus or I'd have us take it there.  The elasticsearch mailing list is
just as good as the Mediawiki mailing list.

Cirrus doesn't use stored scripts on the server because we at WMF update
Cirrus constantly and we don't want to have to think about whether or not a
release needs a script change.  And we do rolling releases so we'd have to
keep two copies of the script there all the time.  Just not worth it.

The other problem is that Cirrus releases are tagged on MediaWiki's
twice-yearly release schedule for external use.  That's honestly not
frequent enough given how quickly both the project and Elasticsearch are
moving.  WMF releases weekly and that works well, but we can't expect
other users to do that.  It's just too much work.

Long story short - the _fastest_ way for you to resolve this is to re-enable
dynamic scripting by setting

script.disable_dynamic: false


in elasticsearch.yml and restart Elasticsearch.

Another solution would be for me to backport the latest release of cirrus
to mediawiki 1.23.  That'd take some time but wouldn't require
non-sandboxed dynamic scripting.

On Tue, Oct 28, 2014 at 8:18 AM, Nikolas Everett  wrote:

> I'll hop on irc and help from there. Depending on the version of cirrus
> you use it requires groovy or MVEL support.
> On Oct 28, 2014 4:12 AM, "Isabel Drost-Fromm" <
> isabel.drostfr...@elasticsearch.com> wrote:
>
>> This looks like a configuration issue. The clue is in the following line:
>>
>> "QueryParsingException[[wikidb_content_first] script_score the script
>> could not be loaded]; nested: ScriptException[dynamic scripting for[mvel]
>>  disabled];"
>>
>> According to
>>
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html
>>
>> scripting was disabled by default starting version 1.2.0.
>> Isabel
>>
>



logstash IMAP issues

2014-10-28 Thread Alan Julian
I seem to be unable to get the logstash IMAP plugin working.

I have a logstash 1.4/elasticsearch/kibana setup which is working well, all 
pretty standard stuff.

However, when I create an input on my logstash host to go off and pull back 
mail info using the imap plugin, I can't seem to get it to work.

Below is my input configuration

input {
   imap {
type => "mail"
check_interval => 300
content_type => "text/plain"
delete => "false"
host => ""
port => 993
password => ""
secure => "false"
user => ""
  }
}

No matter what I do I get the following error.

{:timestamp=>"2014-10-28T12:25:31.802000+", :message=>"A plugin had an 
unrecoverable error. Will restart this plugin.\n  Plugin: 
\"mail\", content_type=>\"text/plain\", 
host=>\"10.0.0.0\", user=>\"-l...@mailserver.com\">\n  Error: Connection 
refused - Connection refused", :level=>:error}

I've checked the network and packets are going back and forth fine.

My question is: do I need to reconfigure this to use a logstash-forwarder 
config on the mail server itself, or am I doing something else really stupid?
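
"Connection refused" means the TCP handshake itself was rejected, before any IMAP traffic — so nothing is listening on that host:port from logstash's point of view, or a firewall actively rejects it. A quick sanity check from the logstash host, independent of the plugin (hostname below is a placeholder):

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Bare TCP reachability check -- succeeds iff something accepts the
    connection on host:port; 'Connection refused' maps to False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. can_connect("mailserver.example", 993)
```

If this returns False for your IMAP host and port 993, the problem is network/firewall/mail-server configuration rather than the logstash input itself.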

Thanks



Re: I'm getting exceptions while searching using cirrussearch from Mediawiki

2014-10-28 Thread Vijay K
I'm already on IRC at http://webchat.freenode.net/ in the channel 
#elasticsearch

On Tuesday, October 28, 2014 5:48:44 PM UTC+5:30, Nikolas Everett wrote:
>
> I'll hop on irc and help from there. Depending on the version of cirrus 
> you use it requires groovy or MVEL support. 
> On Oct 28, 2014 4:12 AM, "Isabel Drost-Fromm" <
> isabel.d...@elasticsearch.com > wrote:
>
>> This looks like a configuration issue. The clue is in the following line:
>>
>> "QueryParsingException[[wikidb_content_first] script_score the script 
>> could not be loaded]; nested: ScriptException[dynamic scripting for[mvel]
>>  disabled];"
>>
>> According to 
>>
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html
>>
>> scripting was disabled by default starting version 1.2.0.
>> Isabel
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/CAFSgB-BSB4HJSwgku_6Hco%3DdpzVxgpz77SVWerrEitSQdvY6SQ%40mail.gmail.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a3fc6248-87cc-4b84-b28f-edcb813a674a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: I'm getting exceptions while searching using cirrussearch from Mediawiki

2014-10-28 Thread Nikolas Everett
I'll hop on IRC and help from there. Depending on the version of Cirrus you
use, it requires Groovy or MVEL support.
On Oct 28, 2014 4:12 AM, "Isabel Drost-Fromm" <
isabel.drostfr...@elasticsearch.com> wrote:

> This looks like a configuration issue. The clue is in the following line:
>
> "QueryParsingException[[wikidb_content_first] script_score the script
> could not be loaded]; nested: ScriptException[dynamic scripting for[mvel]
>  disabled];"
>
> According to
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html
>
> scripting was disabled by default starting version 1.2.0.
> Isabel
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAFSgB-BSB4HJSwgku_6Hco%3DdpzVxgpz77SVWerrEitSQdvY6SQ%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd2Y%2B78p6cMcyPGK8OTnJ_U4K7HKW4KmEd5-Hq1-_yi4JQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: I'm getting exceptions while searching using cirrussearch from Mediawiki

2014-10-28 Thread Vijay K
Since dynamic scripting is disabled, I tried to create a script under 
/config/scripts instead, but I'm still getting the same exception. I 
restarted Elasticsearch as well.
As mentioned in 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_automatic_script_reloading
the scripts should be loaded automatically. But nothing seems to be working.

Any ideas why things are not working?
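For anyone hitting the same wall: with dynamic scripting disabled, file-based scripts in config/scripts are referenced by file name (without the extension), not by sending the script source inline. A sketch, with a made-up script name — the file here would be config/scripts/my_score.groovy:

GET /index/_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "script_score": {
        "script": "my_score",
        "lang": "groovy"
      }
    }
  }
}

If the client still sends the script body inline (as CirrusSearch does when built for MVEL), a script file on disk will not be picked up for that request; the scripting engine itself has to be enabled server-side.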


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7f8cc00f-7f14-4ce3-a9f1-ff5463e2dcec%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Data loss after network failure

2014-10-28 Thread Frank Evers
   

Hi all,

We noticed unwanted synchronisation behaviour using two nodes with 
elasticsearch-1.3.4. The nodes got disconnected from each other after a 
network failure and both started running their own cluster. One node got 
new data pushed to it, the other didn't. After restarting the node that got 
the most recent data, it joined up with the other one. This resulted in the 
loss of all the new data from the master node since the network failure.

Is this expected behaviour? If so, how can we prevent this scenario from 
occurring again?
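The usual safeguard for this split-brain scenario is to require a quorum of master-eligible nodes before a cluster can form, so the two halves cannot elect separate masters. A sketch for elasticsearch.yml on both nodes:

# quorum = (number of master-eligible nodes / 2) + 1, so 2 for a 2-node cluster
discovery.zen.minimum_master_nodes: 2

With only two nodes this means neither node can operate alone (the minority side refuses writes), which is why an odd number of master-eligible nodes, typically 3, is generally recommended.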

Thanks,

Frank
  

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3bd69bfd-001c-4c62-bfde-616c1f1695c8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES 1.3.4: sometimes plugins return empty page

2014-10-28 Thread John Cooper
I've just upgraded to 1.3.4 with no problems with plugins. I upgraded 
bigdesk to 2.5.0 and cloud-aws to 2.3.0, as needed for ES 1.3.4. Have you 
checked that you are running the latest version of head?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/47595127-623e-43be-8f1e-b1e4b0bb42bc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Huge response time for simple queries in an uber environment

2014-10-28 Thread Cosmin-Radu Vasii
Marvel tells me that JVM heap usage is around 75% on all the data 
nodes (about 12 GB) and the CPU is constantly between 90-95%. Are you saying 
that because my queries are all pretty different I am not really using the 
cache, so ES will try to cache the results but get rid of them really 
quickly? Something like entries being stored in the cache for nothing?

On Tuesday, 28 October 2014 12:04:59 UTC+2, Jörg Prante wrote:
>
> Did you monitor the memory usage?
>
> If memory is going low, try to avoid caching (unless you always query for 
> "416" and "29") and check if you can avoid "query_string"  in preference to 
> "term", and boolean clauses with single term, since they can be simplified.
>
> Also note you only need one "constant_score" per query because there is no 
> such thing like multiple scores in a single query.
>
> A simplification would look like
>
> GET index/_search
> {
>   "query": {
>     "constant_score": {
>       "filter": {
>         "bool": {
>           "must": [
>             { "term": { "objfield4" : 416 } },
>             { "term": { "objfield5" : 29 } }
>           ]
>         }
>       }
>     }
>   }
> }
>
> Jörg
>
> On Tue, Oct 28, 2014 at 9:44 AM, Cosmin-Radu Vasii  > wrote:
>
>> Hi,
>>
>> I have the following environment:
>> 10 ES data nodes, each with 8 cores, 30 Gb of RAM and really good 
>> hardrive, -Xms18000m -Xmx18000m, default thread pools(in this case 24 
>> threads for search operations)
>> 2 ES dedicated master nodes: 8 cores, 30 Gb of RAM and really good 
>> hardrive(hardrive not relevant for this nodes though),  -Xms18000m 
>> -Xmx18000m, default thread pools(in this case 24 threads for search 
>> operations)
>> 4 Tomcat 7 instances, with a webapp which has a node client which 
>> connects to the ES cluster for sending queries: 14 Gb of RAM, 4 cores, 250 
>> threads for Tomcat, -Xms7000m -Xmx7000m
>> 1 Haproxy which acts as a balancer in front of the 4 Tomcat instances.
>>
>> There have indexed* ~1 billion documents*, distributed in 10 shards and 
>> 0 replicas at an insane rate, from* 7 to 10 docs/s*. I increased 
>> the *replicas to 2* afterwards(in 1h I had 1 replica added). The 
>> documents are quite small:
>>
>> {
>>"field1": "13446", //5 digits
>>"date1": "24/10/2013 03:22 AM", //date
>>"field2": "3502", //4 digits
>>"field3": "5310", //4 digits
>>"date2": "02/04/2012 01:21 AM", //date
>>"field4": "4f3dce61-1d6c-418f-877b-5419a043bd42", //UUID
>>"field5": "2890",//4 digits
>>"obj": {
>>   "objfield1": "761532940881576", //15 digits
>>   "objfield2": "231806579463504",//15 digits
>>   "objfield3": "879",//3 digits
>>   "objfield4": "416",//3 digits
>>   "objfield5": "14"//2 digits
>>}
>> }
>>
>> All the fields are dates(2 of them) and string in the mapping, even 
>> though they are numbers in real life.
>>
>> I ran queries using 800 different threads from 4 different jmeter 
>> machines, each machine with 200 threads(this machines are also really 
>> powerful).
>>
>> The queries built by the webapps using the JAVA API look like this(I use 
>> filters and try to take advantage of the cache). The queries are different 
>> combinations between maximum 3 of the fields and range for the 2 dates.
>>
>> GET index/_search
>> {
>>   "query": {
>> "constant_score": {
>>   "filter": {
>> "fquery": {
>>   "query": {
>> "bool": {
>>   "must": [
>> {
>>   "constant_score": {
>> "filter": {
>>   "fquery": {
>> "query": {
>>   "query_string": {
>> "query": "obj.objfield4:416"
>>   }
>> },
>> "_cache": true
>>   }
>> }
>>   }
>> },
>> 
>> {
>>   "constant_score": {
>> "filter": {
>>   "fquery": {
>> "query": {
>>   "query_string": {
>> "query": "obj.objfield5:29"
>>   }
>> },
>> "_cache": true
>>   }
>> }
>>   }
>> }
>>   ]
>> }
>>   },
>>   "_cache": true
>> }
>>   }
>> }
>>   }
>> }
>>
>> The results are outrageous, between *20 seconds and even 100 seconds*, 
>> and I have 30 shards even distributed between the nodes.
>>
>> What am I doing wrong here, because I would expect results below 3 
>> seconds.
>>
>> Should I have

Re: Huge response time for simple queries in an uber environment

2014-10-28 Thread joergpra...@gmail.com
Did you monitor the memory usage?

If memory is running low, try to avoid caching (unless you always query for
"416" and "29"), check whether you can replace "query_string" with "term",
and simplify boolean clauses that contain only a single term.

Also note that you only need one "constant_score" per query, because there is
no such thing as multiple scores in a single query.

A simplification would look like

GET index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "objfield4" : 416 } },
            { "term": { "objfield5" : 29 } }
          ]
        }
      }
    }
  }
}

Jörg

On Tue, Oct 28, 2014 at 9:44 AM, Cosmin-Radu Vasii <
cosminradu.va...@gmail.com> wrote:

> Hi,
>
> I have the following environment:
> 10 ES data nodes, each with 8 cores, 30 Gb of RAM and really good
> hardrive, -Xms18000m -Xmx18000m, default thread pools(in this case 24
> threads for search operations)
> 2 ES dedicated master nodes: 8 cores, 30 Gb of RAM and really good
> hardrive(hardrive not relevant for this nodes though),  -Xms18000m
> -Xmx18000m, default thread pools(in this case 24 threads for search
> operations)
> 4 Tomcat 7 instances, with a webapp which has a node client which connects
> to the ES cluster for sending queries: 14 Gb of RAM, 4 cores, 250 threads
> for Tomcat, -Xms7000m -Xmx7000m
> 1 Haproxy which acts as a balancer in front of the 4 Tomcat instances.
>
> There have indexed* ~1 billion documents*, distributed in 10 shards and 0
> replicas at an insane rate, from* 7 to 10 docs/s*. I increased
> the *replicas to 2* afterwards(in 1h I had 1 replica added). The
> documents are quite small:
>
> {
>"field1": "13446", //5 digits
>"date1": "24/10/2013 03:22 AM", //date
>"field2": "3502", //4 digits
>"field3": "5310", //4 digits
>"date2": "02/04/2012 01:21 AM", //date
>"field4": "4f3dce61-1d6c-418f-877b-5419a043bd42", //UUID
>"field5": "2890",//4 digits
>"obj": {
>   "objfield1": "761532940881576", //15 digits
>   "objfield2": "231806579463504",//15 digits
>   "objfield3": "879",//3 digits
>   "objfield4": "416",//3 digits
>   "objfield5": "14"//2 digits
>}
> }
>
> All the fields are dates(2 of them) and string in the mapping, even though
> they are numbers in real life.
>
> I ran queries using 800 different threads from 4 different jmeter
> machines, each machine with 200 threads(this machines are also really
> powerful).
>
> The queries built by the webapps using the JAVA API look like this(I use
> filters and try to take advantage of the cache). The queries are different
> combinations between maximum 3 of the fields and range for the 2 dates.
>
> GET index/_search
> {
>   "query": {
> "constant_score": {
>   "filter": {
> "fquery": {
>   "query": {
> "bool": {
>   "must": [
> {
>   "constant_score": {
> "filter": {
>   "fquery": {
> "query": {
>   "query_string": {
> "query": "obj.objfield4:416"
>   }
> },
> "_cache": true
>   }
> }
>   }
> },
>
> {
>   "constant_score": {
> "filter": {
>   "fquery": {
> "query": {
>   "query_string": {
> "query": "obj.objfield5:29"
>   }
> },
> "_cache": true
>   }
> }
>   }
> }
>   ]
> }
>   },
>   "_cache": true
> }
>   }
> }
>   }
> }
>
> The results are outrageous, between *20 seconds and even 100 seconds*,
> and I have 30 shards even distributed between the nodes.
>
> What am I doing wrong here, because I would expect results below 3 seconds.
>
> Should I have the fields as numbers and not as strings? Should I remove
> the query string and use a term there?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/7a333c7d-0f51-4943-89a5-6328ff0ba41f%40googlegroups.com
> 

Re: High cpu load but low memory usage.

2014-10-28 Thread Anh Huy Do
Hi Jorg,

This is the hot threads info when the CPU gets high:

::: [Search-195][W2LL0dnBSGu_5k7fAHt0uA][inet[/195:9300]]{master=true}

  

  28.1% (140.4ms out of 500ms) cpu usage by thread 
'elasticsearch[Search-195][search][T#22]'

10/10 snapshots sharing following 10 elements

  sun.misc.Unsafe.park(Native Method)

  java.util.concurrent.locks.LockSupport.park(Unknown Source)

  
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)

  
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)

  
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)

  
org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)

  java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)

  java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

  java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

  java.lang.Thread.run(Unknown Source)

  

  24.7% (123.4ms out of 500ms) cpu usage by thread 
'elasticsearch[Search-195][search][T#9]'

10/10 snapshots sharing following 10 elements

  sun.misc.Unsafe.park(Native Method)

  java.util.concurrent.locks.LockSupport.park(Unknown Source)

  
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)

  
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)

  
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)

  
org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)

  java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)

  java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

  java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

  java.lang.Thread.run(Unknown Source)

  

  24.0% (119.8ms out of 500ms) cpu usage by thread 
'elasticsearch[Search-195][search][T#19]'

7/10 snapshots sharing following 10 elements

  sun.misc.Unsafe.park(Native Method)

  java.util.concurrent.locks.LockSupport.park(Unknown Source)

  
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)

  
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)

  
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)

  
org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)

  java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)

  java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

  java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

  java.lang.Thread.run(Unknown Source)

3/10 snapshots sharing following 23 elements

  
org.apache.lucene.search.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:60)

  
org.elasticsearch.index.search.child.ConstantScorer.nextDoc(ConstantScorer.java:48)

  
org.elasticsearch.common.lucene.docset.DocIdSets.toCacheable(DocIdSets.java:94)

  
org.elasticsearch.index.search.child.CustomQueryWrappingFilter.getDocIdSet(CustomQueryWrappingFilter.java:73)

  
org.elasticsearch.common.lucene.search.AndFilter.getDocIdSet(AndFilter.java:54)

  
org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(ApplyAcceptedDocsFilter.java:45)

  org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:128)

  
org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:533)

  org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:133)

  org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)

  
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:173)

  org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:581)

  org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:533)

  org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:510)

  org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:345)

  org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:115)

  
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:249)

  
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:623)

  
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:612)

  
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHan

Re: elasticsearch phrase term frequency .tf() containing multiple words

2014-10-28 Thread vineeth mohan
Hello Valergi,

This won't work normally, because the string would be tokenized into "green"
and "energy".
If you use a shingle token filter and set the shingle size to 2, it might work.
Alternatively, you can read the position values of both tokens from the
script, and if they are next to each other, count it as an occurrence.
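A sketch of what that shingle setup could look like at index-creation time — the index and type names here are invented; only the "content" field name comes from the example:

POST /myindex
{
  "settings": {
    "analysis": {
      "filter": {
        "my_shingles": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": true
        }
      },
      "analyzer": {
        "shingled": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_shingles"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "content": { "type": "string", "analyzer": "shingled" }
      }
    }
  }
}

With output_unigrams enabled, the index then contains "green", "energy" and "green energy" as terms, so _index['content']['green energy'].tf() has something to look up.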

Thanks
  Vineeth

On Tue, Oct 28, 2014 at 3:06 PM,  wrote:

> I want to access frequency of a phraze combined from multiple words e.g.
> "green energy"
>
> I can access tf of "green" and "energy", example:
>
> "function_score":
> {
> "filter" : {
> "terms" : { "content" : ["energy","green"]}
> },
> "script_score": {
> "script": "_index['content']['energy'].tf() +
> _index['content']['green'].tf()",
> "lang":"groovy"
> }
> }
>
> This works fine. However, how can I find the frequency of a term "green
> energy" as
>
> _index['content']['green energy'].tf() does not work
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/2e4388a4-72d6-4933-9686-304dea0727f1%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5mjK%3DbgdSEZvrsfz5d_HnN8BTrJ5d9O4yAHQuOODE4YWQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


elasticsearch phrase term frequency .tf() containing multiple words

2014-10-28 Thread valerij . vasilcenko
I want to access the frequency of a phrase made up of multiple words, e.g. 
"green energy"

I can access tf of "green" and "energy", example:

"function_score":
{
"filter" : {
"terms" : { "content" : ["energy","green"]}
},
"script_score": {
"script": "_index['content']['energy'].tf() + 
_index['content']['green'].tf()",
"lang":"groovy"
}
}

This works fine. However, how can I find the frequency of a term "green 
energy" as

_index['content']['green energy'].tf() does not work

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2e4388a4-72d6-4933-9686-304dea0727f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Connecting to ES via a http proxy in perl client

2014-10-28 Thread Clinton Gormley
Hi Kevin

On Friday, 24 October 2014 18:24:00 UTC+2, Kevin Van Workum wrote:
>
> I'm trying to connect to my ES via a proxy using a client written in perl. 
> What's the best way to do this?
>
> Here's what I have, and it works, but I suspect there's a more straight 
> forward approach:
>
> $e = Search::Elasticsearch->new(
>cxn => 'LWP',
>nodes => 'node1:9200' );
>
> $ENV{HTTP_proxy} = "http://proxy:3128";
> $e->transport->cxn_pool->next_cxn->handle->env_proxy;
>
>
You should be able to do this using the default Cxn backend (HTTP::Tiny). 
 I haven't tried proxies but, according to the HTTP::Tiny docs, proxies are 
supported: https://metacpan.org/pod/HTTP::Tiny#PROXY-SUPPORT

This should work:

 

> $ENV{http_proxy} = "http://proxy:3128";
> $e = Search::Elasticsearch->new( nodes => 'node1:9200' );
>  
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/453ec592-e3c9-4ff4-8f5a-8e71bab901d5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Nested Documents - how to search for nested elements, not whole documents

2014-10-28 Thread henrik . behrens
Hi,

I have hierarchical data: orders consist of several order items.

How can I search for a certain order item without the other items of the same 
order being returned as well?
I tried using nested documents, but I always receive the whole order and 
not just the requested order item.

Example:

POST /salesorder7
{
  "mappings": {
    "complete": {
      "properties": {
        "orderDetails": {
          "type": "nested",
          "properties": {
            "lineItemNumber": {
              "type": "string"
            },
            "productInformation": {
              "properties": {
                "id": { "type": "string" },
                "name": { "type": "string" }
              }
            },
            "totalLineItemPrice": {
              "properties": {
                "amount": { "type": "float" },
                "currency": { "type": "string" }
              }
            },
            "orderHeader": {
              "properties": {
                "orderNumber": { "type": "string" }
              }
            }
          }
        }
      }
    }
  }
}

PUT /salesorder7/complete/1
{
  "orderHeader": {
    "orderNumber": "1"
  },
  "orderDetails": [
    {
      "lineItemNumber": "11",
      "productInformation": {
        "name": "product1",
        "id": "p1"
      },
      "totalLineItemPrice": {
        "amount": "105.04",
        "currency": "EUR"
      }
    },
    {
      "totalLineItemPrice": {
        "amount": "9.99",
        "currency": "EUR"
      },
      "lineItemNumber": "12",
      "productInformation": {
        "name": "product2",
        "id": "p2"
      }
    }
  ]
}

POST /salesorder7/complete/_search
{
  "fields": [
    "orderHeader.orderNumber",
    "orderDetails.unitPrice.amount",
    "orderDetails.productInformation.name"
  ],
  "query": {
    "nested": {
      "path": "orderDetails",
      "score_mode": "avg",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "orderDetails.productInformation.name": "product1"
              }
            }
          ]
        }
      }
    }
  }
}

This returns 

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.4054651,
    "hits": [
      {
        "_index": "salesorder7",
        "_type": "complete",
        "_id": "1",
        "_score": 1.4054651,
        "fields": {
          "orderDetails.productInformation.name": [
            "product1",
            "product2"
          ],
          "orderHeader.orderNumber": [
            "1"
          ]
        }
      }
    ]
  }
}

I want it to return product1 only, not both product1 and product2.

How can I achieve this?

Best regards
Henrik

P.S. Sorry for not using curl; under Windows it seems to accept only 
single-line, unspaced documents or documents in files.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4eaf18d4-2db3-4264-8dfc-767ca34d5f28%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Find the 100 closest neighbors to a point (lng, lat)

2014-10-28 Thread Michael Lumbroso
Hi Adrien,

thanks for your answer, but actually, I need something really optimized, so
I guess ES is not the way to go.

Can you think of better ways to actually do that?

Thanks
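For readers of this thread, the geo-distance sorting Adrien suggests in the quoted reply below would look roughly like this for the case described; the index name, field name and coordinates here are invented:

GET /places/_search
{
  "size": 100,
  "query": { "match_all": {} },
  "sort": [
    {
      "_geo_distance": {
        "location": { "lat": 48.8566, "lon": 2.3522 },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}

Putting a geo_distance filter with a generous radius in front of the sort can cut down how many of the million documents need a distance computed, at the risk of missing neighbors if the radius is chosen too small.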

2014-10-27 18:27 GMT+01:00 Adrien Grand :

> Hi Michael,
>
> You can do that using geo-distance sorting:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html#_geo_distance_sorting
>
> It would probably not be optimal in the sense that elasticsearch will need
> to compute the distance for every matching document, but maybe it would
> still be fast enough?
>
>
> On Fri, Oct 24, 2014 at 11:11 AM, Michael Lumbroso <
> mich...@sportintown.com> wrote:
>
>> Hello,
>>
>> sorry if this question has already been asked, but I didn't find much
>> material during my search.
>> Basically, what I need to do, is find the exact 100 closest objects
>> around a spatial point (longitude, latitude), among a 1 million
>> geolocalized object all around the world.
>>
>> Is there an efficient way to do that? (performance is the most important
>> parameter here)
>> Are there plugins/libraries to help me do so?
>> Are there better options than Elasticsearch for this very problem?
>>
>> Thanks for your help, and keep up the good work on this wonderful tool
>>
>> Michael
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/04ce1fbd-c88f-4517-9d56-044bb235c29c%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Adrien Grand
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/aWZBVkZiSY4/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j45xYH%2BGmF%2BQ3t5m1OYLKZD7Vp6p0HxpmkD7-Q%2B7Zu1hQ%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAH-zEnNoh%2BTysKSiX7y67M6HyFE60cEnYhf6DWSQoCu_5jtRdQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Huge response time for simple queries in an uber environment

2014-10-28 Thread Cosmin-Radu Vasii
Hi,

I have the following environment:
10 ES data nodes, each with 8 cores, 30 GB of RAM and a really good hard drive, 
-Xms18000m -Xmx18000m, default thread pools (in this case 24 threads for 
search operations)
2 ES dedicated master nodes: 8 cores, 30 GB of RAM and a really good 
hard drive (though the hard drive is not relevant for these nodes), -Xms18000m 
-Xmx18000m, default thread pools (in this case 24 threads for search 
operations)
4 Tomcat 7 instances, with a webapp that has a node client connecting 
to the ES cluster for sending queries: 14 GB of RAM, 4 cores, 250 threads 
for Tomcat, -Xms7000m -Xmx7000m
1 HAProxy which acts as a balancer in front of the 4 Tomcat instances.

We have indexed *~1 billion documents*, distributed in 10 shards and 0 
replicas, at an insane rate, from *7 to 10 docs/s*. I increased the *replicas 
to 2* afterwards (within 1h one replica had been added). The documents are quite 
small:

{
   "field1": "13446", //5 digits
   "date1": "24/10/2013 03:22 AM", //date
   "field2": "3502", //4 digits
   "field3": "5310", //4 digits
   "date2": "02/04/2012 01:21 AM", //date
   "field4": "4f3dce61-1d6c-418f-877b-5419a043bd42", //UUID
   "field5": "2890",//4 digits
   "obj": {
  "objfield1": "761532940881576", //15 digits
  "objfield2": "231806579463504",//15 digits
  "objfield3": "879",//3 digits
  "objfield4": "416",//3 digits
  "objfield5": "14"//2 digits
   }
}

All the fields are strings in the mapping, apart from the 2 dates, even though 
they are numbers in real life.

I ran queries using 800 different threads from 4 different JMeter machines, 
200 threads per machine (these machines are also really powerful).

The queries, built by the webapps using the Java API, look like this (I use 
filters and try to take advantage of the cache). The queries are different 
combinations of at most 3 of the fields plus ranges over the 2 dates.

GET index/_search
{
  "query": {
"constant_score": {
  "filter": {
"fquery": {
  "query": {
"bool": {
  "must": [
{
  "constant_score": {
"filter": {
  "fquery": {
"query": {
  "query_string": {
"query": "obj.objfield4:416"
  }
},
"_cache": true
  }
}
  }
},

{
  "constant_score": {
"filter": {
  "fquery": {
"query": {
  "query_string": {
"query": "obj.objfield5:29"
  }
},
"_cache": true
  }
}
  }
}
  ]
}
  },
  "_cache": true
}
  }
}
  }
}

The results are outrageous, between *20 seconds and even 100 seconds*, and 
I have 30 shards evenly distributed across the nodes.

What am I doing wrong here? I would expect results below 3 seconds.

Should I have the fields as numbers and not as strings? Should I remove the 
query_string and use a term there?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7a333c7d-0f51-4943-89a5-6328ff0ba41f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: I'm getting exceptions while searching using cirrussearch from Mediawiki

2014-10-28 Thread Vijay K
Hi,

I even tried adding script.disable_dynamic: false to 
config/elasticsearch.yml, and also tried removing that line again.

There is no change in the exception in either case. FYI, I restarted 
Elasticsearch after every change.

How can I confirm that Elasticsearch is actually running with the settings 
from elasticsearch.yml? Or could there be some other issue?
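One way to verify which settings a running node actually loaded is the nodes info API (host and port here assume the defaults):

curl 'http://localhost:9200/_nodes/settings?pretty'

If your script.disable_dynamic value does not appear in that output, the node is not reading the elasticsearch.yml you edited.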

Thanks


On Tuesday, October 28, 2014 1:42:15 PM UTC+5:30, Isabel Drost-Fromm wrote:
>
> This looks like a configuration issue. The clue is in the following line:
>
> "QueryParsingException[[wikidb_content_first] script_score the script 
> could not be loaded]; nested: ScriptException[dynamic scripting for[mvel]
>  disabled];"
>
> According to 
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html
>
> dynamic scripting was disabled by default starting with version 1.2.0.
> Isabel
>



Re: Customizing Directory and IndexWriter behavior via custom ES plug-in

2014-10-28 Thread Ákos Kitta
Awesome. Thanks a lot for the help. I'll give it a try.

On Monday, October 27, 2014 2:30:51 PM UTC+1, Jörg Prante wrote:
>
> I stand corrected - there is also the setting "index.store.type": by 
> setting it to a Java class name, you can supply your own index store 
> implementation from a plugin.
>
> So, no patching/forking required.
>
> Jörg
>
> On Mon, Oct 27, 2014 at 2:26 PM, joerg...@gmail.com wrote:
>
>> Regarding the deletion policy, you can set the class name of your 
>> deletion policy implementation in the setting "index.deletionpolicy.type".
>>
>> For a custom Directory, you have to 
>> patch org.elasticsearch.index.store.IndexStoreModule with your custom index 
>> store. The index store is something like an IndexWriter / Lucene Directory 
>> on steroids. At the moment, it is not possible to add custom index stores 
>> from a plugin (see the fixed enumeration of implementations 
>> in IndexStoreModule).
>>
>> Jörg
>>
>> On Mon, Oct 27, 2014 at 1:22 PM, Ákos Kitta wrote:
>>
>>> Hi there,
>>>
>>> Over the last couple of years we managed to customize Apache Lucene 
>>> (through its public API) to support branching, tagging, and comparison in a 
>>> concurrent fashion for our server application. We achieved this 
>>> by using a couple of custom Directory implementations, exactly one 
>>> IndexDeletionPolicy, and one MergePolicy. We are currently considering 
>>> replacing Lucene with Elasticsearch on the server side. Before we jump into 
>>> the details of collecting the differences between the two technologies with 
>>> respect to search and indexing functionality (for instance, how to 
>>> port our custom collectors and how to replace NDVs), we would like to make 
>>> sure it is possible at all.
>>>
>>> I've just checked out the source and realized that the registration of 
>>> the services is done via various module implementations, and the actual 
>>> configured service implementations are injected into the constructors. For 
>>> the sake of simplicity: is there a way, for example, to create an 
>>> Elasticsearch module which forces the underlying IndexWriter to use 
>>> FooCustomDeletionPolicy 
>>> instead of the default KeepOnlyLastDeletionPolicy? I assume that if this is 
>>> straightforward, we could use our custom implementations for the directory 
>>> and the IndexWriter that we are currently using with pure Lucene. After 
>>> doing some research I found this [1] thread. Am I close to the 
>>> answer/solution?
>>>
>>> I should note that we would like to achieve this without forking the 
>>> public repository.
>>>
>>> Thanks in advance for any feedback.
>>>
>>> Cheers,
>>> Akos
>>>
>>> [1]: https://groups.google.com/forum/#!topic/elasticsearch/rFaLnI5FRf4
>>>
>>>
>>
>>
>
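
Pulling together the two settings mentioned in this thread, a configuration sketch might look like the following. The class names are hypothetical placeholders, and the classes would have to be available on the classpath, e.g. shipped in a plugin:

```yaml
# elasticsearch.yml -- sketch only, based on the setting keys named above
index.store.type: com.example.store.FooIndexStore
index.deletionpolicy.type: com.example.FooCustomDeletionPolicy
```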



Re: I'm getting exceptions while searching using cirrussearch from Mediawiki

2014-10-28 Thread Isabel Drost-Fromm
This looks like a configuration issue. The clue is in the following line:

"QueryParsingException[[wikidb_content_first] script_score the script could
not be loaded]; nested: ScriptException[dynamic scripting for[mvel] disabled
];"

According to

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html

dynamic scripting was disabled by default starting with version 1.2.0.
Isabel
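
For anyone hitting this, the change Isabel points at is a one-line edit to config/elasticsearch.yml, followed by a node restart. Note that re-enabling dynamic scripting has security implications, so this is a sketch of the pre-1.2 behavior rather than a recommendation:

```yaml
# config/elasticsearch.yml -- restart the node after changing this
script.disable_dynamic: false
```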



Re: FacetPhaseExecutionException with new Marvel installation

2014-10-28 Thread Boaz Leskes
Good, it works now.

> it seems that some state regarding Marvel is kept in the production
> cluster, and whatever it was got cleared when I reinstalled the plugin
> there.

There is no state in the production cluster, just an in-memory boolean of
whether the local agent has checked for the template or not. Every time the
agent wakes up, it checks.

Is there anything interesting in the logs, perhaps?
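
A quick way to check whether the template actually made it to the monitoring cluster is the index template API (the exact template name Marvel registers may vary by version, so listing all templates is the safer check):

```
curl -XGET 'http://localhost:9200/_template/?pretty'
```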

On Tue, Oct 28, 2014 at 12:45 AM, Ross Simpson  wrote:

> Hi again,
>
> Yep, I had added the required settings to the yaml files first.
>
> I tried the steps you described above, and it did not have any effect --
> still no template present, and still getting the error.  Since it wasn't
> too much trouble, I started over from scratch -- rebuilt the monitoring
> cluster, but also uninstalled then reinstalled the plugin in the production
> cluster, and restarted.  After this, I saw a bunch of update_mapping
> calls.  The template was present, and the errors went away.  It seems that
> some state regarding Marvel is kept in the production cluster, and whatever
> it was got cleared when I reinstalled the plugin there.  That may be
> something worth mentioning in the installation docs.
>
> In any case, thanks for your help -- it's all working now!
>
> Cheers,
> Ross
>
>
> On Tuesday, 28 October 2014 09:48:12 UTC+11, Boaz Leskes wrote:
>>
>> Hey,
>>
>> You probably did but just double checking- did you change the settings in
>> the yaml files before restarting the nodes?
>>
>> There is an easier way to fix this than a full restart: first restart a
>> single node on production. That will cause the agent to check again for the
>> template. Verify that the template was added. Then delete all .marvel-2014*
>> indices on the monitoring cluster and let them be recreated based on the
>> template.
>>
>> Boaz
>>
>>
>>
>>
>> On Mon, Oct 27, 2014 at 11:25 PM, Ross Simpson  wrote:
>>
>>> Hi Boaz,
>>>
>>> To install, I ran
>>>
>>>  bin/plugin --install elasticsearch/marvel/latest
>>>
>>>
>>> on each node in both clusters, then restarted both clusters.
>>>
>>> Since then, I have tried several things, including deleting the indexes
>>> from the monitoring cluster and reinstalling the plugin on the monitoring
>>> cluster.  I'll try now to delete all the marvel indexes, uninstall, then
>>> reinstall marvel into both clusters.
>>>
>>> I'm a bit stumped otherwise, so I'm all ears for any other suggestions.
>>>
>>> Cheers,
>>> Ross
>>>
>>>
>>>
>>> On Tuesday, 28 October 2014 08:30:54 UTC+11, Boaz Leskes wrote:

 It looks like something is indeed wrong with your Marvel index
 template, which should be there before data is indexed. How did you install
 marvel? Did you perhaps delete the data folder of the monitoring cluster
 after production was already shipping data?

 Cheers,
 Boaz

 On Monday, October 27, 2014 7:45:34 AM UTC+1, Ross Simpson wrote:
>
> To troubleshoot a little more, I rebuilt the monitoring cluster to use
> ElasticSearch 1.1.1, which matches the ES version used in the production
> cluster.  No luck.
>
> On the Overview dashboard, I can see some data (summary, doc count,
> search and indexing rates are all populated [screenshot attached]), but
> both the nodes and indices sections are empty other than the errors
> mentioned in the previous post.  Cluster pulse doesn't show any events at
> all; node stats and index stats do both show data.
>
> Any further suggestions would be greatly appreciated :)
>
> Cheers,
> Ross
>
>
>
> On Monday, 27 October 2014 11:15:42 UTC+11, Ross Simpson wrote:
>>
>> I've got a brand-new Marvel installation, and am having some
>> frustrating issues with it: on the overview screen, I am constantly 
>> getting
>> errors like:
>> *Oops!* FacetPhaseExecutionException[Facet [timestamp]: failed to
>> find mapping for node.ip_port.raw]
>>
>> *Production cluster:*
>> * ElasticSearch 1.1.1
>> * Marvel 1.2.1
>> * Running in vSphere
>>
>> *Monitoring cluster:*
>> * ElasticSearch 1.3.4
>> * Marvel 1.2.1
>> * Running in AWS
>>
>> After installing the plugin and bouncing all nodes in both clusters,
>> Marvel seems to be working -- an index has been created in the monitoring
>> cluster (.marvel-2014.10.26), and I see thousands of documents in
>> there.  There are documents with the following types: cluster_state,
>> cluster_stats, index_stats, indices_stats, node_stats.  So, it does
>> seem that data is being shipped from the prod cluster to the monitoring
>> cluster.
>>
>> I've seen in the user group that other people have had similar
>> issues.  Some of those mention problems with the marvel index template.  
>> I don't seem to have any templates at all in my monitoring cluster:
>>
>>  $ curl -XGET localhost:9200

Re: What happens to data in an existing type if we update the mapping to specify 'path's for _id and _routing

2014-10-28 Thread David Pilato
You can update a mapping (adding new fields or new sub-fields).
Old documents won't be updated.

A common best practice is to reindex your data.

As for special fields like _routing and _id, I think that updating the mapping 
for them won't work, so it will require a reindex.
I have never tested it, though.
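
For context, the mapping being discussed would look roughly like this in ES 1.x (the field names are hypothetical; per the above, the safer route is to create a fresh index with this mapping and reindex into it, rather than update the mapping in place):

```json
PUT /new_index
{
  "mappings": {
    "my_type": {
      "_id":      { "path": "my_id_field" },
      "_routing": { "path": "my_routing_field", "required": true }
    }
  }
}
```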

-- 
David Pilato | Technical Advocate | elasticsearch.com
david.pil...@elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs



On 28 October 2014 at 08:13:04, Preeti Raj - Buchhada (pbuchh...@gmail.com) 
wrote:

Any ideas on this?


On Monday, October 27, 2014 3:03:16 PM UTC+5:30, Preeti Raj - Buchhada wrote:
We are using ES 1.3.2.
We have a need to specify custom id and routing values when indexing.
We've been doing this using the Java APIs; however, we would now like to update 
the mapping to specify 'path' values for _id and _routing.

The question we have is:
1) Since this type already has a huge number of documents, can we change the 
mapping? When we tried it, we got an '"acknowledged": true' response, but it 
doesn't seem to take effect when we index.
2) In case there is a way to achieve this, will it affect only the new 
documents being indexed?




Re: What happens to data in an existing type if we update the mapping to specify 'path's for _id and _routing

2014-10-28 Thread Preeti Raj - Buchhada
Any ideas on this?


On Monday, October 27, 2014 3:03:16 PM UTC+5:30, Preeti Raj - Buchhada 
wrote:
>
> We are using ES 1.3.2.
> We have a need to specify custom id and routing values when indexing.
> We've been doing this using the Java APIs; however, we would now like to 
> update the mapping to specify 'path' values for _id and _routing.
>
> The question we have is:
> 1) Since this type already has a huge number of documents, can we change 
> the mapping? When we tried it, we got an '"acknowledged": true' response, 
> but it doesn't seem to take effect when we index.
> 2) In case there is a way to achieve this, will it affect only the new 
> documents being indexed?
>
>
