Re: Massive perf difference with filter versus filtered query

2015-01-27 Thread David Pilato
Because the first one is a post_filter (BTW we renamed it), so it is applied
after the search, on the result set.
The second applies the filter first, and then the query is run on the filtered set.

I guess this is the difference here.

I would use the second one every time, unless you need to compute aggregations on
the full dataset instead of on the filtered result set.
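
For example, here is a rough sketch of the case where post_filter is still useful, with aggregations computed on the full dataset while the hits themselves are filtered (field name borrowed from your query below):

{
  "query": { "match_all": {} },
  "aggs": {
    "by_project": { "terms": { "field": "ProjectId" } }
  },
  "post_filter": { "term": { "ProjectId": 4191152 } },
  "from": 0, "size": 50
}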

My 2 cents

David

> On 28 Jan 2015, at 05:44, Michael Giagnocavo wrote:
> 
> I'm seeing some major performance difference depending on if I wrap my filter 
> in a query. I don't understand, because the docs say to use filters for exact 
> matching.
> 
> This query takes about 800ms, even after repeated executions (so caches are 
> hot):
> {  "filter": {  "term": {  "ProjectId": 4191152 }  },
>  "from": 0,  "size": 50,
>  "sort": [],  "facets": {}
> }
> 
> But slapping query filtered around it makes it take 5ms on repeated 
> executions:
> 
> {  "query": { "filtered": {
>  "filter": { "term": { "ProjectId": 4191152  } } }  },
>  "from": 0,  "size": 50,
>  "sort": [],  "facets": {}
> }
> 
> What am I misunderstanding? I've got 80M documents, 30 of which match this 
> query, so the only thing I can guess is that somehow when I don't use a 
> "query" element at the root, Elasticsearch retrieves every document and 
> applies my filter, versus using some indexed approach when using query.
> 
> -Michael
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/BLUPR07MB674B6F4B405F739E034FB1AD4330%40BLUPR07MB674.namprd07.prod.outlook.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2D815FC1-2E69-4156-B3DE-1D2C15F2DB2C%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: How to perform search on the qualified docs resulted from a query

2015-01-27 Thread David Pilato
In 1.5, a new inner hits feature is coming.
https://github.com/elasticsearch/elasticsearch/pull/8153
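
A rough sketch of what that should look like, assuming a parent/child setup (has_child plus inner_hits; the type and field names are taken from the example quoted below, and the exact syntax is in the pull request):

{
  "query": {
    "has_child": {
      "type": "Document",
      "query": { "match": { "Name": "One" } },
      "inner_hits": {}
    }
  }
}

Each parent hit then carries the matching children in its inner_hits section.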


David

> On 28 Jan 2015, at 04:29, bvnrwork wrote:
> 
> Okay Thank you ,does nested objects help here .
> 
> Is it possible to get only inner objects (from nested objects ) ?
> 
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/nested-objects.html
> 
> 
> 
> 
>> On Tuesday, 27 January 2015 21:07:35 UTC-5, David Pilato wrote:
>> You need to run 2 queries in that case IMHO.
>> 
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>> 
>>> Le 28 janv. 2015 à 00:54, buddarapu nagaraju  a écrit :
>>> 
>>> Any answers for me :)?
>>> 
>>> Regards
>>> Nagaraju
>>> 908 517 6981
>>> 
 On Sun, Jan 25, 2015 at 2:14 PM, buddarapu nagaraju  
 wrote:
 I dont get it exactly so explaining doc structure and example docs .I 
 understand that HasChild will get you the parent documents and HasParent 
 will get the only parents.Please help me in understanding 
 
 have two document types :one is FakeDocument which is the fake document 
 holding the group id for all docs in a group 
 and other is Document which is the actual document
 
 Example Docs are:
 
 
 FakeDocument{
 Id:"G1"
 }
 FakeDocument{
 Id:"G2"
 }
 
 Indexed two documents under group "G1"
 
 Document1{
 Name:One
 
 }
 
 Document2{
 Name:Two
 
 }
 
 
 Indexed two documents under group "G2"
 
 Document3{
 Name:Three
 
 }
 
 Document4{
 Name:Four
 
 }
 
 
 Now my scenario is 
 
 
 querying for "Name:One" should result me document with name :"One" and 
 also all other documents that has same _parentId in one request 
 
 
 
 
 
 
 Regards
 Nagaraju
 908 517 6981
 
> On Sun, Jan 25, 2015 at 12:14 AM, David Pilato  wrote:
> If you are using Parent / Child feature, you should look at has_parent, 
> has_child filters.
> 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html#query-dsl-has-child-filter
> 
> In that case, you don't need to get back parent id yourself.
> 
> If you are not using Parent / child, I'm afraid you need to run 2 queries.
> 
> My 2 cents 
> 
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
> 
>> Le 25 janv. 2015 à 05:44, bvnrwork  a écrit :
>> 
>> Hi,
>> 
>> can some one help me on this.
>> 
>> have scenario where have Query1 which qualifies some documents and now I 
>> want to take _parent id of qualified documents and search on _parentid 
>> field to get the qualified documents and all others documents with same 
>> parent id 
>> 
>> .These two searches I want to do it in one single request , is it 
>> possible?
>> 
>> Regards,
>> Nagaraju
>> -- 
>> You received this message because you are subscribed to the Google 
>> Groups "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send 
>> an email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/a1946c82-7239-4e56-a082-ab11e66b4041%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
> 
> -- 
> You received this message because you are subscribed to a topic in the 
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit 
> https://groups.google.com/d/topic/elasticsearch/Ye7ICf_ZUkg/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to 
> elasticsearc...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/A4959B00-FB5F-4EAA-9304-5B2BCC490601%40pilato.fr.
> 
> For more options, visit https://groups.google.com/d/optout.
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/CAFtuXX%2B2wEtz7ff-xXa%2BztvC%2B%2Bg-hpKY4s5X0%2BRVAZ8U5d7Puw%40mail.gmail.com.
>>> For more options, visit https://groups.google.com/d/optout.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/3146fbf6-63e9-45bf-8593-90dcacf16184%40googlegroups.com.
>

Re: Restoring indices from snapshot to a test server

2015-01-27 Thread David Pilato
Perfect. Thanks.

David

> On 28 Jan 2015, at 03:55, Amos S wrote:
> 
> I opened an issue for AWS plugin project on github, I hope this is what you 
> were referring to. Here is the issue: 
> https://github.com/elasticsearch/elasticsearch-cloud-aws/issues/167
> About the "type missing" error - it turned out to be my mistake in trying to 
> copy the output of the "GET" verbatim to the input of the PUT, it turned out 
> that I had to peel off a couple of "{}"'s. Once I did that, the order of the 
> "type" attribute in relation to the rest didn't matter and the PUT succeeded. 
> See the update I gave in a previous message.
> I ended up making a new copy of the bucket. I'm now trying to restore it to 
> the test node.
> Thanks for your help.
> 
> --Amos
> 
>> On Wednesday, 28 January 2015 13:18:24 UTC+11, David Pilato wrote:
>> Could you open an issue in AWS plugin project (and may be in azure and gce) 
>> to support verify option as well?
>> 
>> BTW, I think we should try to support have type after settings or to clearly 
>> document it needs to be on the first line. Could you open an issue for this 
>> in elasticsearch?
>> 
>> Coming back to your issue, I'm afraid you need to wait for a next cloud 
>> plugin release or patch it yourself (and send a PR) or make your repo 
>> writable.
>> 
>> Best
>> 
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>> 
>>> Le 27 janv. 2015 à 23:30, Amos S  a écrit :
>>> 
>>> OK, following previous responses by you about the "type is missing" error, 
>>> I corrected the JSON payload I send to the PUT and got another error:
>>> 
>>> $ curl -XPUT 'http://localhost:9200/_snapshot/amos0' -d '{
>>>   "type":"s3",
>>>   "settings": {
>>> "region": "ap-southeast-1",
>>> "bucket": "prod-es-backup",
>>> "base_path": "elasticsearch/dev/snapshots0",
>>> "verify": "false"
>>>   }
>>> }'
>>> {"error":"RepositoryVerificationException[[amos0] path 
>>> [elasticsearch][dev][snapshots0] is not accessible on master node]; nested: 
>>> IOException[Unable to upload object 
>>> elasticsearch/dev/snapshots0/tests-ng9f5N6tTm6HZhrGF9s8aQ-master]; nested: 
>>> AmazonS3Exception[Access Denied (Service: Amazon S3; Status Code: 403; 
>>> Error Code: AccessDenied; Request ID: 7F3DE23BB617FFF4)]; ","status":500}
>>> 
>>> I think this confirms your suspicion that the repo creation process tries 
>>> to verify the repository by uploading a test object onto it and also that 
>>> it ignores the "verify: false" setting.
>>> 
>>> I can either allow this role to write only to this specific prefix or just 
>>> make a copy of the bucket and allow access to the copy. I'll try the later.
>>> 
>>> Thanks,
>>> 
>>> --Amos
>>> 
 On Wednesday, 28 January 2015 09:20:25 UTC+11, Amos S wrote:
 Thanks David,
 
 It seems that the "verify: false" setting is specific to the "fs" type and 
 not recognised by the "s3" type.
 I tried it anyway and got the same worrying "type is missing" error:
 
 $ curl -XPUT 'http://localhost:9200/_snapshot/amos0' -d '{
 "s3dev0": {
 "settings": {
 "base_path": "elasticsearch/dev/snapshots0",
 "bucket": "prod-es-backup",
 "region": "ap-southeast-1",
 "verify": "false"
 },
 "type": "s3"
 }
 }
 '
 {"error":"ActionRequestValidationException[Validation Failed: 1: type is 
 missing;]","status":400}
 
 I think I first have to address the "type is missing" issue. I suspect 
 ElasticSearch doesn't recognise the "s3" type.
 
> On Tuesday, 27 January 2015 17:13:53 UTC+11, David Pilato wrote:
> Could you try to set verify to false?
> 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html#_repositories
> 
> Not sure if it works but would love to know.
> 
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
> 
>> Le 27 janv. 2015 à 07:09, Amos S  a écrit :
>> 
>> Thanks David,
>> 
>> That would explain it.
>> 
>> Is there a way to skip the validation?
>> 
>>> On Tuesday, 27 January 2015 16:52:11 UTC+11, David Pilato wrote:
>>> IIRC when you create a repository we first try to validate it by 
>>> writing a sample file in it.
>>> 
>>> As you set it to read only, I guess it could be the cause.
>>> 
>>> --
>>> David ;-)
>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>> 
 Le 27 janv. 2015 à 06:20, Amos S  a écrit :
 
 Hello,
 
 For some investigation work, I'm trying to restore specific indices 
 from our production ES cluster to a single one-off node.
 
 We run a cluster of ES 1.4.2 on EC2, the data is stored locally on 
 each EC2 instance with snapshots stored on an S3 bucket.
 
 I've setup a one-off EC2 instance and 

Re: What causes high CPU load on ES-Storage Nodes?

2015-01-27 Thread horst knete
Hi,

Thanks for your response, Mark.

It looks like we are getting a second big server for our ELK stack
(unfortunately without any more storage, so I really can't create a failover
cluster yet), but I wonder what role I should give this server in our system.

Would it be good to move the whole long-term storage to this server and let
the indexing and most of the Kibana searching happen on the existing server,
or are there other good configuration setups?

I have read about quite a few people who have a similar setup to ours (with 1 or 2
big machines) and would be happy if they could share their thoughts with us!

Thanks guys.

On Tuesday, 27 January 2015 at 22:54:18 UTC+1, Mark Walkom wrote:
>
> Indexes are only refreshed when a new document is added.
> Best practice would be to use multiple machines; if you lose that one you
> lose your cluster!
>
> Without knowing more about your cluster stats, you're probably just 
> reaching the limits of things, and either need less data, more nodes or 
> more heap.
>
> On 28 January 2015 at 02:14, horst knete  > wrote:
>
>> Hey guys,
>>
>> I've been going around this "problem" for quite a while, and didn't get a clear
>> answer to it.
>>
>> Like many of you out there, we are running an ES "cluster" on a single strong
>> server, moving older indices from the fast SSDs to slow, cheap HDDs (about 25 TB
>> of data).
>>
>> To make this work we have 3 instances of ES running with their index path set
>> to the HDDs' mountpoint, and 1 single instance for the indexing/searching SSDs.
>>
>> What makes me wonder all the time is that, even though the "storage" nodes don't
>> do anything (there is no indexing happening, 95% of the time there is no searching
>> happening, they are just keeping the old indices fresh), the CPU load caused by
>> these "idle" nodes is about 50% of the whole CPU working time.
>>
>> hot_threads: https://gist.github.com/german23/662732ca4d9dbdcb406b
>>
>>
>> Is it possible that, due to the ES cluster mechanism, all the indices keep getting
>> refreshed all the time when a document is indexed or a search is executed?
>>
>> Are there any configuration options to avoid such behavior?
>>
>> Would it be better to export the old indices to a separate ES cluster and
>> configure multiple ES paths in Kibana?
>>
>> Are there any best practices to maintain such cluster?
>>
>> I would appreciate any form of feedback.
>>
>> Thanks
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/f8ce8722-027d-4656-83ce-6c19b97e5e34%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0a577e2a-7153-4f8a-a315-7f049e136315%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch is not matching "_id" between the clusters

2015-01-27 Thread Mark Walkom
They are separate nodes, but are they in the same cluster, or are they
running as their own unique clusters? i.e. do you have two clusters of one node
each, rather than one cluster with two nodes?
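
A quick way to check, as a minimal sketch (run it against each server and compare the cluster_name and number_of_nodes in the output):

curl -XGET 'http://ws001:9200/_cluster/health?pretty'
curl -XGET 'http://ws002:9200/_cluster/health?pretty'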

On 28 January 2015 at 15:42, Carlos Henrique de Oliveira <
choliveira0...@gmail.com> wrote:

> Hi Mark,
> Yes, they are two separated nodes.
>
> We are indexing the data via PHP:
>
> public function resetListingsIndex(){
> $es = new Elasticsearch\Client();
>
> // delete listings
> $deleteParams['index'] = 'listings';
> @$es->indices()->delete($deleteParams);
>
> // create listings index
> $aElasticSearchParams=array();
>
> $aElasticSearchParams = array(
> 'index' => 'listings',
> 'body' => array(
> 'settings' => array(
> 'number_of_shards' => 1,
> 'number_of_replicas' => 0,
> ),
> 'mappings' => array(
> 'listing' => array(
> 'properties' => array(
> 'reference_number' => array(
> 'type' => 'integer',
> 'analyzer' => 'keyword',
> ),
> 'headline' => array(
> "analyzer" => "standard",
> "type" => "string",
>
> ),
> 'headline' => array(
> "analyzer" => "standard",
> "type" => "string",
>
> ),
> 'type' => array(
> 'type' => 'string',
> 'analyzer' => 'standard',
> ),
> 'state' => array(
> 'type' => 'string',
> 'analyzer' => 'standard',
>
> ),
> //all other properties...
> )
> )
> )
> )
>
> );
>
> try{
> $aResult=$es->indices()->create($aElasticSearchParams);
> if($aResult['acknowledged']==true){
> return 'Successfully indexed';
> }
> }
> catch(Exception $e){
> return $e;
> }
>
> }
>
> And yes we are using the auto ID generation within ES
>
> On Wednesday, 28 January 2015 at 12:06:51 UTC+11, Mark Walkom wrote:
>>
>> So these two nodes are their own separate clusters?
>> How are you indexing data into them? Are you using the auto ID generation
>> within ES or specifying your own?
>>
>> On 28 January 2015 at 11:56, Carlos Henrique de Oliveira <
>> cholive...@gmail.com> wrote:
>>
>>> I have two Web Servers ws001 and ws002 working as load balance for
>>> Elasticseach and I am trying to catch/count the hits for a specific page
>>> which is something like this: mysite.com/listing/item-123/.
>>>
>>> Using the ES I am running curl –XGET http: //mysite.com:9200/stats/
>>> listingviews/_search?pretty Then count the hits for a specific "_id":
>>> "hits" : { "total" : 1526, "max_score" : 1.0, "hits" : [ { "_index" :
>>> "stats", "_type" : "listingviews", "_id" : "IYSs1OmqSvK6gRDQr61j3w",
>>> "_score" : 1.0, "_source":{"id":"1159","type":"listing","ua":"Mozilla/5.0
>>> (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 
>>> Firefox/34.0","ip":"203.206.165.208","time":"2014-12-23
>>> 13:37:49"} }
>>>
>>> When I test it on my dev environment (localhost) with only one node and
>>> there is no load balance applied it works perfect, I always find this "_id"
>>> and the count works pretty good, but on my production with load balance
>>> activated I cannot find the results ("_id") on both servers ws001 and ws002
>>> and it breaks my hits counting.
>>>
>>> On my load balance servers I getting completely different results, I
>>> mean ("_id"), when I run: curl –XGET http: 
>>> //webserver1:9200/stats/listingviews/_search?pretty
>>> and curl –XGET http: //webserver2:9200/stats/listingviews/_search?pretty.
>>> Also I already checked the shads for each server and they are different.
>>>
>>> At the end, the _ids found on both servers never match and definitely
>>> they are not the same ids. I’m supposing that the replica on the load
>>> balance is not working as replica and the ES is storing the data in both
>>> servers separately. Even if the _ids are not on each server, the end result
>>> should be our ability to count how many records there are (ie. where
>>> _source->id = 1159) but it seems to only get a count from ws001.
>>>
>>> Which approach should I take to solve this issue, and be able to count
>>> my hits on the production environment?
>>>
>>> Thanks,
>>>
>>> Carlos.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on th

Re: Issues when running groovy script for "function_score"?

2015-01-27 Thread vineeth mohan
Hi ,

What is terms here?
As far as I know, there is no provision to get all the terms of a field for a
document by default.
The only workaround is to use term vectors.
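
For reference, a minimal sketch of the term vectors API (index, type, id and field name are placeholders):

curl -XGET 'http://localhost:9200/myindex/mytype/1/_termvector?fields=body&term_statistics=true&pretty'

This returns the terms of that field for that document, along with term and document frequencies.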

Thanks
   Vineeth Mohan,
   Elasticsearch consultant,
   qbox.io ( Elasticsearch service provider )


On Wed, Jan 28, 2015 at 9:14 AM, Panzer  wrote:

> def score = 0;
> // terms: list of tokens
> for(term in terms) {
> q_term_freq = terms​.countBy { it }​[term];
> term_freq = _index[field][term].tf();
> doc_freq = _index[field][term].df();
> score += term_freq * doc_freq * q_term_freq;
> };
> score;
> The first one gives an error 
> "GroovyScriptExecutionException[MissingPropertyException[No
> such property: terms\u200b for class: Script86". "q_term_freq" gives a
> mapping for a term to its frequency.
>
> How should I correct this?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/904247f3-5df0-4b1d-b509-80d4d606745b%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kCxRwZoyMoM%3D0NZ8vKiHqXSKm0DrT9wuEmGRaRh8W5BQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Massive perf difference with filter versus filtered query

2015-01-27 Thread Michael Giagnocavo
I'm seeing a major performance difference depending on whether I wrap my filter
in a query. I don't understand why, because the docs say to use filters for exact
matching.

This query takes about 800ms, even after repeated executions (so caches are 
hot):
{  "filter": {  "term": {  "ProjectId": 4191152 }  },
  "from": 0,  "size": 50,
  "sort": [],  "facets": {}
}

But wrapping it in a filtered query makes it take 5ms on repeated executions:

{  "query": { "filtered": {
  "filter": { "term": { "ProjectId": 4191152  } } }  },
  "from": 0,  "size": 50,
  "sort": [],  "facets": {}
}

What am I misunderstanding? I've got 80M documents, 30 of which match this
query, so the only thing I can guess is that when I don't use a "query"
element at the root, Elasticsearch retrieves every document and applies my
filter, versus using some indexed approach when there is a query.

-Michael

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/BLUPR07MB674B6F4B405F739E034FB1AD4330%40BLUPR07MB674.namprd07.prod.outlook.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch is not matching "_id" between the clusters

2015-01-27 Thread Carlos Henrique de Oliveira
Hi Mark,
Yes, they are two separate nodes.

We are indexing the data via PHP:

public function resetListingsIndex(){
    $es = new Elasticsearch\Client();

    // delete the listings index if it exists (errors suppressed)
    $deleteParams['index'] = 'listings';
    @$es->indices()->delete($deleteParams);

    // create the listings index
    $aElasticSearchParams = array(
        'index' => 'listings',
        'body' => array(
            'settings' => array(
                'number_of_shards' => 1,
                'number_of_replicas' => 0,
            ),
            'mappings' => array(
                'listing' => array(
                    'properties' => array(
                        'reference_number' => array(
                            'type' => 'integer',
                            'analyzer' => 'keyword',
                        ),
                        'headline' => array(
                            'analyzer' => 'standard',
                            'type' => 'string',
                        ),
                        'type' => array(
                            'type' => 'string',
                            'analyzer' => 'standard',
                        ),
                        'state' => array(
                            'type' => 'string',
                            'analyzer' => 'standard',
                        ),
                        // all other properties...
                    )
                )
            )
        )
    );

    try{
        $aResult = $es->indices()->create($aElasticSearchParams);
        if($aResult['acknowledged'] == true){
            return 'Successfully indexed';
        }
    }
    catch(Exception $e){
        return $e;
    }
}

And yes, we are using the auto ID generation within ES.
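
For clarity, a minimal sketch of what that looks like on the wire: a POST without an id, so ES generates one (index, type and field values are taken from the sample hit earlier in the thread):

curl -XPOST 'http://localhost:9200/stats/listingviews/' -d '{
  "id": "1159",
  "type": "listing",
  "ip": "203.206.165.208",
  "time": "2014-12-23 13:37:49"
}'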

On Wednesday, 28 January 2015 at 12:06:51 UTC+11, Mark Walkom wrote:
>
> So these two nodes are their own separate clusters?
> How are you indexing data into them? Are you using the auto ID generation 
> within ES or specifying your own?
>
> On 28 January 2015 at 11:56, Carlos Henrique de Oliveira <
> cholive...@gmail.com > wrote:
>
>> I have two Web Servers ws001 and ws002 working as load balance for 
>> Elasticseach and I am trying to catch/count the hits for a specific page 
>> which is something like this: mysite.com/listing/item-123/.
>>
>> Using the ES I am running curl –XGET http: //
>> mysite.com:9200/stats/listingviews/_search?pretty Then count the hits 
>> for a specific "_id": "hits" : { "total" : 1526, "max_score" : 1.0, "hits" 
>> : [ { "_index" : "stats", "_type" : "listingviews", "_id" : 
>> "IYSs1OmqSvK6gRDQr61j3w", "_score" : 1.0, 
>> "_source":{"id":"1159","type":"listing","ua":"Mozilla/5.0 (Windows NT 6.1; 
>> WOW64; rv:34.0) Gecko/20100101 
>> Firefox/34.0","ip":"203.206.165.208","time":"2014-12-23 13:37:49"} }
>>
>> When I test it on my dev environment (localhost) with only one node and 
>> there is no load balance applied it works perfect, I always find this "_id" 
>> and the count works pretty good, but on my production with load balance 
>> activated I cannot find the results ("_id") on both servers ws001 and ws002 
>> and it breaks my hits counting. 
>>
>> On my load balance servers I getting completely different results, I mean 
>> ("_id"), when I run: curl –XGET http: 
>> //webserver1:9200/stats/listingviews/_search?pretty and curl –XGET http: 
>> //webserver2:9200/stats/listingviews/_search?pretty. Also I already checked 
>> the shads for each server and they are different.
>>
>> At the end, the _ids found on both servers never match and definitely 
>> they are not the same ids. I’m supposing that the replica on the load 
>> balance is not working as replica and the ES is storing the data in both 
>> servers separately. Even if the _ids are not on each server, the end result 
>> should be our ability to count how many records there are (ie. where 
>> _source->id = 1159) but it seems to only get a count from ws001.
>>
>> Which approach should I take to solve this issue, and be able to count my 
>> hits on the production environment? 
>>
>> Thanks,
>>
>> Carlos.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/9f7133f7-3bbb-4b7b-b298-d4bd125baf34%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsear

Issues when running groovy script for "function_score"?

2015-01-27 Thread Panzer
def score = 0;
// terms: list of tokens
for(term in terms) {
q_term_freq = terms​.countBy { it }​[term];
term_freq = _index[field][term].tf(); 
doc_freq = _index[field][term].df(); 
score += term_freq * doc_freq * q_term_freq;
};
score;
The first one gives an error 
"GroovyScriptExecutionException[MissingPropertyException[No such property: 
terms\u200b for class: Script86". "q_term_freq" gives a mapping for a term 
to its frequency.

How should I correct this?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/904247f3-5df0-4b1d-b509-80d4d606745b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Issues while running Groovy script for "function_score"?

2015-01-27 Thread Panzer
def score = 0;
// terms: list of tokens
for(term in terms) {
q_term_freq = terms​.countBy { it }​[term];
term_freq = _index[field][term].tf(); 
doc_freq = _index[field][term].df(); 
score += term_freq * doc_freq * q_term_freq;
};
score;
The first one gives an error 
"GroovyScriptExecutionException[MissingPropertyException[No such property: 
terms\u200b for class: Script86". "q_term_freq" gives a mapping for a term 
to its frequency.

How should I correct them?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c90e889a-f442-46b3-8132-f31be13a58f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to perform search on the qualified docs resulted from a query

2015-01-27 Thread bvnrwork
Okay, thank you. Do nested objects help here?

Is it possible to get only the inner objects (from nested objects)?

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/nested-objects.html
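
For reference, this is roughly what I understand a nested query to look like (the path and field names here are placeholders); I gather that without the inner hits feature it returns the whole root document, not just the matching nested objects:

{
  "query": {
    "nested": {
      "path": "documents",
      "query": { "match": { "documents.Name": "One" } }
    }
  }
}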




On Tuesday, 27 January 2015 21:07:35 UTC-5, David Pilato wrote:
>
> You need to run 2 queries in that case IMHO.
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> On 28 Jan 2015, at 00:54, buddarapu nagaraju wrote:
>
> Any answers for me :)?
>
> Regards
> Nagaraju
> 908 517 6981
>
> On Sun, Jan 25, 2015 at 2:14 PM, buddarapu nagaraju  > wrote:
>
>> I dont get it exactly so explaining doc structure and example docs .I 
>> understand that HasChild will get you the parent documents and HasParent 
>> will get the only parents.Please help me in understanding 
>>
>> have two document types :one is FakeDocument which is the fake document 
>> holding the group id for all docs in a group 
>> and other is Document which is the actual document
>>
>> Example Docs are:
>>
>>
>> FakeDocument{
>> Id:"G1"
>> }
>> FakeDocument{
>> Id:"G2"
>> }
>>
>> Indexed two documents under group "G1"
>>
>> Document1{
>> Name:One
>>
>> }
>>
>> Document2{
>> Name:Two
>>
>> }
>>
>>
>> Indexed two documents under group "G2"
>>
>> Document3{
>> Name:Three
>>
>> }
>>
>> Document4{
>> Name:Four
>>
>> }
>>
>>
>> Now my scenario is 
>>
>>
>> querying for "Name:One" should result me document with name :"One" and 
>> also all other documents that has same _parentId in one request 
>>
>>
>>
>>
>>
>>
>> Regards
>> Nagaraju
>> 908 517 6981
>>
>> On Sun, Jan 25, 2015 at 12:14 AM, David Pilato > > wrote:
>>
>>> If you are using Parent / Child feature, you should look at has_parent, 
>>> has_child filters.
>>>
>>>
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html#query-dsl-has-child-filter
>>>
>>> In that case, you don't need to get back parent id yourself.
>>>
>>> If you are not using Parent / child, I'm afraid you need to run 2 
>>> queries.
>>>
>>> My 2 cents 
>>>
>>> --
>>> David ;-)
>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>>
>>> Le 25 janv. 2015 à 05:44, bvnrwork > a 
>>> écrit :
>>>
>>> Hi,
>>>
>>> can some one help me on this.
>>>
>>> have scenario where have Query1 which qualifies some documents and now I 
>>> want to take _parent id of qualified documents and search on _parentid 
>>> field to get the qualified documents and all others documents with same 
>>> parent id 
>>>
>>> .These two searches I want to do it in one single request , is it 
>>> possible?
>>>
>>> Regards,
>>> Nagaraju
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com .
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/a1946c82-7239-4e56-a082-ab11e66b4041%40googlegroups.com
>>>  
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>  -- 
>>> You received this message because you are subscribed to a topic in the 
>>> Google Groups "elasticsearch" group.
>>> To unsubscribe from this topic, visit 
>>> https://groups.google.com/d/topic/elasticsearch/Ye7ICf_ZUkg/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to 
>>> elasticsearc...@googlegroups.com .
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/A4959B00-FB5F-4EAA-9304-5B2BCC490601%40pilato.fr
>>>  
>>> 
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>  -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/CAFtuXX%2B2wEtz7ff-xXa%2BztvC%2B%2Bg-hpKY4s5X0%2BRVAZ8U5d7Puw%40mail.gmail.com
>  
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3146fbf6-63e9-45bf-8593-90dcacf16184%40googlegroups.com.
For more opti

Re: Restoring indices from snapshot to a test server

2015-01-27 Thread Amos S

   
   1. I opened an issue for the AWS plugin project on GitHub; I hope this is
   what you were referring to. Here is the issue:
   https://github.com/elasticsearch/elasticsearch-cloud-aws/issues/167
   2. About the "type missing" error: it turned out to be my mistake in trying to
   copy the output of the GET verbatim into the input of the PUT, and I had to peel
   off a couple of "{}"'s. Once I did that, the order of the "type" attribute in
   relation to the rest didn't matter and the PUT succeeded. See the update I gave
   in a previous message.
   3. I ended up making a new copy of the bucket. I'm now trying to restore
   it to the test node (see the sketch below).
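
For the restore step, a minimal sketch of the calls I plan to use (repository name from above; the snapshot and index names are placeholders):

curl -XGET 'http://localhost:9200/_snapshot/amos0/_all?pretty'
curl -XPOST 'http://localhost:9200/_snapshot/amos0/snapshot_1/_restore' -d '{
  "indices": "index_1"
}'

The first call lists the snapshots in the repository; the second restores a single index from one of them.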

Thanks for your help.

--Amos

On Wednesday, 28 January 2015 13:18:24 UTC+11, David Pilato wrote:
>
> Could you open an issue in AWS plugin project (and may be in azure and 
> gce) to support verify option as well?
>
> BTW, I think we should try to support have type after settings or to 
> clearly document it needs to be on the first line. Could you open an issue 
> for this in elasticsearch?
>
> Coming back to your issue, I'm afraid you need to wait for a next cloud 
> plugin release or patch it yourself (and send a PR) or make your repo 
> writable.
>
> Best
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> On 27 Jan 2015, at 23:30, Amos S wrote:
>
> OK, following previous responses by you about the "type is missing" error, 
> I corrected the JSON payload I send to the PUT and got another error:
>
> $ curl -XPUT 'http://localhost:9200/_snapshot/amos0' -d '{
>   "type":"s3",
>   "settings": {
> "region": "ap-southeast-1",
> "bucket": "prod-es-backup",
> "base_path": "elasticsearch/dev/snapshots0",
> "verify": "false"
>   }
> }'
> {"error":"RepositoryVerificationException[[amos0] path 
> [elasticsearch][dev][snapshots0] is not accessible on master node]; nested: 
> IOException[Unable to upload object 
> elasticsearch/dev/snapshots0/tests-ng9f5N6tTm6HZhrGF9s8aQ-master]; nested: 
> AmazonS3Exception[Access Denied (Service: Amazon S3; Status Code: 403; 
> Error Code: AccessDenied; Request ID: 7F3DE23BB617FFF4)]; ","status":500}
>
> I think this confirms your suspicion that the repo creation process tries 
> to verify the repository by uploading a test object onto it and also that 
> it ignores the "verify: false" setting.
>
> I can either allow this role to write only to this specific prefix or just
> make a copy of the bucket and allow access to the copy. I'll try the latter.
>
> Thanks,
>
> --Amos
>
> On Wednesday, 28 January 2015 09:20:25 UTC+11, Amos S wrote:
>>
>> Thanks David,
>>
>> It seems that the "verify: false" setting is specific to the "fs" type 
>> and not recognised by the "s3" type.
>> I tried it anyway and got the same worrying "type is missing" error:
>>
>> $ curl -XPUT 'http://localhost:9200/_snapshot/amos0' -d '{
>> "s3dev0": {
>> "settings": {
>> "base_path": "elasticsearch/dev/snapshots0",
>> "bucket": "prod-es-backup",
>> "region": "ap-southeast-1",
>> "verify": "false"
>> },
>> "type": "s3"
>> }
>> }
>> '
>> {"error":"ActionRequestValidationException[Validation Failed: 1: type is 
>> missing;]","status":400}
>>
>> I think I first have to address the "type is missing" issue. I suspect 
>> ElasticSearch doesn't recognise the "s3" type.
>>
>> On Tuesday, 27 January 2015 17:13:53 UTC+11, David Pilato wrote:
>>>
>>> Could you try to set verify to false?
>>>
>>>
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html#_repositories
>>>
>>> Not sure if it works but would love to know.
>>>
>>> --
>>> David ;-)
>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>>
>>> Le 27 janv. 2015 à 07:09, Amos S  a écrit :
>>>
>>> Thanks David,
>>>
>>> That would explain it.
>>>
>>> Is there a way to skip the validation?
>>>
>>> On Tuesday, 27 January 2015 16:52:11 UTC+11, David Pilato wrote:

 IIRC when you create a repository we first try to validate it by 
 writing a sample file in it.

 As you set it to read only, I guess it could be the cause.

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 Le 27 janv. 2015 à 06:20, Amos S  a écrit :

 Hello,

 For some investigation work, I'm trying to restore specific indices 
 from our production ES cluster to a single one-off node.

 We run a cluster of ES 1.4.2 on EC2, the data is stored locally on each 
 EC2 instance with snapshots stored on an S3 bucket.

 I've setup a one-off EC2 instance and am trying to restore a single 
 index from snapshot into that new instance.

 The instance has its own cluster name and node name, and I've setup a 
 read-only S3 role for it so it doesn't accidentally overwrite our backup.

 Trying to follow instructions I found in various locations on the web, 
>

Re: Restoring indices from snapshot to a test server

2015-01-27 Thread David Pilato
Could you open an issue in the AWS plugin project (and maybe in azure and gce) to
support the verify option as well?

BTW, I think we should either support having type after settings or clearly
document that it needs to be on the first line. Could you open an issue for this in
elasticsearch?

Coming back to your issue, I'm afraid you need to wait for the next cloud plugin
release, patch it yourself (and send a PR), or make your repo writable.

Best

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> On 27 Jan 2015, at 23:30, Amos S wrote:
> 
> OK, following previous responses by you about the "type is missing" error, I 
> corrected the JSON payload I send to the PUT and got another error:
> 
> $ curl -XPUT 'http://localhost:9200/_snapshot/amos0' -d '{
>   "type":"s3",
>   "settings": {
> "region": "ap-southeast-1",
> "bucket": "prod-es-backup",
> "base_path": "elasticsearch/dev/snapshots0",
> "verify": "false"
>   }
> }'
> {"error":"RepositoryVerificationException[[amos0] path 
> [elasticsearch][dev][snapshots0] is not accessible on master node]; nested: 
> IOException[Unable to upload object 
> elasticsearch/dev/snapshots0/tests-ng9f5N6tTm6HZhrGF9s8aQ-master]; nested: 
> AmazonS3Exception[Access Denied (Service: Amazon S3; Status Code: 403; Error 
> Code: AccessDenied; Request ID: 7F3DE23BB617FFF4)]; ","status":500}
> 
> I think this confirms your suspicion that the repo creation process tries to 
> verify the repository by uploading a test object onto it and also that it 
> ignores the "verify: false" setting.
> 
> I can either allow this role to write only to this specific prefix or just
> make a copy of the bucket and allow access to the copy. I'll try the latter.
> 
> Thanks,
> 
> --Amos
> 
>> On Wednesday, 28 January 2015 09:20:25 UTC+11, Amos S wrote:
>> Thanks David,
>> 
>> It seems that the "verify: false" setting is specific to the "fs" type and 
>> not recognised by the "s3" type.
>> I tried it anyway and got the same worrying "type is missing" error:
>> 
>> $ curl -XPUT 'http://localhost:9200/_snapshot/amos0' -d '{
>> "s3dev0": {
>> "settings": {
>> "base_path": "elasticsearch/dev/snapshots0",
>> "bucket": "prod-es-backup",
>> "region": "ap-southeast-1",
>> "verify": "false"
>> },
>> "type": "s3"
>> }
>> }
>> '
>> {"error":"ActionRequestValidationException[Validation Failed: 1: type is 
>> missing;]","status":400}
>> 
>> I think I first have to address the "type is missing" issue. I suspect 
>> ElasticSearch doesn't recognise the "s3" type.
>> 
>>> On Tuesday, 27 January 2015 17:13:53 UTC+11, David Pilato wrote:
>>> Could you try to set verify to false?
>>> 
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html#_repositories
>>> 
>>> Not sure if it works but would love to know.
>>> 
>>> --
>>> David ;-)
>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>> 
 Le 27 janv. 2015 à 07:09, Amos S  a écrit :
 
 Thanks David,
 
 That would explain it.
 
 Is there a way to skip the validation?
 
> On Tuesday, 27 January 2015 16:52:11 UTC+11, David Pilato wrote:
> IIRC when you create a repository we first try to validate it by writing 
> a sample file in it.
> 
> As you set it to read only, I guess it could be the cause.
> 
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
> 
>> Le 27 janv. 2015 à 06:20, Amos S  a écrit :
>> 
>> Hello,
>> 
>> For some investigation work, I'm trying to restore specific indices from 
>> our production ES cluster to a single one-off node.
>> 
>> We run a cluster of ES 1.4.2 on EC2, the data is stored locally on each 
>> EC2 instance with snapshots stored on an S3 bucket.
>> 
>> I've setup a one-off EC2 instance and am trying to restore a single 
>> index from snapshot into that new instance.
>> 
>> The instance has its own cluster name and node name, and I've setup a 
>> read-only S3 role for it so it doesn't accidentally overwrite our backup.
>> 
>> Trying to follow instructions I found in various locations on the web, I 
>> think the next step for me is to configure the S3 snapshot bucket as a 
>> repository on the new instance, is that correct?
>> 
>> So I did the following to find the S3 snapshot repository configuration 
>> in the production environment:
>> 
>> $ curl -XGET http://production-cluster:9200/_snapshot/ | python 
>> -mjson.tool
>> {
>> "s3prod0": {
>> "settings": {
>> "base_path": "elasticsearch/prod/snapshots0",
>> "bucket": "prod-es-backup",
>> "region": "ap-southeast-1"
>> },
>> "type": "s3"
>> }
>> }
>> 
>> I then tried to feed this into my new node's configuration:
>> 
>> $ curl 

Re: How to perform search on the qualified docs resulted from a query

2015-01-27 Thread David Pilato
You need to run 2 queries in that case IMHO.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> On 28 Jan 2015, at 00:54, buddarapu nagaraju wrote:
> 
> Any answers for me :)?
> 
> Regards
> Nagaraju
> 908 517 6981
> 
>> On Sun, Jan 25, 2015 at 2:14 PM, buddarapu nagaraju  
>> wrote:
>> I dont get it exactly so explaining doc structure and example docs .I 
>> understand that HasChild will get you the parent documents and HasParent 
>> will get the only parents.Please help me in understanding 
>> 
>> have two document types :one is FakeDocument which is the fake document 
>> holding the group id for all docs in a group 
>> and other is Document which is the actual document
>> 
>> Example Docs are:
>> 
>> 
>> FakeDocument{
>> Id:"G1"
>> }
>> FakeDocument{
>> Id:"G2"
>> }
>> 
>> Indexed two documents under group "G1"
>> 
>> Document1{
>> Name:One
>> 
>> }
>> 
>> Document2{
>> Name:Two
>> 
>> }
>> 
>> 
>> Indexed two documents under group "G2"
>> 
>> Document3{
>> Name:Three
>> 
>> }
>> 
>> Document4{
>> Name:Four
>> 
>> }
>> 
>> 
>> Now my scenario is 
>> 
>> 
>> querying for "Name:One" should result me document with name :"One" and also 
>> all other documents that has same _parentId in one request 
>> 
>> 
>> 
>> 
>> 
>> 
>> Regards
>> Nagaraju
>> 908 517 6981
>> 
>>> On Sun, Jan 25, 2015 at 12:14 AM, David Pilato  wrote:
>>> If you are using Parent / Child feature, you should look at has_parent, 
>>> has_child filters.
>>> 
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html#query-dsl-has-child-filter
>>> 
>>> In that case, you don't need to get back parent id yourself.
>>> 
>>> If you are not using Parent / child, I'm afraid you need to run 2 queries.
>>> 
>>> My 2 cents 
>>> 
>>> --
>>> David ;-)
>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>> 
 Le 25 janv. 2015 à 05:44, bvnrwork  a écrit :
 
 Hi,
 
 can some one help me on this.
 
 have scenario where have Query1 which qualifies some documents and now I 
 want to take _parent id of qualified documents and search on _parentid 
 field to get the qualified documents and all others documents with same 
 parent id 
 
 .These two searches I want to do it in one single request , is it possible?
 
 Regards,
 Nagaraju
 -- 
 You received this message because you are subscribed to the Google Groups 
 "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/a1946c82-7239-4e56-a082-ab11e66b4041%40googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.
>>> 
>>> -- 
>>> You received this message because you are subscribed to a topic in the 
>>> Google Groups "elasticsearch" group.
>>> To unsubscribe from this topic, visit 
>>> https://groups.google.com/d/topic/elasticsearch/Ye7ICf_ZUkg/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to 
>>> elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/A4959B00-FB5F-4EAA-9304-5B2BCC490601%40pilato.fr.
>>> 
>>> For more options, visit https://groups.google.com/d/optout.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/CAFtuXX%2B2wEtz7ff-xXa%2BztvC%2B%2Bg-hpKY4s5X0%2BRVAZ8U5d7Puw%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/70D06388-5173-4225-97A7-9BB78A78A096%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: Some text around query hit

2015-01-27 Thread David Pilato
Not sure if it's what you are looking for.

Highlighting?
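
Something like this, roughly (the field name here is a placeholder):

{
  "query": { "match": { "body": "your search terms" } },
  "highlight": {
    "fields": { "body": {} }
  }
}

Each hit then comes back with a highlight section containing fragments of the matching text.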



--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> On 28 Jan 2015, at 00:56, bvnrwork wrote:
> 
> Hi, 
> 
> we have a scenario where we need to display a small part of the document text
> when a document qualifies.
> 
> Any ideas would be appreciated 
> 
> Regards,
> Nagaraju
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/f1b9dfc8-0ccd-4b36-8273-42bebf01fd4e%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/36B7CA58-0B09-4F28-BC61-27DC27E8B019%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch is not matching "_id" between the clusters

2015-01-27 Thread Mark Walkom
So these two nodes are their own separate clusters?
How are you indexing data into them? Are you using the auto ID generation
within ES or specifying your own?

On 28 January 2015 at 11:56, Carlos Henrique de Oliveira <
choliveira0...@gmail.com> wrote:

> I have two Web Servers ws001 and ws002 working as load balance for
> Elasticseach and I am trying to catch/count the hits for a specific page
> which is something like this: mysite.com/listing/item-123/.
>
> Using the ES I am running curl –XGET http: //
> mysite.com:9200/stats/listingviews/_search?pretty Then count the hits for
> a specific "_id": "hits" : { "total" : 1526, "max_score" : 1.0, "hits" : [
> { "_index" : "stats", "_type" : "listingviews", "_id" :
> "IYSs1OmqSvK6gRDQr61j3w", "_score" : 1.0,
> "_source":{"id":"1159","type":"listing","ua":"Mozilla/5.0 (Windows NT 6.1;
> WOW64; rv:34.0) Gecko/20100101
> Firefox/34.0","ip":"203.206.165.208","time":"2014-12-23 13:37:49"} }
>
> When I test it on my dev environment (localhost) with only one node and
> there is no load balance applied it works perfect, I always find this "_id"
> and the count works pretty good, but on my production with load balance
> activated I cannot find the results ("_id") on both servers ws001 and ws002
> and it breaks my hits counting.
>
> On my load balance servers I getting completely different results, I mean
> ("_id"), when I run: curl –XGET http:
> //webserver1:9200/stats/listingviews/_search?pretty and curl –XGET http:
> //webserver2:9200/stats/listingviews/_search?pretty. Also I already checked
> the shads for each server and they are different.
>
> At the end, the _ids found on both servers never match and definitely they
> are not the same ids. I’m supposing that the replica on the load balance is
> not working as replica and the ES is storing the data in both servers
> separately. Even if the _ids are not on each server, the end result should
> be our ability to count how many records there are (ie. where _source->id =
> 1159) but it seems to only get a count from ws001.
>
> Which approach should I take to solve this issue, and be able to count my
> hits on the production environment?
>
> Thanks,
>
> Carlos.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/9f7133f7-3bbb-4b7b-b298-d4bd125baf34%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X-ZdMpsESDq8Dm6MurwmNX2_xS0oTTHjbMr1Y6KSewViw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Documentation on Functional Big O Notation

2015-01-27 Thread Mark Walkom
As with most performance-related things in an Elasticsearch context, it
depends on too many factors to really provide set figures.

On 28 January 2015 at 11:05, webish  wrote:

> I have often found myself looking into the performance of different
> functionality with Elasticsearch. I feel like this is a huge missing piece
> of the documentation with ES.  The Redis documentation attempts to identify
> performance of functionality using Big O Notation or what they refer to as
> time complexity.
>
> Is there any sort of documentation like this that I'm missing?
>
> It would be great to have performance details in the documentation with
> regards to functionality and design decisions...
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/7f4607f8-0f5c-43d4-8500-1bc99fc1a0c1%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X_W9JLGa1MNG1-wO1D6mvBSLbwohb2%2Bsb0GkYHv9xjWsQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Importing Large Amounts of Data to Production Indices

2015-01-27 Thread Mark Walkom
How much data are you talking about? Are you using the bulk API? What is your
bulk sizing?

You can also set an index to not refresh while you ingest into it (refresh_interval =
-1), then turn the refresh back on once the data has been sent to ES.
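
Roughly, as a minimal sketch (the index name is a placeholder):

curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{ "index": { "refresh_interval": "-1" } }'
# ... run the bulk import ...
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{ "index": { "refresh_interval": "1s" } }'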

On 28 January 2015 at 11:45, webish  wrote:

> I have some production indices that need a large amount of data
> imported into them fairly frequently.  Each time we import data the ES
> nodes become a huge bottleneck.  I honestly expected a lot better
> performance out of them.  Regardless, I would like to import data in a
> production ES setup with the least amount of interruption or performance
> issues.
>
> What are some options I can take to import large quantities of data
> without affecting data that is already being used by applications?
>
> I was thinking I could use a combination of aliases or temp indices to
> migrate the data over...
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/410c454e-7e8d-4f1b-b70a-68e18fa7c732%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X8MuHkUdVoznQAiZFVx45nqhNngqGRrw-NxiSZH6opAvg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Shard size / Index number / server count and performance

2015-01-27 Thread Mark Walkom
Be aware that we do not yet officially support G1GC. You should also reduce
your heap to 31GB.

Ideally you want to keep shard size below 50GB, so you will need to adjust
things as you grow. Be careful creating a lot of indices though, each one
takes overhead and if you increase the number of indices and the amount of
data you have in them you could be wasting resources.

However when querying, 100 indices with 1 shard is the same as 1 index with
100 shards.
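
As a rough sketch of setting this per index (the index name and counts are placeholders; the number of shards cannot be changed after the index is created):

curl -XPUT 'http://localhost:9200/logs-2015.01.28' -d '{
  "settings": { "number_of_shards": 6, "number_of_replicas": 1 }
}'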

On 28 January 2015 at 10:11, Chris Neal  wrote:

> Hi all,
>
> I've seen lots of posts about this, and want to make sure I'm
> understanding correctly.
>
> Background:
>
>- Our cluster has 6 servers.  They are Dell R720xd with 64GB RAM,
>2xE5-2600v2 CPU (2 sockets, 6 cores/socket), 16TB disk
>- Elasticsearch is set to have 6 shards, and 1 replica, giving two
>shards per server.  I'm giving ES 32GB heaps on Java 1.7 with G1 GC.
>
>
> I'm concerned about the size of our indexes.  Right now, we store all data
> in one index per day, with various types within that to separate data.
>
> The indexes are averaging about 50GB/day (not including replicas).  Shard
> size is 8GB each.
>
> We have a LOT more data to index.  At least 20x more.  Should I be
> concerned with indexes of that size (~1000GB) and shards of that size
> (~160GB)?  Is it merely a question of having enough hardware, or is there
> more to it?
>
> I'm considering splitting the data into a different indexing strategy so
> that the index size is smaller, but there are more of them.  The result is
> the amount of data is the same, so I'm not sure if that will do anything or
> not.
>
> If I'm optimizing for searching, does querying multiple smaller indices
> perform better than querying fewer larger ones?
>
> Thank you for your time.
> Chris
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAND3Dpgr78LJ%3DcWb0ZbyHZqMin4tDSVPvjG%3D_PYgsQym9EzZ%3Dg%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X9Tmoc20khrdn85eO%2B7eptq0SNGwUd1-6XfBoH0cs8-Hw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Elasticsearch is not matching "_id" between the clusters

2015-01-27 Thread Carlos Henrique de Oliveira
 

I have two Web Servers ws001 and ws002 working as load balance for 
Elasticseach and I am trying to catch/count the hits for a specific page 
which is something like this: mysite.com/listing/item-123/.

Using ES I am running curl -XGET http://mysite.com:9200/stats/listingviews/_search?pretty and then counting the hits for 
a specific "_id": "hits" : { "total" : 1526, "max_score" : 1.0, "hits" : [ 
{ "_index" : "stats", "_type" : "listingviews", "_id" : 
"IYSs1OmqSvK6gRDQr61j3w", "_score" : 1.0, 
"_source":{"id":"1159","type":"listing","ua":"Mozilla/5.0 (Windows NT 6.1; 
WOW64; rv:34.0) Gecko/20100101 
Firefox/34.0","ip":"203.206.165.208","time":"2014-12-23 13:37:49"} }

When I test it on my dev environment (localhost) with only one node and 
no load balancing applied it works perfectly: I always find this "_id" 
and the count works well. But on production with load balancing 
activated I cannot find the same results ("_id") on both servers ws001 and ws002, 
and it breaks my hits counting. 

On my load-balanced servers I am getting completely different results, I mean 
("_id"), when I run curl -XGET http://webserver1:9200/stats/listingviews/_search?pretty and 
curl -XGET http://webserver2:9200/stats/listingviews/_search?pretty. Also, I already checked 
the shards on each server and they are different.

In the end, the _ids found on the two servers never match; they 
are definitely not the same ids. I suspect that the replica behind the load balancer is 
not working as a replica and that ES is storing the data on both servers 
separately. Even if the _ids differ between servers, the end result should 
be our ability to count how many records there are (i.e. where _source->id = 
1159), but it seems to only get a count from ws001.

Which approach should I take to solve this issue, and be able to count my 
hits on the production environment? 

Thanks,

Carlos.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9f7133f7-3bbb-4b7b-b298-d4bd125baf34%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Importing Large Amounts of Data to Production Indices

2015-01-27 Thread webish
I have some production indices that need a large amount of data 
imported into them fairly frequently.  Each time we import data the ES 
nodes become a huge bottleneck.  I honestly expected a lot better 
performance out of them.  Regardless, I would like to import data in a 
production ES setup with the least amount of interruption or performance 
issues.

What are some options I can take to import large quantities of data without 
affecting data that is already being used by applications?

I was thinking I could use a combination of aliases or temp indices to 
migrate the data over...
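
A rough sketch of that alias idea (index and alias names are placeholders): build the new data into a fresh index, then swap the alias atomically so readers never see a half-imported index:

curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
    { "remove" : { "index" : "products_v1", "alias" : "products" } },
    { "add"    : { "index" : "products_v2", "alias" : "products" } }
  ]
}'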

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/410c454e-7e8d-4f1b-b70a-68e18fa7c732%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Performance with n Indices in Aliases

2015-01-27 Thread webish
I was wondering what is going on behind the scenes when adding n number of 
indices to an alias.  Are there any performance implications?

So an alias with a single index that has a single shard will allocate a 
single process to scan the index...  So that would mean, with the same 
data, when having two indices each with a single shard in an alias it would 
require two processes in parallel to scan the index.  The key being the 
scan time is cut roughly in half, since there are two processes each scanning half 
the data.  This scales to the point of thread saturation.  The performance 
is then affected by the number and power of cores...

I'm wondering if there is a significant difference in performance when using an 
alias with 100 indices or even 10,000 vs an alias with a single index (on a 
single-core machine)?

What impacts are there to the performance and considerations one must take? 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/09e3678a-92f7-4f26-885a-4f461a49f318%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Elasticsearch Documentation on Functional Big O Notation

2015-01-27 Thread webish
I have often found myself looking into the performance of different 
functionality with Elasticsearch. I feel like this is a huge missing piece 
of the documentation with ES.  The Redis documentation attempts to identify 
performance of functionality using Big O Notation or what they refer to as 
time complexity.

Is there any sort of documentation like this that I'm missing?

It would be great to have performance details in the documentation with 
regards to functionality and design decisions...

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7f4607f8-0f5c-43d4-8500-1bc99fc1a0c1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


First production schema issues - ES beginner

2015-01-27 Thread Alexandru-Emil Lupu
Hello! 

I am trying to create an elasticsearch index in order to achieve my goals. 
The main problem of the task is its complexity, and after 3 days of tries, 
retries etc., I am turning to this group for suggestions: 

I want to create a statistics page that would allow me to do following 
things:

As a forum owner, I would like to search and filter (only) my users, based 
on their activity or their status regarding the forum application, and of 
course I should be able to see some data aggregation (activity within 30 
days, 7 days), etc. 

My database structure looks like : 
https://gist.github.com/alecslupu/168c0f62f948d633378b

I have tried to index the user model, but for some reason it 
gets fuzzy when it comes to filtering the results, as I would always need to 
filter based on a forum id and, if needed, to order by the nested status in 
the user_forums association.

I have thought about creating the index on the user_forum model, but I would end up with 
multiple duplicates of each user record. 

Any idea how I could structure that, given that all of this info 
might become a facet? 

Thanks in advance
Alex

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/42108a22-734c-4601-af45-a1a0745c07ff%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES_HEAP_SIZE is set but still see -Xms256m -Xmx1g

2015-01-27 Thread Mark Walkom
If you are installing using the deb, just use /etc/default/elasticsearch to
set it.
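For example, on a deb install that file would contain something like the following (using the 7g value from this thread; adjust to your machine):

# /etc/default/elasticsearch
ES_HEAP_SIZE=7g

# then restart so the init script picks the value up
sudo service elasticsearch restart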

On 28 January 2015 at 09:48, Ali Kheyrollahi  wrote:

> Sorry missed *echo* $ES_HEAP_SIZE
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/65b8ce31-c0df-4cd3-a85c-99cb563becae%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X8oM7MtKZZ1gHDi5_9UgNRKsXa%2B4%3DHVig97Y0Wmud3WkQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Some text around query hit

2015-01-27 Thread bvnrwork
Hi, 

We have a scenario where we need to display a small part of the document text 
when a document qualifies.

Any ideas would be appreciated 

Regards,
Nagaraju

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f1b9dfc8-0ccd-4b36-8273-42bebf01fd4e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to perform search on the qualified docs resulted from a query

2015-01-27 Thread buddarapu nagaraju
Any answers for me :)?

Regards
Nagaraju
908 517 6981

On Sun, Jan 25, 2015 at 2:14 PM, buddarapu nagaraju 
wrote:

> I dont get it exactly so explaining doc structure and example docs .I
> understand that HasChild will get you the parent documents and HasParent
> will get the only parents.Please help me in understanding
>
> have two document types :one is FakeDocument which is the fake document
> holding the group id for all docs in a group
> and other is Document which is the actual document
>
> Example Docs are:
>
>
> FakeDocument{
> Id:"G1"
> }
> FakeDocument{
> Id:"G2"
> }
>
> Indexed two documents under group "G1"
>
> Document1{
> Name:One
>
> }
>
> Document2{
> Name:Two
>
> }
>
>
> Indexed two documents under group "G2"
>
> Document3{
> Name:Three
>
> }
>
> Document4{
> Name:Four
>
> }
>
>
> Now my scenario is
>
>
> querying for "Name:One" should result me document with name :"One" and
> also all other documents that has same _parentId in one request
>
>
>
>
>
>
> Regards
> Nagaraju
> 908 517 6981
>
> On Sun, Jan 25, 2015 at 12:14 AM, David Pilato  wrote:
>
>> If you are using Parent / Child feature, you should look at has_parent,
>> has_child filters.
>>
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html#query-dsl-has-child-filter
>>
>> In that case, you don't need to get back parent id yourself.
>>
>> If you are not using Parent / child, I'm afraid you need to run 2 queries.
>>
>> My 2 cents
>>
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>> Le 25 janv. 2015 à 05:44, bvnrwork  a écrit :
>>
>> Hi,
>>
>> can some one help me on this.
>>
>> have scenario where have Query1 which qualifies some documents and now I
>> want to take _parent id of qualified documents and search on _parentid
>> field to get the qualified documents and all others documents with same
>> parent id
>>
>> .These two searches I want to do it in one single request , is it
>> possible?
>>
>> Regards,
>> Nagaraju
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/a1946c82-7239-4e56-a082-ab11e66b4041%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>>  --
>> You received this message because you are subscribed to a topic in the
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/elasticsearch/Ye7ICf_ZUkg/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/A4959B00-FB5F-4EAA-9304-5B2BCC490601%40pilato.fr
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAFtuXX%2B2wEtz7ff-xXa%2BztvC%2B%2Bg-hpKY4s5X0%2BRVAZ8U5d7Puw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


ES FILTERED QUERY filters first with terms (cheap) and then match (expensive), or otherwise?

2015-01-27 Thread Gabriel Gavilan gavilán
So I suppose that when I run a filtered query like this one, ES filters all 
the documents in the database and then runs the match query only on 
the documents that fit the filter, right? I just want to make sure that it 
doesn't run the match query on all the documents and then drop the ones 
that don't fit the filter, because that would be a total waste of resources 
as I'm only interested in a small portion of the documents.

{
  "query" : {
"filtered" : {
  "query": {
"match": {
  "field1": "whatever"
}
  },
  "filter" : {
"term" : {
  "field2" : 10
}
  }
} 
  }
}

Thanks for clarifying this!

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ee8f3294-9761-4d7b-b8b1-e49943d88ae7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Shard size / Index number / server count and performance

2015-01-27 Thread Chris Neal
Hi all,

I've seen lots of posts about this, and want to make sure I'm understanding
correctly.

Background:

   - Our cluster has 6 servers.  They are Dell R720xd with 64GB RAM,
   2xE5-2600v2 CPU (2 sockets, 6 cores/socket), 16TB disk
   - Elasticsearch is set to have 6 shards, and 1 replica, giving two
   shards per server.  I'm giving ES 32GB heaps on Java 1.7 with G1 GC.


I'm concerned about the size of our indexes.  Right now, we store all data
in one index per day, with various types within that to separate data.

The indexes are averaging about 50GB/day (not including replicas).  Shard
size is 8GB each.

We have a LOT more data to index.  At least 20x more.  Should I be
concerned with indexes of that size (~1000GB) and shards of that size
(~160GB)?  Is it merely a question of having enough hardware, or is there
more to it?

I'm considering splitting the data into a different indexing strategy so
that the index size is smaller, but there are more of them.  The result is
the amount of data is the same, so I'm not sure if that will do anything or
not.

If I'm optimizing for searching, does querying multiple smaller indices
perform better than querying fewer larger ones?

Thank you for your time.
Chris

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAND3Dpgr78LJ%3DcWb0ZbyHZqMin4tDSVPvjG%3D_PYgsQym9EzZ%3Dg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Parent child documents query

2015-01-27 Thread Perryn Fowler
You should be able to query the child type with a has_parent query which
has a has_child query nested within it.

No idea how it would perform though.
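
A minimal sketch of that shape, assuming a parent type "fakedoc", a child type "doc", a "Name" field on the children and an index "my_index" (all hypothetical names, untested):

GET /my_index/doc/_search
{
  "query": {
    "has_parent": {
      "parent_type": "fakedoc",
      "query": {
        "has_child": {
          "type": "doc",
          "query": { "match": { "Name": "One" } }
        }
      }
    }
  }
}

This should return every child whose parent has at least one child matching Name:One, i.e. the matching doc plus its siblings.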

On Sun, Jan 25, 2015 at 3:29 AM, bvnrwork  wrote:

> For example:
>
> Have three below documents , FakeDoc,Doc1&Doc2
>
> Now how to write a query that qualifies Doc1 and also gets the all
> documents which has same parentid as Doc1
>
> That is Doc1 and Doc2 in this case
>
>
> FakeDoc{
>
> F1
>
> }
>
> Doc1
>
> {
>
> _parent:F1
>
> }
>
>
>
> Doc2
>
> {
>
> _parent:F2
>
> }
>
> On Friday, 23 January 2015 15:22:49 UTC-5, bvnrwork wrote:
>>
>>
>> Is there a way we can get all child's and parents if parent /child
>> qualifies for a query
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/32a3ac9e-409b-46df-af50-4ed6d8048bbc%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAFps6aBkkAUe%2BMqHKhJccnVPwKLTZBqEVY%2B3sBo96YenO_CRnQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: ES_HEAP_SIZE is set but still see -Xms256m -Xmx1g

2015-01-27 Thread Ali Kheyrollahi
Sorry missed *echo* $ES_HEAP_SIZE

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/65b8ce31-c0df-4cd3-a85c-99cb563becae%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


ES_HEAP_SIZE is set but still see -Xms256m -Xmx1g

2015-01-27 Thread Ali Kheyrollahi
Hi,

I have an ES cluster running on Ubuntu 14 and created a file in 
/etc/profile.d/es_vars.sh with this content:

export ES_HEAP_SIZE=7g

I have 14GB of memory so giving 7GB to ES heap but I can see in ps aux:

...
elastic+  1474 17.0  2.2 5929120 325284 ?  Sl   22:33   0:38 
/usr/lib/jvm/java-7-oracle/bin/java *-Xms256m -Xmx1g* -Xss256k 
-Djava.awt.headless=true -XX:+UseParNewGC -
...

So it seems it is still running with Xms 256MB and Xmx of 1GB. I also once 
got *memory circuit breaker* for using 600MB RAM for fields so *confirming 
that my Max memory is only 1GB*.

I am sure environment variable is there:

$ES_HEAP_SIZE
$7g

Is there something I am missing?

Thanks in advance

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6105e526-f399-4324-8d15-843eb4881c39%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Restoring indices from snapshot to a test server

2015-01-27 Thread Amos S
OK, following previous responses by you about the "type is missing" error, 
I corrected the JSON payload I send to the PUT and got another error:

$ curl -XPUT 'http://localhost:9200/_snapshot/amos0' -d '{
  "type":"s3",
  "settings": {
"region": "ap-southeast-1",
"bucket": "prod-es-backup",
"base_path": "elasticsearch/dev/snapshots0",
"verify": "false"
  }
}'
{"error":"RepositoryVerificationException[[amos0] path 
[elasticsearch][dev][snapshots0] is not accessible on master node]; nested: 
IOException[Unable to upload object 
elasticsearch/dev/snapshots0/tests-ng9f5N6tTm6HZhrGF9s8aQ-master]; nested: 
AmazonS3Exception[Access Denied (Service: Amazon S3; Status Code: 403; 
Error Code: AccessDenied; Request ID: 7F3DE23BB617FFF4)]; ","status":500}

I think this confirms your suspicion that the repo creation process tries 
to verify the repository by uploading a test object onto it and also that 
it ignores the "verify: false" setting.

I can either allow this role to write only to this specific prefix or just 
make a copy of the bucket and allow access to the copy. I'll try the latter.
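
(For what it's worth, the first option would amount to an IAM statement roughly like the one below; the bucket and prefix come from the repository settings above, and the action list is only a sketch, since the snapshot plugin may need more than plain object writes:)

{
  "Effect": "Allow",
  "Action": [ "s3:PutObject", "s3:GetObject", "s3:DeleteObject" ],
  "Resource": "arn:aws:s3:::prod-es-backup/elasticsearch/dev/snapshots0/*"
}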

Thanks,

--Amos

On Wednesday, 28 January 2015 09:20:25 UTC+11, Amos S wrote:
>
> Thanks David,
>
> It seems that the "verify: false" setting is specific to the "fs" type and 
> not recognised by the "s3" type.
> I tried it anyway and got the same worrying "type is missing" error:
>
> $ curl -XPUT 'http://localhost:9200/_snapshot/amos0' -d '{
> "s3dev0": {
> "settings": {
> "base_path": "elasticsearch/dev/snapshots0",
> "bucket": "prod-es-backup",
> "region": "ap-southeast-1",
> "verify": "false"
> },
> "type": "s3"
> }
> }
> '
> {"error":"ActionRequestValidationException[Validation Failed: 1: type is 
> missing;]","status":400}
>
> I think I first have to address the "type is missing" issue. I suspect 
> ElasticSearch doesn't recognise the "s3" type.
>
> On Tuesday, 27 January 2015 17:13:53 UTC+11, David Pilato wrote:
>>
>> Could you try to set verify to false?
>>
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html#_repositories
>>
>> Not sure if it works but would love to know.
>>
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>> Le 27 janv. 2015 à 07:09, Amos S  a écrit :
>>
>> Thanks David,
>>
>> That would explain it.
>>
>> Is there a way to skip the validation?
>>
>> On Tuesday, 27 January 2015 16:52:11 UTC+11, David Pilato wrote:
>>>
>>> IIRC when you create a repository we first try to validate it by writing 
>>> a sample file in it.
>>>
>>> As you set it to read only, I guess it could be the cause.
>>>
>>> --
>>> David ;-)
>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>>
>>> Le 27 janv. 2015 à 06:20, Amos S  a écrit :
>>>
>>> Hello,
>>>
>>> For some investigation work, I'm trying to restore specific indices from 
>>> our production ES cluster to a single one-off node.
>>>
>>> We run a cluster of ES 1.4.2 on EC2, the data is stored locally on each 
>>> EC2 instance with snapshots stored on an S3 bucket.
>>>
>>> I've setup a one-off EC2 instance and am trying to restore a single 
>>> index from snapshot into that new instance.
>>>
>>> The instance has its own cluster name and node name, and I've setup a 
>>> read-only S3 role for it so it doesn't accidentally overwrite our backup.
>>>
>>> Trying to follow instructions I found in various locations on the web, I 
>>> think the next step for me is to configure the S3 snapshot bucket as a 
>>> repository on the new instance, is that correct?
>>>
>>> So I did the following to find the S3 snapshot repository configuration 
>>> in the production environment:
>>>
>>> $ curl -XGET http://production-cluster:9200/_snapshot/ | python 
>>> -mjson.tool
>>> {
>>> "s3prod0": {
>>> "settings": {
>>> "base_path": "elasticsearch/prod/snapshots0",
>>> "bucket": "prod-es-backup",
>>> "region": "ap-southeast-1"
>>> },
>>> "type": "s3"
>>> }
>>> }
>>>
>>> I then tried to feed this into my new node's configuration:
>>>
>>> $ curl -XPUT 'http://localhost:9200/_snapshot/amos0' -d '
>>> {
>>> "type": "s3",
>>> "s3prod0": {
>>> "settings": {
>>> "base_path": "elasticsearch/prod/snapshots0",
>>> "bucket": "prod-es-backup",
>>> "region": "ap-southeast-1"
>>> }
>>> }
>>> }
>>> '
>>> {"error":"RepositoryException[[amos0] failed to create repository]; 
>>> nested: CreationException[Guice creation errors:\n\n1) Error injecting 
>>> constructor, org.elasticsearch.repositories.RepositoryException: [amos0] No 
>>> bucket defined for s3 gateway\n  at 
>>> org.elasticsearch.repositories.s3.S3Repository.<init>(Unknown Source)\n 
>>>  while locating org.elasticsearch.repositories.s3.S3Repository\n  while 
>>> locating org.elasticsearch.repositories.Repository\n\n1 error]; nested: 
>>> RepositoryExceptio

Re: Restoring indices from snapshot to a test server

2015-01-27 Thread Amos S
Thanks David,

It seems that the "verify: false" setting is specific to the "fs" type and 
not recognised by the "s3" type.
I tried it anyway and got the same worrying "type is missing" error:

$ curl -XPUT 'http://localhost:9200/_snapshot/amos0' -d '{
"s3dev0": {
"settings": {
"base_path": "elasticsearch/dev/snapshots0",
"bucket": "prod-es-backup",
"region": "ap-southeast-1",
"verify": "false"
},
"type": "s3"
}
}
'
{"error":"ActionRequestValidationException[Validation Failed: 1: type is 
missing;]","status":400}

I think I first have to address the "type is missing" issue. I suspect 
ElasticSearch doesn't recognise the "s3" type.

On Tuesday, 27 January 2015 17:13:53 UTC+11, David Pilato wrote:
>
> Could you try to set verify to false?
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html#_repositories
>
> Not sure if it works but would love to know.
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> Le 27 janv. 2015 à 07:09, Amos S > a 
> écrit :
>
> Thanks David,
>
> That would explain it.
>
> Is there a way to skip the validation?
>
> On Tuesday, 27 January 2015 16:52:11 UTC+11, David Pilato wrote:
>>
>> IIRC when you create a repository we first try to validate it by writing 
>> a sample file in it.
>>
>> As you set it to read only, I guess it could be the cause.
>>
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>> Le 27 janv. 2015 à 06:20, Amos S  a écrit :
>>
>> Hello,
>>
>> For some investigation work, I'm trying to restore specific indices from 
>> our production ES cluster to a single one-off node.
>>
>> We run a cluster of ES 1.4.2 on EC2, the data is stored locally on each 
>> EC2 instance with snapshots stored on an S3 bucket.
>>
>> I've setup a one-off EC2 instance and am trying to restore a single index 
>> from snapshot into that new instance.
>>
>> The instance has its own cluster name and node name, and I've setup a 
>> read-only S3 role for it so it doesn't accidentally overwrite our backup.
>>
>> Trying to follow instructions I found in various locations on the web, I 
>> think the next step for me is to configure the S3 snapshot bucket as a 
>> repository on the new instance, is that correct?
>>
>> So I did the following to find the S3 snapshot repository configuration 
>> in the production environment:
>>
>> $ curl -XGET http://production-cluster:9200/_snapshot/ | python 
>> -mjson.tool
>> {
>> "s3prod0": {
>> "settings": {
>> "base_path": "elasticsearch/prod/snapshots0",
>> "bucket": "prod-es-backup",
>> "region": "ap-southeast-1"
>> },
>> "type": "s3"
>> }
>> }
>>
>> I then tried to feed this into my new node's configuration:
>>
>> $ curl -XPUT 'http://localhost:9200/_snapshot/amos0' -d '
>> {
>> "type": "s3",
>> "s3prod0": {
>> "settings": {
>> "base_path": "elasticsearch/prod/snapshots0",
>> "bucket": "prod-es-backup",
>> "region": "ap-southeast-1"
>> }
>> }
>> }
>> '
>> {"error":"RepositoryException[[amos0] failed to create repository]; 
>> nested: CreationException[Guice creation errors:\n\n1) Error injecting 
>> constructor, org.elasticsearch.repositories.RepositoryException: [amos0] No 
>> bucket defined for s3 gateway\n  at 
>> org.elasticsearch.repositories.s3.S3Repository.<init>(Unknown Source)\n 
>>  while locating org.elasticsearch.repositories.s3.S3Repository\n  while 
>> locating org.elasticsearch.repositories.Repository\n\n1 error]; nested: 
>> RepositoryException[[amos0] No bucket defined for s3 gateway]; 
>> ","status":500}
>>
>> What am I missing here?
>>
>> Thanks.
>>
>> --Amos
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/e3576b32-02c2-4e05-92de-2a1e5de285d2%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>>  -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/facc1f6f-1626-47be-bd18-28d02a76380e%40googlegroups.com
>  
> 
> .
> For more options, visit https://groups.google.com/d/optout

Re: What causes high CPU load on ES-Storage Nodes?

2015-01-27 Thread Mark Walkom
Indexes are only refreshed when a new document is added.
Best practice would be to use multiple machines; if you lose that one you
lose your cluster!

Without knowing more about your cluster stats, you're probably just
reaching the limits of things, and either need less data, more nodes or
more heap.

On 28 January 2015 at 02:14, horst knete  wrote:

> Hey guys,
>
> i´ve been around this "problem" for quite a while, and didnt got a clear
> answer to it.
>
> Like many of you guys out there, we are running an es-"cluster" on a
> single strong server, moving older indices from the fast SSD´s to slow
> cheap HDD´s (About 25 TB data).
>
> To make this work we got 3 instances of ES running with the path of
> indices set to the HDD´s mountpoint and 1 single instance for the
> indexing/searching SSD´s.
>
> What makes me wonder all the time is, that even the "Storage"-nodes dont
> do anything(there is no indexing happening, there is 95% of the time no
> searching happening, they are just keeping the old indices fresh) the cpu
> load caused by this "idle" nodes is about 50% from the whole cpu working
> time.
>
> hot_threads: https://gist.github.com/german23/662732ca4d9dbdcb406b
>
>
> Is it possible that due to ES-Cluster mechanism, all the indexes are keep
> getting refreshed all the time, when a document is indexed or a search is
> executed.
>
> Are there any configuration options to avoid such an behavior?
>
> Would it be better to export the old indices to an separate ES-Cluster and
> configure multiple ES-Paths in Kibana?
>
> Are there any best practices to maintain such cluster?
>
> I would appreciate any form of feedback.
>
> Thanks
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/f8ce8722-027d-4656-83ce-6c19b97e5e34%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X9gfH5kgiHRdZyOWMD3eqtirssbECTJ7HuYK%2BZX8mKQXQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Search within array

2015-01-27 Thread Roger de Cordova Farias
I'm searching on an array of objects

The problem is that when I search using query string, it matches the text split
across different objects (different array positions).
Is there a way to avoid this behavior and search the query string within
the same array position?

I know that I could index the field with a high position_offset_gap and
search using phrase, but I don't need the text to be in order, only within
the same array position
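
For comparison, the workaround being referred to is roughly this (the field name "Text" and the gap size are arbitrary): map the field with a large position_offset_gap and query it as a sloppy phrase, e.g.

"Text" : { "type" : "string", "position_offset_gap" : 100 }

{ "match_phrase" : { "Text" : { "query" : "some words", "slop" : 50 } } }

but as noted this constrains how close together the terms must be, which is not quite the same as simply requiring them in the same array position.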

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJp2531wF4r09FHHcWOLVcU5V4O_p%2BCGEpqGBkto37ic3oe0Pg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Failed to parse ES Query

2015-01-27 Thread Adon Smith
Hi, while executing an ES query through my Java code
I am getting this exception: 

Failed to parse query [(((\"stew\" OR \"kabobs\" OR \"filet\" OR \"brisket\" OR 
\"roast\" OR \"steak\" OR \"beef\" OR \"burger\") AND 
asdfqwer3456!@#$%dfghtyui%^\u0026(*) AND
-\"vegan\")]]; nested: ParseException[Cannot parse \u0027(((\"stew\" OR 
\"kabobs\" OR \"filet\" OR \"brisket\" OR \"roast\" OR \"steak\" OR \"beef\" OR 
\"burger\") AND asdfqwer3456!@#$%dfghtyui%^\u0026(*) AND
-\"vegan\")\u0027: Lexical error at line 1, column 127. Encountered: \"\" 
(92), after : \"\"]; nested: TokenMgrError[Lexical error at line 1, column 127. 
Encountered: \"\" (92), after : \"\"] 

I am very new to ES and don't know what this exception means. Can someone please 
explain?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b60d60c8-afd0-47fd-80d1-cb5e97dfcd3f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How does sorting on _id work?

2015-01-27 Thread Brian
Yes, the _id field is a string. You are not limited to numbers. In fact, an 
automatically generated ID has many non-numeric characters in it.

For what you want, you should create an id field, map it to a long integer, 
and then copy your _id into that id field when you load the document. Then 
when you sort on the id field, you will get a numeric sort.
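
A minimal sketch of that, reusing the index and type names from the question (the numeric "id" field is the one you add yourself):

PUT /my_index/order/_mapping
{
  "order": {
    "properties": {
      "id": { "type": "long" }
    }
  }
}

Index each document with the numeric value copied in, e.g. PUT /my_index/order/99 with {"id": 99, ...}, and then:

GET /my_index/order/_search
{
  "size": 1,
  "sort": [ { "id": { "order": "desc" } } ]
}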

Hope this helps.

Brian

On Tuesday, January 27, 2015 at 1:28:44 PM UTC-5, Abid Hussain wrote:
>
> ... can it be that _id is treated as string? If so, is there any way 
> retrieve the max _id field with treating _id as integer?
>
> Am Dienstag, 27. Januar 2015 19:24:41 UTC+1 schrieb Abid Hussain:
>>
>> Hi all,
>>
>> I want to determine the doc with max and min _id value. So, when I run 
>> this query:
>> GET /my_index/order/_search
>> {
>> "fields": [ "_id" ],
>> "sort": [
>>{ "_uid": { "order": "desc" } }
>> ],
>> "size": 1
>> }
>> I get a result:
>> {
>>...
>>"hits": {
>>   ...
>>   "hits": [
>>  {
>> "_index": "my_index",
>> "_type": "order",
>> "_id": "99",
>> "_score": null,
>> "sort": [
>>"order#99"
>> ]
>>  }
>>   ]
>>}
>> }
>>
>> There is definitevely a doc with _id value 11132106 in index which I 
>> would have expected as result.
>>
>> And, when I run the same search with *order asc* I get a result with 
>> _id 100 which is higher than 99...?
>>
>> What am I doing wrong?
>>
>> Regards,
>>
>> Abid
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7842cdcf-67ee-48ed-8ef6-e8be2bb63a4a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How does sorting on _id work?

2015-01-27 Thread Abid Hussain
... can it be that _id is treated as a string? If so, is there any way to 
retrieve the max _id value while treating _id as an integer?

Am Dienstag, 27. Januar 2015 19:24:41 UTC+1 schrieb Abid Hussain:
>
> Hi all,
>
> I want to determine the doc with max and min _id value. So, when I run 
> this query:
> GET /my_index/order/_search
> {
> "fields": [ "_id" ],
> "sort": [
>{ "_uid": { "order": "desc" } }
> ],
> "size": 1
> }
> I get a result:
> {
>...
>"hits": {
>   ...
>   "hits": [
>  {
> "_index": "my_index",
> "_type": "order",
> "_id": "99",
> "_score": null,
> "sort": [
>"order#99"
> ]
>  }
>   ]
>}
> }
>
> There is definitevely a doc with _id value 11132106 in index which I would 
> have expected as result.
>
> And, when I run the same search with *order asc* I get a result with 
> _id 100 which is higher than 99...?
>
> What am I doing wrong?
>
> Regards,
>
> Abid
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d3e540ef-8e40-4f38-b655-db9da9b64946%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


How does sorting on _id work?

2015-01-27 Thread Abid Hussain
Hi all,

I want to determine the doc with max and min _id value. So, when I run this 
query:
GET /my_index/order/_search
{
"fields": [ "_id" ],
"sort": [
   { "_uid": { "order": "desc" } }
],
"size": 1
}
I get a result:
{
   ...
   "hits": {
  ...
  "hits": [
 {
"_index": "my_index",
"_type": "order",
"_id": "99",
"_score": null,
"sort": [
   "order#99"
]
 }
  ]
   }
}

There is definitely a doc with _id value 11132106 in the index, which I would 
have expected as the result.

And, when I run the same search with *order asc* I get a result with 
_id 100 which is higher than 99...?

What am I doing wrong?

Regards,

Abid

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/92429b89-0741-40cc-9d76-8e1a36e5c1f5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Kibana - IIS 7.5

2015-01-27 Thread GWired
That was it; I guess Windows 8 has it out of the box.
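
For anyone else hitting this on an older IIS, the fix amounts to a static content mapping, e.g. in the Kibana site's web.config (a sketch; the same thing can be done through the IIS manager UI):

<configuration>
  <system.webServer>
    <staticContent>
      <mimeMap fileExtension=".json" mimeType="application/json" />
    </staticContent>
  </system.webServer>
</configuration>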

On Tuesday, January 27, 2015 at 3:31:28 AM UTC-5, Akshay Davis wrote:
>
> Have you added the .json MIME type for the site in IIS?
>
>
>
> On Monday, January 26, 2015 at 3:46:44 PM UTC, GWired wrote:
>>
>> Yes,
>>
>> It works when I'm on my localhost serving it to me connected to a remote 
>> elasticsearch.  It just isn't working when I'm serving it from a dedicated 
>> Windows 2008 web server connected to the same remote elasticsearch.
>>
>> Config Working:
>>
>> Desktop: IIS 8 installed on Windows 8.1 - Kibana Installed Localhost
>> Server: abc.mydomain.com:9200, Elasticsearch 1.4.2 
>> Elastic Search yml has 
>>  
>> http.cors.enabled: true
>> http.cors.allow-origin: "/.*/"
>>
>>
>> Config Not Working:
>> Kibana Server: IIS 7.5 installed on Server 2008R2 - Kibana Installed on 
>> port 8080
>> Elasticsearch Server: exactly the same as above
>>
>> When elasticsearch was cors info was incorrect it actually gave an error 
>> message.  Now it is just launching partially, it's not giving any errors.  
>> Which is very strange.
>>
>> No Errors in the eventlog of the Kibana server.
>>
>> Garrett
>>
>>
>>
>>
>>
>>
>> On Monday, January 26, 2015 at 9:49:16 AM UTC-5, Magnus Bäck wrote:
>>
>>> On Monday, January 26, 2015 at 14:58 CET, 
>>>  GWired  wrote: 
>>>
>>> > I was able to get Kibana setup on my localhost and did a generic entry 
>>> > to allow everything into the elasticsearch.yml 
>>> > http.cors.allow-origin: "/.*/" 
>>> > Now I'm trying to getting it to run on my remote server running IIS 
>>> > 7.5 on port 8080. 
>>> > The page loads but only the top bar loads and nothing else any ideas? 
>>>
>>> Did you also enable CORS by setting http.cors.enabled to true? 
>>>
>>>
>>> http://stackoverflow.com/questions/26828099/kibana-returns-connection-failed
>>>  
>>>
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-http.html
>>>  
>>>
>>> -- 
>>> Magnus Bäck| Software Engineer, Development Tools 
>>> magnu...@sonymobile.com | Sony Mobile Communications 
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2bc3973c-2291-4f3d-a02c-f49824ede2c5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Top hits agg - hits (parent and child) sorted by geolocation

2015-01-27 Thread Cody Stringham
Hey everyone,

We have been implementing ES for our API for a few months, and have finally 
hit a wall with the top hits aggregation.

We are trying to use this to get similar items grouped under a parent. The 
aggregation is working perfectly, except when we want to add a geolocation 
sort. We have done a little bit with script-based sorting on another 
field, **{ max: {script: "doc['offer_value'].value"} }**, but we just can't seem 
to do this based on the geolocation distance.
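
In case it helps, one direction that stays close to the script trick above is to order the parent buckets by a min sub-aggregation whose script computes the distance from a geo_point field. A sketch, where the index/type, the "parent_id" and "location" fields and the coordinates are all made-up assumptions, not tested against this mapping:

GET /my_index/items/_search
{
  "size": 0,
  "aggs": {
    "by_parent": {
      "terms": {
        "field": "parent_id",
        "order": { "closest": "asc" }
      },
      "aggs": {
        "closest": {
          "min": { "script": "doc['location'].arcDistanceInKm(40.71, -74.00)" }
        },
        "top": {
          "top_hits": { "size": 3 }
        }
      }
    }
  }
}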

Any advice would be extremely appreciated, thank you in advance!

Cody

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/fa836997-036c-4b07-af19-837dcf3f2916%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Plugin for document field value suggestions

2015-01-27 Thread Jan Kucera
Hello everyone,

I have played with Elasticsearch for a while and ran into amazing and 
useful plugins during development (Sense, HQ, Kopf, etc.), however I haven't 
found any plugin which would help me create new documents by 
providing term suggestions from previously submitted docs. Does anyone know 
of such a plugin?

Thanks in advance for your answers,
- Jan


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ed2567d4-b4db-40a4-b726-39097230231e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


What causes high CPU load on ES-Storage Nodes?

2015-01-27 Thread horst knete
Hey guys,

I've been around this "problem" for quite a while and never got a clear 
answer to it.

Like many of you guys out there, we are running an ES "cluster" on a single 
strong server, moving older indices from the fast SSDs to slow cheap HDDs 
(about 25 TB of data).

To make this work we have 3 instances of ES running with the path of indices 
set to the HDDs' mountpoint and 1 single instance for the 
indexing/searching SSDs.

What makes me wonder all the time is that even though the "storage" nodes don't do 
anything (there is no indexing happening, and 95% of the time no 
searching happening; they are just keeping the old indices fresh), the CPU 
load caused by these "idle" nodes is about 50% of the whole CPU working 
time.

hot_threads: https://gist.github.com/german23/662732ca4d9dbdcb406b


Is it possible that, due to the ES cluster mechanism, all the indexes keep 
getting refreshed all the time when a document is indexed or a search is 
executed?

Are there any configuration options to avoid such behavior?

Would it be better to export the old indices to a separate ES cluster and 
configure multiple ES paths in Kibana?

Are there any best practices to maintain such cluster?

I would appreciate any form of feedback.

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f8ce8722-027d-4656-83ce-6c19b97e5e34%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


manually trigger gc in elasticsearch node

2015-01-27 Thread Jason Wee
Hello,

Is there a way to manually trigger the garbage collector on an elasticsearch node? 
I read that the JMX connection has been removed from ES since version 0.90.

Jason

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e9e50dc6-d53e-42cf-b3b9-5d6742128016%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Regarding elasticsearch load balancer

2015-01-27 Thread phani . nadiminti
Hi Jorg 

  Thank you for the quick reply. Let's say I am setting up a small cluster and I 
don't have a client node in it; in this case can I use haproxy to forward 
requests between servers?

Thanks
phani

On Tuesday, January 27, 2015 at 5:33:34 PM UTC+5:30, phani.n...@goktree.com 
wrote:
>
> Hi All,
>
> i know a concept of load balancer in elastic search which is HTTP 
> enabled and never be a master and doesn't hold any data. my doubt is if we 
> introduce client node called as loadbalancer in cluster is there any need 
> to setup haproxy for the cluster to forward request.if there is no need of 
> haproxy please let me know how to achieve fail over using load balancer.
>  
>suppose in my cluster i don't have load balancer nodes how can we 
> manage request forward settings . let say node 1 is down but we configured 
> node1 in client application.users will hit node1 from client application 
> but node1 is down in this case how can we forward request to client with 
> out haproxy?
>
>  Thanks
> phani.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/004fc17e-e151-4cea-adad-1018072f0f9f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Querystring filter dosn't work

2015-01-27 Thread Messias
Now I have found the right query:

You have to double escape the reserved characters.

e.g. "uri:\\/video\\-ondemand\\/video\\/flv\\/test\\/*"; with this query everything 
works as expected.

Best regards
Messias

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7f1f967e-2e0c-4094-a9ad-72cbd50de7fe%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Regarding elasticsearch load balancer

2015-01-27 Thread joergpra...@gmail.com
It depends on the language/platform you use.

If Java, all is very easy - NodeClient connects to more than one node, and
also TransportClient "sniff" mode. So Java clients are using fault tolerant
connection mode.

The official clients do the same, but only if configured properly. Please study
how to connect the official ES clients to more than one node of a cluster.

"Load balancing" can be achieved by adding more connections to the client.
Since each node knows the cluster state and can find out how to route
requests to the involved nodes, this is not always required. This is an
internal ES mechanism.

No haproxy is needed. Even if haproxy is used, haproxied requests are
balanced/routed once more in the ES internals, so you will end up with
"double load balancing".

Jörg


On Tue, Jan 27, 2015 at 1:03 PM,  wrote:

> Hi All,
>
> i know a concept of load balancer in elastic search which is HTTP
> enabled and never be a master and doesn't hold any data. my doubt is if we
> introduce client node called as loadbalancer in cluster is there any need
> to setup haproxy for the cluster to forward request.if there is no need of
> haproxy please let me know how to achieve fail over using load balancer.
>
>suppose in my cluster i don't have load balancer nodes how can we
> manage request forward settings . let say node 1 is down but we configured
> node1 in client application.users will hit node1 from client application
> but node1 is down in this case how can we forward request to client with
> out haproxy?
>
>  Thanks
> phani.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/d0043350-001e-46a2-9800-ee9b0328aae5%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHLeOhDfE1p2585Eiib6W3doTP12aWKwBHioAhTMZOPAA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Regarding elasticsearch load balancer

2015-01-27 Thread phani . nadiminti
Hi All,
   
I know the concept of a load balancer node in elasticsearch, which is HTTP 
enabled, will never be a master and doesn't hold any data. My doubt is: if we 
introduce a client node acting as a load balancer in the cluster, is there any need 
to set up haproxy for the cluster to forward requests? If there is no need for 
haproxy, please let me know how to achieve failover using the load balancer node.
 
   Suppose my cluster doesn't have load balancer nodes; how can we manage 
request forwarding? Let's say node1 is down but we configured node1 
in the client application. Users will hit node1 from the client application, but 
node1 is down; in this case how can we forward the request without 
haproxy?

 Thanks
phani.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d0043350-001e-46a2-9800-ee9b0328aae5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Match phrase and minimum_should_match combination

2015-01-27 Thread Zdeněk Šebl
Hi,
I am trying to use an *ngram*-based solution as a "shotgun approach" to get results 
which are not covered by more precise analyzers.

An article describing this approach is, for example, here 


The *match* query is working as expected, including the 
minimum_should_match parameter (this parameter is very important to be able to 
exclude matches of only one ngram from the queried word).

*1.* Below is the explanation of a *match* query searching for the word *"first"* without minimum_should_match:

filtered(Text:fir Text:irs Text:rst)->cache(_type:item)

and with "minimum_should_match": "80%"

filtered((Text:fir Text:irs Text:rst)*~2*)->cache(_type:item)

*2.* But if I try the same with a *match_phrase* query and the phrase *"first 
second"*, I get the same result with or without the minimum_should_match 
parameter:

filtered(Text:"(fir irs rst) (sec eco con ond)")->cache(_type:item)

I am expecting (for minimum_should_match) something like

filtered(Text:"(fir irs rst)*~2* (sec eco con ond)*~3*")->cache(_type:item)

In attached file, there is complete Marvel Sense code with example 
described above.

*Does anybody know if this is a bug or a known limitation of the 
technology used?*
Or maybe there is another way to achieve better phrase query results in 
combination with an ngram analyzer.

Thanks,
Zdenek

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0c637de8-0c27-4097-a94c-88247fa012f2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
PUT /tokenizers
{
  "settings": {
"number_of_shards": 1, 
"number_of_replicas": 0,
"analysis": {
  "filter": {
"trigram": {
  "type": "ngram",
  "min_gram": 3,
  "max_gram": 3
}
  },
  "analyzer": {
"trigram": {
  "tokenizer": "standard",
  "filter": [
"trigram"
  ]
}
}
}
  },
  "mappings": {
"item": {
  "dynamic": "false",
  "properties": {
"Text": {
  "type": "string",
  "analyzer": "trigram"
}
  }
}
  }
}

GET /tokenizers/item/_validate/query?explain
{
  "query": {
"match": {
  "Text": {
"query": "first"
  }
}
  }
}

GET /tokenizers/item/_validate/query?explain
{
  "query": {
"match": {
  "Text": {
"query": "first",
"minimum_should_match": "80%"
  }
}
  }
}

GET /tokenizers/item/_validate/query?explain
{
  "query": {
"match_phrase": {
  "Text": {
"query": "first second"
  }
}
  }
}

GET /tokenizers/item/_validate/query?explain
{
  "query": {
"match_phrase": {
  "Text": {
"query": "first second",
"minimum_should_match": "80%"
  }
}
  }
}



Re: Optimizing filter bitsets

2015-01-27 Thread Adrien Grand
On Mon, Jan 26, 2015 at 11:05 PM, Mike Sukmanowsky <
mike.sukmanow...@gmail.com> wrote:

> I understand that the result of the bool is the bitset that's cached as
> opposed to the individual term filters themselves. This had me concerned
> that for certain complex bool filters (where we have >10 or so term filters
> inside a "must" clause), were creating bitsets that have far too narrow an
> application (basically the one query they were used for).
>

Actually with today's defaults, you would create and cache one bit set for
each clause of the bool filter, and then the bool filter would just merge
bit sets. The resulting bit set from a bool filter is not cached by
default. FYI we have plans to change this in 2.0
https://github.com/elasticsearch/elasticsearch/pull/8573 by keeping
statistics about filter usage and only caching those that are both costly
and reused. So even a compound bool filter could be cached if it keeps on
being reused with the same clauses.


> If we have certain terms (say customer ID, ) which update fairly
> infrequently (only with new docs) and others that update fairly frequently
> (say time-based fields), is there a way to optimize our bool queries to
> create reusable bitsets for the infrequent term filters while also having
> the benefit of caching the result of the entire bool filter?
>

Since the filter cache works per segment, making different choices based on
how frequently some _fields_ are being updated would not help.


> Is it as simple as adding _cache: true to the terms filters that are
> fairly static?
>

This might be a good idea. And since caching filters has a cost, it might
be a good idea to set _cache:false on filters that you know are unlikely to
be reused. When filters are cached, elasticsearch unfortunately needs to
evaluate all docs from the index against this filter, which can be slow. So
not caching filters which are not reused can make things faster.
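
For illustration, here is a minimal sketch of that idea with made-up field names,
using the 1.x filtered query / bool filter syntax: the relatively static terms
filter is explicitly cached, while the frequently changing, now-based range filter
is explicitly not cached, since its result is unlikely to be reused.

{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "customer_id": [42, 43],
                "_cache": true
              }
            },
            {
              "range": {
                "@timestamp": { "gte": "now-1h" },
                "_cache": false
              }
            }
          ]
        }
      }
    }
  }
}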

-- 
Adrien Grand



Re: Confusing results from fuzzy query (1 term, 1 field)

2015-01-27 Thread Michael McCandless
Looks like this was answered on StackOverflow?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jan 26, 2015 at 7:54 PM, Steve Pearlman 
wrote:

> For a well formatted example, please see:
> http://stackoverflow.com/questions/28161480/fuzzy-not-functioning-as-expected-one-term-search-see-example
>
> Here's my problem:
> Consider the following results from:
>
> curl -XGET 'http://localhost:9200/megacorp/employee/_search' -d
> '{ "query" :
>  {"match":
> {"last_name": "Smith"}
>  }
>   }'
>
> Result:
>
> {
>   "took": 3,
>   "timed_out": false,
>   "_shards": {
> "total": 5,
> "successful": 5,
> "failed": 0
>   },
>   "hits": {
> "total": 2,
> "max_score": 0.30685282,
> "hits": [
>   {
> "_index": "megacorp",
> "_type": "employee",
> "_id": "1",
> "_score": 0.30685282,
> "_source": {
>   "first_name": "John",
>   "last_name": "Smith",
>   "age": 25,
>   "about": "I love to go rock climbing on the weekends.",
>   "interests": [
> "sports",
> "music"
>   ]
> }
>   },
>   {
> "_index": "megacorp",
> "_type": "employee",
> "_id": "2",
> "_score": 0.30685282,
> "_source": {
>   "first_name": "Jane",
>   "last_name": "Smith",
>   "age": 25,
>   "about": "I love to go rock climbing",
>   "interests": [
> "sports",
> "music"
>   ]
> }
>   }
> ]
>   }
> }
>
> Now when I execute the following query:
>
> curl -XGET 'http://localhost:9200/megacorp/employee/_search' -d
> '{ "query" :
> {"fuzzy":
>  {"last_name":
>   {"value":"Smitt",
>"fuzziness": 1
>   }
>   }
>  }
>  }'
>
> Returns NO results despite the Levenshtein distance of "Smith" and "Smitt"
> being 1.  The same thing happens with a value of "Smit," (NB: the comma is
> for grammatical purposes, it's not in the value) which also has a distance
> of 1.  If I put in a `fuzziness` value of 2, I get results.  What am I
> missing here?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/0fe765b1-1bda-4459-8ee5-aeddc9f05727%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
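
A note for anyone who finds this thread later: one likely explanation (an
assumption on my part, not something stated in the thread) is that the fuzzy
query value is not analyzed, so "Smitt" is compared against the lowercased
indexed term "smith", which is an edit distance of 2 (S -> s and t -> h).
Lowercasing the value yourself, as a sketch, makes fuzziness 1 behave as
expected:

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -d
'{ "query" :
    {"fuzzy":
        {"last_name":
            {"value":"smitt",
             "fuzziness": 1
            }
        }
    }
 }'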



Re: Kibana 4 - Issues with Filters Aggregations

2015-01-27 Thread renaud
Hi,

We were wondering if anyone has had time to look at these issues. Are they 
already known, or should we open a GitHub issue regarding them?

Thanks
-- 
Renaud Delbru

On Monday, January 26, 2015 at 1:03:19 PM UTC, ren...@sindicetech.com wrote:
>
>
> 
> Hi,
>
> We tried to create panels based on Filters Aggregation on the latest 
> Kibana source, and we encountered the following problems:
>
> 1) Buckets from Filters aggregation are not displayed in Data table
>
> By exporting the raw response, we can see that the buckets are created, 
> however, nothing is displayed in the table. We then tried to use another 
> panel (pie chart or bar chart) with the same filters aggregation and the 
> buckets were displayed correctly.
>
> 2) Selecting a bucket does not affect search result
>
> On a pie or bar chart configured with a Filters Aggregation, if we click on 
> a bucket, nothing happens - the search result is not restricted based on 
> the selected bucket.
>
> 3) After selecting a bucket, dashboard ui does not display applied filters
>
> After clicking on a bucket from a chart configured with a Filters 
> Aggregation, the dashboard UI no longer displays the currently applied 
> filters.
>
> See attached screen recording showing the last two problems.
>
> -- 
> Renaud Delbru
>
>
>



Re: Regarding node architecture initial setup

2015-01-27 Thread Mark Walkom
Not much point having master-only and data-only nodes for such a small cluster.
Just make them all master and data and then set min masters to 3.
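
For reference, a minimal sketch of what that could look like in elasticsearch.yml
(the same file, including the minimum_master_nodes value, goes on every node;
with all four nodes master-eligible, n/2+1 = 3):

# every node is both master-eligible and holds data
node.master: true
node.data: true

# quorum of master-eligible nodes, to avoid split brain
discovery.zen.minimum_master_nodes: 3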

On 27 January 2015 at 21:07,  wrote:

> Hi Radu,
>
>   Thanks for the suggestion and based on the criteria i designed one
> architecture using four nodes please suggest me the best way to arrange i
> satisfied all conditions in my architecture.
>
>
>   node1 : Dedicated master
>   node2 : mater and data
>   node3:  master and data
>   node 4: dedicated data node
>
>   Satisfied n/2+1 minimum number of master nodes would be 3. and
> for fail over of data nodes i arranged node2,node3,node4. because i can
> keep 2 replicas to increase search performance that is the reason i
> allotted node2,3,4 as data nodes.
>
>  on which node I have to set the minimum no of master nodes
> settings?
>
> please suggest me is this way is correct to arrange nodes?
>
> Thanks
> phani
>
> On Thursday, January 8, 2015 at 3:23:35 PM UTC+5:30,
> phani.n...@goktree.com wrote:
>>
>> Hi All
>>
>>  I have chosen to establish 4 nodes in my cluster. I read concept of
>> dedicated master nodes and only data holding nodes in elastic search.please
>> explain me briefly how can i establish cluster by using the above four
>> nodes.
>>
>>suppose if i have chosen N/2+1 for 4 nodes the minimum no of master
>> nodes would be 3 so one node left in my cluster. master nodes only managing
>> and indexing data to other nodes i.e data nodes. to implement replica do we
>> need 5 nodes because i left with only 1 data node where i can keep replica?
>>
>>  other wise will the primary shard resides on any one of master node
>> and replica will be hold my data node or please explain me how to design
>> above scenarios with my four nodes in cluster.
>>
>> Thanks
>>
>> phani
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/21e06e81-82fe-4a6e-b29c-21977bf14315%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



ElasticSearch Autocomplete Feature

2015-01-27 Thread CY Kuek
Hi All,

I am new to ElasticSearch, and I am currently using it to connect to MongoDB for 
indexing and searching. I would like to implement a keyword auto-complete feature, 
like search engines that show a list of suggested keywords as the user types in 
partial keywords. My documents contain many fields, and I only want auto-complete 
to search on 4 fields (title, description, category and sub-category). Below is 
the index that I am creating:

curl -XPUT http://localhost:9200/testindex -d '
{
  "settings": {
    "analysis": {
      "filter": {
        "edge_nGram_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 50,
          "token_chars": [
            "letter",
            "digit",
            "punctuation",
            "symbol"
          ]
        }
      },
      "analyzer": {
        "edge_nGram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "edge_nGram_filter"
          ]
        },
        "whitespace_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "Merchant": {
      "_all": {
        "index_analyzer": "edge_nGram_analyzer",
        "search_analyzer": "whitespace_analyzer"
      },
      "properties": {
        "screenNm": {
          "type": "string",
          "index": "no",
          "include_in_all": false
        },
        "categoryId": {
          "type": "long",
          "index": "no",
          "include_in_all": false
        },
        "title": {
          "type": "string",
          "index": "not_analyzed"
        },
        "desc": {
          "type": "string",
          "index": "not_analyzed"
        },
        "email": {
          "type": "string",
          "index": "no",
          "include_in_all": false
        },
        "createdAt": {
          "format": "dateOptionalTime",
          "type": "date",
          "index": "no",
          "include_in_all": false
        },
        "subCategoryList": {
          "properties": {
            "subCategoryId": {
              "type": "long",
              "index": "no",
              "include_in_all": false
            },
            "subCategory": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "category": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}'

Below are my questions, and I am hoping some of you could shed some light on them:

1) When I perform a match query search on the "_all" field for the keyword "food" 
(I had set the returned record size to 6), I get 6 documents that match the food 
keyword in any of the fields (title, description, category and sub-category). 
Result example as follows:

 • title=FoodPandaToyShop, description=Desc1, category=toy, sub-category=kids
 • title=Restaurant B, description=Desc2, category=foodcategoryA, sub-category=dairy
 • title=Restaurant B, description=Desc3, category=foodcategoryB, sub-category=dairy
 • title=Restaurant B, description=Desc4, category=foodcategoryC, sub-category=dairy
 • title=Restaurant B, description=Desc5, category=foodcategoryD, sub-category=dairy
 
So the list of strings that we want is FoodPandaToyShop, foodcategoryA, 
foodcategoryB, foodcategoryC, foodcategoryD. We need to find a way to massage the 
data, because we receive whole documents back from ElasticSearch instead of text 
strings. Is there a way to get the text instead of the whole document?
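
For reference, the match query described in (1) would look something like the
following sketch against the index and mapping above (the size value is the one
mentioned; this only spells out the query shape, it adds nothing new):

curl -XGET 'http://localhost:9200/testindex/Merchant/_search' -d '
{
  "size": 6,
  "query": {
    "match": {
      "_all": "food"
    }
  }
}'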


2) Can we handle this scenario using the completion suggester, given that I need 
to search on multiple fields?

3) Is performance an issue if I am using edgeNGram on a large document set?

Appreciate your help. Thanks a lot !!!


Regards,
CYea


Re: Regarding node architecture initial setup

2015-01-27 Thread phani . nadiminti
Hi Radu,

  Thanks for the suggestion. Based on the criteria, I designed an 
architecture using four nodes; please suggest the best way to arrange them. I 
have satisfied all the conditions in my architecture.


  node1: dedicated master
  node2: master and data
  node3: master and data
  node4: dedicated data node

  To satisfy n/2+1, the minimum number of master nodes would be 3. For 
failover of the data nodes I arranged node2, node3 and node4, because I can 
keep 2 replicas to increase search performance; that is why I allotted nodes 
2, 3 and 4 as data nodes.

  On which node do I have to set the minimum number of master nodes 
setting?

Please suggest whether this is the correct way to arrange the nodes.

Thanks
phani

On Thursday, January 8, 2015 at 3:23:35 PM UTC+5:30, phani.n...@goktree.com 
wrote:
>
> Hi All
>
>  I have chosen to establish 4 nodes in my cluster. I read concept of 
> dedicated master nodes and only data holding nodes in elastic search.please 
> explain me briefly how can i establish cluster by using the above four 
> nodes. 
>
>suppose if i have chosen N/2+1 for 4 nodes the minimum no of master 
> nodes would be 3 so one node left in my cluster. master nodes only managing 
> and indexing data to other nodes i.e data nodes. to implement replica do we 
> need 5 nodes because i left with only 1 data node where i can keep replica?
>
>  other wise will the primary shard resides on any one of master node 
> and replica will be hold my data node or please explain me how to design 
> above scenarios with my four nodes in cluster.
>
> Thanks 
>
> phani
>



not able to refine from o/p of query in logstash

2015-01-27 Thread raj@
I am using the below query to pull the information from logstash:: 

curl -XGET ' http://logs:xx00/_all/_search?pretty=true' -d ' { 
"query": { 
"bool": { 
"must": [ 
{ 
"match": { 
"_type": "pre" 
} 
}, 
{ 
"match": { 
"message": "MapDone" 
} 
}, 
{ 
   "range": { 
"@timestamp": { 
"gte": "now-5m" 
} 

} 
} 
] 
} 
} }' 
Output :: 

{ "took" : 177, "timed_out" : false, "_shards" : { "total" : 3225,
"successful" : 3225, "failed" : 0 }, "hits" : { "total" : 1238, "max_score"
: 4.3801584, "hits" : [ { "_index" : "fi-logstash-2015.01.21", "_type" :
"fi", "_id" : "CORYzNPHnnQeu09A", "_score" : 4.3801584,
"_source":{"thread_name":"main","message":"[MapDone]\tstandards.po.poRsxWrite
in
169ms","@timestamp":"2015-01-21T14:48:59.835+00:00","level":"INFO","mdc":{},"file":"fi-1-small-log.json","class":"fi.log.MapLogHandler","line_number":"21","logger_name":"fi.Mapper","method":"info","@version":1,"source_host":"fi.pp","host":"prefi2","offset":"185244882","type":"prefi","tags":["instance"],"syslog_severity_code":5,"syslog_facility_code":1,"syslog_facility":"user-level","syslog_severity":"notice"}
} 

The above is only part of the output. I am trying to get only the map name
as output, but when I try, I am getting errors. 

Different sample Maps:: formats.pure.qm.fromSIP.toCSV.write in 24ms
H044Grain.hub.asn.from.advanceShipNoticeWrite in 188ms
H9B1honey.hub.po.fromFEDSto.purchaseOrder in 416ms
HAEPrugs.hub.rsx.v7.r0.po.poFedsWrite in 231ms
H4Grain2.hub.in.fromtoAPP.invoiceWrite in 110ms
H2Home.v700.e4060.co.in.inFedsWrite in 108ms 

I am trying to get:: 

1 - only mapping names ( H4Grain2.hub.in.from.invoiceWrite ) 
2 - unique mappings ( something like | uniq to previous o/p ) 
3 - Average of last 1 minutes mappings 

Can anybody help check if this is possible? Thanks a ton in advance.
 



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/not-able-to-refine-from-o-p-of-query-in-logstash-tp4069573.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: not able to refine from o/p of query in logstash

2015-01-27 Thread raj@

Can anyone help with this? Just bumping this email; sorry if I am breaking any
rules. 



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/not-able-to-refine-from-o-p-of-query-in-logstash-tp4069573p4069621.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Kibana - IIS 7.5

2015-01-27 Thread Akshay Davis
Have you added the .json MIME type for the site in IIS?
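
If not, a minimal sketch of one way to do it (an illustration only, not taken
from this thread) is a static-content MIME mapping in the site's web.config:

<configuration>
  <system.webServer>
    <staticContent>
      <!-- let IIS serve Kibana's .json files (e.g. dashboards) instead of returning 404 -->
      <mimeMap fileExtension=".json" mimeType="application/json" />
    </staticContent>
  </system.webServer>
</configuration>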



On Monday, January 26, 2015 at 3:46:44 PM UTC, GWired wrote:
>
> Yes,
>
> It works when I'm on my localhost serving it to me connected to a remote 
> elasticsearch.  It just isn't working when I'm serving it from a dedicated 
> Windows 2008 web server connected to the same remote elasticsearch.
>
> Config Working:
>
> Desktop: IIS 8 installed on Windows 8.1 - Kibana Installed Localhost
> Server: abc.mydomain.com:9200, Elasticsearch 1.4.2 
> Elastic Search yml has 
>  
> http.cors.enabled: true
> http.cors.allow-origin: "/.*/"
>
>
> Config Not Working:
> Kibana Server: IIS 7.5 installed on Server 2008R2 - Kibana Installed on 
> port 8080
> Elasticsearch Server: exactly the same as above
>
> When the Elasticsearch CORS info was incorrect it actually gave an error 
> message.  Now it is just launching partially and not giving any errors, 
> which is very strange.
>
> No Errors in the eventlog of the Kibana server.
>
> Garrett
>
>
>
>
>
>
> On Monday, January 26, 2015 at 9:49:16 AM UTC-5, Magnus Bäck wrote:
>
>> On Monday, January 26, 2015 at 14:58 CET, 
>>  GWired  wrote: 
>>
>> > I was able to get Kibana setup on my localhost and did a generic entry 
>> > to allow everything into the elasticsearch.yml 
>> > http.cors.allow-origin: "/.*/" 
>> > Now I'm trying to getting it to run on my remote server running IIS 
>> > 7.5 on port 8080. 
>> > The page loads but only the top bar loads and nothing else any ideas? 
>>
>> Did you also enable CORS by setting http.cors.enabled to true? 
>>
>>
>> http://stackoverflow.com/questions/26828099/kibana-returns-connection-failed 
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-http.html
>>  
>>
>> -- 
>> Magnus Bäck| Software Engineer, Development Tools 
>> magnu...@sonymobile.com | Sony Mobile Communications 
>>
>



Calculation of document frequency for cutoff_frequency

2015-01-27 Thread Dany Gielow
Hello,

I want to be able to see exactly which terms are considered high frequency 
terms at a specific cutoff_frequency.
I noticed that if I query the termvectors of different documents with 
different routing values, the values of field_statistics[doc_count] and 
the term[doc_freq] change.
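
For context, the parameter in question is the cutoff_frequency option of the
match query. A minimal, made-up example, where terms occurring in more than 1%
of the documents are treated as high frequency:

{
  "query": {
    "match": {
      "body": {
        "query": "quick brown fox",
        "cutoff_frequency": 0.01
      }
    }
  }
}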

That led me to the following questions:
How is the document frequency that is used for the cutoff_frequency feature 
calculated?
Is the document frequency of a term calculated only for the shard holding its 
document, for all shards of the index, or for all queried shards (routing)?
Is the document frequency used by cutoff_frequency calculated as 
term[doc_freq] / field_statistics[doc_count], or differently?

How can I find the terms that are high-frequency terms at a given 
cutoff_frequency?
Is there a facet or aggregation query that I could use?

Many thanks
Dany


